Ever Wondered Whether That Voice Is from a Human Being or Created by a Computer? Revisiting Normalizing Flows

Posted December 31, 2021 by Gowri Shankar  ‐  8 min read

We often consider everything normal when we look at the creations of nature, because almost every instance resembles the other instances of the entity it inherits from. For example, a dog looks like its fellow dogs, and so do trees, birds, and human beings. However, on closer observation and further inspection, we find the uniqueness of each instance - that is the true beauty of creation. When instances differ vastly from their counterparts, we tend to call them differently-abled or specially powered, if not super-powered. In other words, nature is a huge probabilistic model with a repository of diverse distributions, and we are all samples of a particular, non-normal distribution that makes us who we are. Hence, nothing is normal and everything is special. What these distributions are, why they are special, and how we can create them - the concept of normalizing flows sheds some light on these questions.

We have studied normalizing flows in detail in the past, where we built an element-wise flow algorithm with detailed intuition about the math behind it. In this post, we shall revisit the topic with a few interesting ideas and concepts from a vastly different context. Please refer to the older normalizing flows post here,

Multiverse

Disclaimer: Opinions are my own and do not reflect thoughts and ideas of anyone else.

Objective

The objective of this post is to understand normalizing flows in an un-mathy way. I am not sure whether that is possible, but I am going to make an attempt.

Introduction

In machine learning applications we have two variables that we often consider the key elements for representing intelligence: 1. the past observations and 2. the future outcomes. In other words, the observed variables ($x$) and the target variable ($y$), which bring in a certain level of determinism if all the confounders are considered. However, in reality, or as per the laws of nature, things are quite different - there is no room for deterministic outcomes, and hence our machine learning models often go haywire. This state of uncertainty results in distrust of the models that we build. There is no big difference between the uncertainty of machine learning models and that of human behavior - both lack overall confidence due to the absence of statistically significant confounders needed to make intelligent outcomes. Note the two variables,

  • Observed variables ($x$)
  • Target variables ($y$)

We do not have infinite energy (to store and compute) or time to identify all confounding agents for an objective, but we do have the option to quantify the uncertainty caused by their absence. That is, from the statistical point of view, we move from striving for deterministic outcomes to probabilistic inferences with a confidence measure. There comes the $3^{rd}$ variable,

  • Latent variables ($z$, or unobserved variables)

Every Outcome of Nature is Unique

Before getting deep into the creation process, I would like to emphasize a well-known but often forgotten concept of “Universal Connection”, i.e. we are all connected at the macro as well as the micro level. It is not very difficult for one to accept the similarity at the sub-atomic level between a man and a macaque. The building blocks of both men and macaques are electrons, protons, neutrons, etc., wired within the quantum energy field with complex entanglement among them all. On the contrary, at the macro level we see differences of minute variance within species and of larger variance across species. Let us say $d$ denotes the distribution; then

$$|d(Man) - d(Macaque)| \sim |d(Cat) - d(Cougar)|$$

and

$$|d(Man) - d(Macaque)| \ll |d(Man) - d(Cat)| \ll |d(Cat) - d(Tree)|$$

To create any distribution of the above kind, we need two things

  1. An input that describes the parameters of the distribution, and
  2. A process to create it.

I suppose it will not draw much debate that everything evolved from nothing since the big bang at $t_0$. For simplicity, let us make 3 assumptions,

  1. The creation starts from a single-cell organism that has more or less similar properties to the great mammoths and mighty humans - like eating, resting, reproducing, etc.
  2. Time is an illusion; let the evolution be represented as distributed in space rather than in time - i.e. we have $N$ universes, where $N$ is the number of time steps between $t_{now}$ and $t_0$.
  3. All these universes are linearly connected, or juxtaposed, to form a long string-like structure - manifolds.

With those assumptions in place, it is not very difficult for us to visualize the connectedness of everything with every other thing. There must have been a process and manifolds for a cat to become a cougar, just as for an amoeba to become an oligarch.

Let us say the distribution of the single-cell organism is standard normal, $\mathcal{N}(\mu=0, \sigma=1)$; then the distribution of the mighty human must be unimaginably complex, and there should be a process flow to reach that stage - that special sequence of processes is captured in the string-like structure we coined in our assumptions.



A Normalizing Flow is a transformation of a simple 
probability distribution (e.g. a standard normal) into 
a more complex distribution by a sequence of 
invertible and differentiable mappings.

The density of a sample can be evaluated by transforming 
it back to the original simple distribution.

- Kobyzev et al., Normalizing Flows: An Introduction and 
Review of Current Methods


The mean and the variance are the latent variables that determine the uniqueness of any given entity in the string-like structure. Using the latent variables we can create samples of the entities of interest.
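
To make this concrete without leaning on equations, here is a tiny sketch of my own (not the element-wise flow built in the earlier post) showing a single affine flow step in NumPy. The names `mu`, `sigma`, `forward`, and `log_prob` are placeholders I made up for this illustration: a standard normal sample is pushed forward into a shifted and scaled distribution, and its density is recovered by flowing it back to the base distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Base (latent) distribution: standard normal N(0, 1)
def sample_base(n):
    return rng.standard_normal(n)

# One element-wise affine flow step: x = mu + sigma * z.
# mu and sigma play the role of the latent variables above.
mu, sigma = 2.0, 0.5

def forward(z):
    return mu + sigma * z

def log_prob(x):
    # Change of variables: invert the flow, then correct for the
    # change in volume (here simply -log|sigma| per dimension).
    z = (x - mu) / sigma
    log_base = -0.5 * (z ** 2 + np.log(2 * np.pi))
    return log_base - np.log(abs(sigma))

z = sample_base(5)
x = forward(z)           # "complex" samples generated from simple noise
print(x, log_prob(x))    # densities recovered by flowing back to the base
```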

Variational Inference

To accomplish any creation with unique properties, nature must be employing some sort of efficient technique, one that has an idea about the future outcomes. We shall not forget the fact that nature’s goal is to achieve variational creations of the future. In the string-like structure that we coined in the earlier section, every step (time) is eternal because we have transformed it into a representation of space alone, with no time. We have access to the future outcomes for an observed variable, and each observed variable has its latent-variable counterpart. This conditional future outcome is called the posterior probability.

Approximating the posterior probability is not an easy task, hence we introduce an auxiliary distribution that approximates it using certain parameters. Learning these parameters is the core idea behind variational inference. At this point, we have two sets of parameters: 1. those of the model and 2. those of the variational approximation. Using them, we reconstruct a new sample and compute the error (the reconstruction error) of the approximation.

Neural networks are quite good at doing this kind of job. In some generative neural networks (those that create new things that never existed before, e.g. VAEs, GANs, etc.), the input and the output remain the same so that the latent variables can be learned. Once the latent variables are learned, new samples that never existed before can be created.
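
As a rough illustration of that loop (just a sketch, not a real VAE - the `encode` and `decode` maps below are hypothetical linear stand-ins for what would be neural networks), here is how a reparameterized latent sample and the reconstruction error fit together:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "encoder": maps an observation x to the parameters (mu, log_var)
# of an approximate posterior q(z | x). A real VAE learns this as a network.
def encode(x):
    mu = 0.5 * x                      # hypothetical linear map
    log_var = np.full_like(x, -1.0)   # fixed, made-up uncertainty
    return mu, log_var

# Reparameterization: z = mu + sigma * eps, so sampling stays differentiable.
def sample_z(mu, log_var):
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

# Toy "decoder": reconstructs x from the latent sample z.
def decode(z):
    return 2.0 * z                    # hypothetical inverse of the encoder map

x = rng.standard_normal(4)
mu, log_var = encode(x)
z = sample_z(mu, log_var)
x_hat = decode(z)

reconstruction_error = np.mean((x - x_hat) ** 2)
print(reconstruction_error)
```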

I consciously avoided a few critical terms like marginal likelihood, evidence lower bound (ELBO), etc., that might end up overloading the reader.

Normalizing Flows

A detailed explanation and the mathematical intuition behind normalizing flows can be found in the article mentioned in the opening note of this post. Proceed there to get a better understanding of this topic.

Normalizing flows are categorized under unsupervised learning and generative models. Their key application is precisely to tap the information hidden in humongous amounts of unlabeled data. The following are some of the most popular applications,

  • Density Estimation
  • Outlier Detection
  • Prior Construction
  • Dataset Summarization

For example, the density of a sample can be evaluated by transforming 
it back to the original simple distribution and then computing the 
product of 
1. The density of the inverse-transformed sample and 
2. The associated change in volume induced by the sequence 
of inverse transformations 

- Kobyzev et al., Normalizing Flows: An Introduction and 
Review of Current Methods
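
That recipe translates almost directly into code. Below is a minimal sketch, with made-up shift and scale parameters of my own, that evaluates the log-density of samples under a stack of two element-wise affine layers by inverting them one at a time and accumulating the change-of-volume terms:

```python
import numpy as np

# A stack of two element-wise affine layers (hypothetical parameters).
layers = [(1.0, 2.0), (-0.5, 0.3)]   # (shift, scale) pairs

def log_density(x):
    """log p(x) = log p_base(inverse-transformed x) + sum of log|det J|
    of the inverse transformations, exactly as the quote describes."""
    z = np.asarray(x, dtype=float)
    log_det = 0.0
    for shift, scale in reversed(layers):   # invert the flow, last layer first
        z = (z - shift) / scale
        log_det += -np.log(abs(scale))      # volume change of this inverse step
    log_base = -0.5 * (z ** 2 + np.log(2 * np.pi))
    return log_base + log_det

print(log_density([0.0, 1.0, 2.0]))
```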


Types of Flows

In this section, we shall briefly describe various types of flows and their utility,

  • Elementwise Flows: A basic form of bijective non-linearity constructed using a bijective scalar function; see the earlier post, Implementation of Normalizing Flow: Shift Function.
  • Linear Flows: Used when we want to express the correlation between dimensions.
  • Planar and Radial Flows: Flows that expand or contract the distribution along a hyperplane (planar) or around a reference point (radial).
  • Coupling and Autoregressive Flows: Coupling and autoregressive (AR) flows are the most expressive and widely used flows.
    • Coupling: The input $x$ is split into a disjoint partition; a coupling function transforms one part, with its parameters produced by a conditioner applied to the other part, which keeps the transformation expressive yet easily invertible (see the coupling-layer sketch after this list).
    • AR Flows: AR flows are non-linear generalizations of multiplication by a triangular matrix. More details can be found in Masked AF (MAF, Papamakarios et al.) and Inverse AF (IAF, Kingma et al.).
  • Residual Flows: Residual networks have residual connections and blocks that consume a significant amount of memory; residual flows aim to save memory during training and to stabilize the computation.
  • Infinitesimal Flows: Learning happens as a continuous dynamical system rather than as a discrete one like residual flows.
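
To give a concrete feel for the coupling idea mentioned in the list above, here is a hedged sketch of an affine coupling layer with its exact inverse and log-determinant. The conditioner here is a hypothetical hand-written function; real coupling flows such as RealNVP learn it as a neural network.

```python
import numpy as np

# Hypothetical conditioner: in practice a neural network that outputs
# the scale (s) and shift (t) for the second half, given the first half.
def conditioner(x1):
    s = np.tanh(x1)          # keeps the scale factor well-behaved
    t = 0.5 * x1
    return s, t

def coupling_forward(x):
    # Disjoint partition of the input, as described above.
    d = len(x) // 2
    x1, x2 = x[:d], x[d:]
    s, t = conditioner(x1)
    y2 = x2 * np.exp(s) + t          # transform one half, conditioned on the other
    log_det = np.sum(s)              # log|det Jacobian| of this layer
    return np.concatenate([x1, y2]), log_det

def coupling_inverse(y):
    d = len(y) // 2
    y1, y2 = y[:d], y[d:]
    s, t = conditioner(y1)           # y1 == x1, so the conditioner is reusable
    x2 = (y2 - t) * np.exp(-s)       # exact inverse; no need to invert the conditioner
    return np.concatenate([y1, x2])

x = np.array([0.3, -1.2, 0.7, 2.0])
y, log_det = coupling_forward(x)
print(np.allclose(coupling_inverse(y), x), log_det)   # True, invertibility check
```

The design trick is that only one half of the input is transformed in a layer, so the Jacobian is triangular and inversion never requires inverting the conditioner; stacking layers with alternating partitions lets the flow eventually touch every dimension.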

Epilogue

Have you ever wondered whether certain voices (e.g. Siri, Alexa, etc.) come from a human being or are generated by a machine? If they are machine-generated, then most likely they are the outcome of generative models like VAEs or GANs powered by normalizing flows. In this post, we explored normalizing flows from a highly opinionated and debatable context of evolutionary theory. It is my way of learning new ideas and concepts - I seek a metaphor to bring back comfort when I bump into something significantly complex and beyond my understanding. For normalizing flows, I took refuge in the theory of evolution. My goal was to explain my understanding without the support of any mathematical concepts and equations, though an equation can replace thousands of words. I hope I did some justice to the challenge I took up.

2021 has been an extraordinary and fulfilling year for me; I lived a life that not many can even dream of attempting. What is next, I have no clue. Thanks to all those who supported me intellectually, emotionally, cognitively, virtually, and physically, and a special thanks to myself for treating me with respect and empathy.

References

  • Kobyzev, I., Prince, S. J. D., Brubaker, M. A., Normalizing Flows: An Introduction and Review of Current Methods, IEEE TPAMI, 2021.