Relooking Attention Models, SHA-RNN Overview, gMLP Briefing and Measuring Efficacy of Language Models through Perplexity

Posted September 11, 2021 by Gowri Shankar ‐ 10 min read

Satire and sarcasm are seldom seen in scientific writing, but this is an era of memes and trolls where complex concepts are conveyed through highly comprehensible mediums (videos, animations, etc.). When it comes to being critical (without hurting) of a concept or a character, sarcasm is often the medium of choice in literary renditions, but seldom do we see it in scholarly scriptures. Such sui generis writing is convivial and fervent for its patrons. Stephen Merity's 2019 paper titled Single Headed Attention RNN: Stop Thinking With Your Head (SHA-RNN) is one such scholarly work, in which he is critical of today's leading approaches in language modeling, especially our obsession with attention models, without demonstrating outrage or distress. His paper is lucid and takes us back to celebrate the glory of yesteryear's multi-layer perceptrons. A more recent paper (June 2021) from Google titled Pay Attention to MLPs (gMLP) periphrastically confirms Stephen's claims with substantial empirical proof.
Gossips and Epicenter of Emotions for our Existence - Study on Anatomy of Limbic System

Posted September 5, 2021 by Gowri Shankar ‐ 9 min read

The cognitive and communicative systems of the human brain did not evolve to build mathematical models or to find philosophical insights during our primitive times; they evolved so that we could gossip. Gossiping is the fundamental attribute of human beings that made us who we are. Gossip enabled us to create languages, cultures and civilizations, and do you know why? To impress our respective girlfriends (and boyfriends, obviously) and partners, which makes it the epicenter of our existence. Why do I call gossip the epicenter of our existence? Through gossip we conveyed the emotions and feelings that are seldom spoken openly, yet spoken definitely, and that changed the course of our history. From Helen of Troy to Monica Lewinsky, we gossiped and gossiped until the reign was brought down to its knees (pun intended). One single act called gossip designed the destiny of humanity by producing flavors of emotions in the human brain, specifically in the limbic region. That region has to be studied because our quest is to build human-like intelligence on a silicon wafer, and emotions are the prime factor that makes a human human.
Methodus Fluxionum et Serierum Infinitarum - Numerical Methods for Solving ODEs Using Our Favorite Tools

Posted August 28, 2021 by Gowri Shankar ‐ 8 min read

Wikipedia says a differential equation is an equation that relates one or more functions and their derivatives. In layman's terms, the only constant in this life (universe) is change, and any entity capable of adapting to change, especially threats and adversarial ones, thrived and flourished. Hence we are interested in studying change and the rate at which change occurs. Uff, that is too layman-ish a definition for differential equations, even for an unscholarly writer of my kind. Newton called these functions fluxions, and Gottfried Wilhelm Leibniz identified them independently; that is all history now, but it made differential equations a compelling topic for understanding nature. Further, numerical analysis gives us a way to solve such equations approximately when closed-form solutions are out of reach, and those methods are quite the functions of convergence in the quest for achieving (artificial) intelligence.
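Numerical methods make this concrete; here is a minimal sketch of the forward Euler scheme applied to the toy equation y' = -2y, chosen for illustration because its exact solution e^(-2t) is known:

```python
import numpy as np

def euler(f, y0, t0, t1, n):
    """Forward Euler: advance y' = f(t, y) from t0 to t1 in n fixed steps."""
    t, y = t0, y0
    h = (t1 - t0) / n
    for _ in range(n):
        y = y + h * f(t, y)  # follow the local slope for one step
        t = t + h
    return y

# y' = -2y with y(0) = 1 has the exact solution y(t) = exp(-2t)
approx = euler(lambda t, y: -2.0 * y, 1.0, 0.0, 1.0, 10000)
exact = np.exp(-2.0)
```

Halving the step size roughly halves the error, which is the hallmark of this first-order method.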
A Practical Guide to Univariate Time Series Models with Seasonality and Exogenous Inputs using Finance Data of FMCG Manufacturers

Posted August 21, 2021 by Gowri Shankar ‐ 10 min read

The definition of a univariate time series is: a time series that consists of single scalar observations recorded sequentially over equal periodic intervals, i.e. an array of numbers recorded where time is an implicit dimension represented at constant periodicity. Univariate time series models (UTSMs) are the simplest models that allow us to forecast future values by learning the patterns in the recorded sequence of observations. The key elements of these patterns are seasonality, trends, impact points and exogenous variables. Three schemes of pattern identification act as building blocks for UTSMs: autoregression (OLS), moving averages and seasonality. When they are augmented with external data, the effectiveness of the model improves significantly.
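The autoregression building block can be sketched in a few lines; a minimal illustration fitting an AR(1) coefficient by ordinary least squares on synthetic data (not the FMCG finance data from the post):

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate an AR(1) series: y_t = 0.7 * y_{t-1} + noise
phi_true = 0.7
y = np.zeros(2000)
for t in range(1, len(y)):
    y[t] = phi_true * y[t - 1] + rng.normal(scale=0.1)

# Estimate phi by OLS on (y_{t-1}, y_t) pairs
x, target = y[:-1], y[1:]
phi_hat = (x @ target) / (x @ x)
```

The recovered coefficient lands close to 0.7; seasonal and exogenous terms extend this same regression with extra columns.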
Trend, Features of Structural Time Series, Mathematical Intuition Behind Trend Analysis for STS

Posted August 14, 2021 by Gowri Shankar ‐ 6 min read

Decomposability is the prime factor in the success of generalized additive models in the quest to forecast future events from an observed dataset. When we design a time series forecasting model, the functional features we most often observe in the data are trend, seasonality and impact points. The decomposable nature of these features makes the problem conducive to modeling each feature independently as a curve-fitting exercise. That is, modeling trend independently of the other features makes the outcomes interpretable and subsequently paves the way for advantages like bringing the analyst into the loop in attaining convergence. This approach ignores the explicit dependence on temporal structure in the data that is common in generative models like ARIMA.
Fourier Series as a Function of Approximation for Seasonality Modeling - Exploring Facebook Prophet's Architecture

Posted August 8, 2021 by Gowri Shankar ‐ 8 min read

Generalized additive models are among the most powerful structural time series models, forecasting horizons by identifying confounding characteristics in the data. Among those characteristics, seasonality is common to almost all time series data. Understanding and identifying the periodic (hourly, daily, monthly or something more esoteric) occurrence of events and actions that impact the outcome is an art that requires domain expertise. The Fourier series, a periodic function composed of harmonically related sinusoids combined by weighted summation, helps us approximate an arbitrary periodic function.
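A minimal sketch of the idea: approximating a square wave with its odd-harmonic Fourier partial sums (the (4/π)·sin(kt)/k coefficients are the standard square-wave series, used here purely as an illustration), showing the approximation error shrink as terms are added:

```python
import numpy as np

def square_wave_approx(t, n_terms):
    """Partial Fourier sum of a square wave: (4/pi) * sum over odd k of sin(kt)/k."""
    k = np.arange(1, 2 * n_terms, 2)  # first n_terms odd harmonics
    return (4 / np.pi) * (np.sin(np.outer(t, k)) / k).sum(axis=1)

t = np.linspace(0.1, 2 * np.pi - 0.1, 500)
target = np.sign(np.sin(t))  # the square wave being approximated

err5 = np.mean(np.abs(square_wave_approx(t, 5) - target))
err50 = np.mean(np.abs(square_wave_approx(t, 50) - target))
```

Prophet uses the same trick in reverse: it fits the weights of a small bank of sinusoids to whatever seasonal shape the data exhibits.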
Structural Time Series Models, Why We Call It Structural? A Bayesian Scheme For Time Series Forecasting

Posted August 1, 2021 by Gowri Shankar ‐ 7 min read

The models we build are machines with the fundamental capability to learn the underlying patterns in observed data and store them in the form of weights. Patterns come in different forms, shapes and sizes; this is ubiquitous because we interpret the universe through our observed data. When the observed data exhibit periodic patterns, we call them time series. The key challenges with time series data are missing values and the absence of confounders, which make them special. The problem gets even more interesting when we approach time series forecasting in a Bayesian setup. This is a new series of posts on Structural Time Series (STS), where we explore a wide gamut of problems and approaches to uncover the underlying treasure.
Blind Source Separation using ICA - A Practical Guide to Separate Audio Signals

Posted July 24, 2021 by Gowri Shankar ‐ 6 min read

In this post we shall walk through a step-by-step implementation of blind source separation using independent component analysis. This is an end-to-end attempt to demonstrate a solution for the cocktail party problem, where we believe data observed from nature is always a mixture of multiple distinct sources, and identifying the source signals is critical for understanding the nature of the observed data. CAUTION: This page plays a music clip for 10 seconds while opening.
Cocktail Party Problem - Eigentheory and Blind Source Separation Using ICA

Posted July 18, 2021 by Gowri Shankar ‐ 13 min read

We will never achieve 100% accuracy in predicting real-world events using any AI/ML algorithm, and accuracy is one simple metric that often leads to deception. Why? Data observed from nature is always a mixture of multiple distinct sources, and separating them by their origin is the basis for understanding. The process of separating the signals that constitute an observed datum is called blind source separation. Pondering this, we human beings are creatures of grit and competence, coming up with techniques like Independent Component Analysis (ICA) in the quest to understand the complex entities of nature.
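To make the idea tangible, here is a minimal numpy-only FastICA sketch (symmetric decorrelation with a tanh nonlinearity). The two sources and the mixing matrix are made up for illustration, and a production implementation such as scikit-learn's FastICA is far more robust:

```python
import numpy as np

n = 5000
time = np.linspace(0, 8, n)

# Two independent sources: a sine and a square wave at unrelated frequencies
s1 = np.sin(2 * np.pi * 1.0 * time)
s2 = np.sign(np.sin(2 * np.pi * 3.1 * time))
S = np.vstack([s1, s2])

# Mix them with an arbitrary (illustrative) mixing matrix
A = np.array([[1.0, 0.6], [0.4, 1.0]])
X = A @ S

# Center and whiten the observations
X = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(X))
Xw = E @ np.diag(d ** -0.5) @ E.T @ X

# Symmetric FastICA iteration with g = tanh
W = np.eye(2)
for _ in range(200):
    G = np.tanh(W @ Xw)
    W_new = (G @ Xw.T) / n - np.diag((1 - G ** 2).mean(axis=1)) @ W
    u, _, vt = np.linalg.svd(W_new)
    W = u @ vt  # symmetric decorrelation: (W W^T)^(-1/2) W

recovered = W @ Xw  # sources, up to permutation, sign and scale
```

ICA can only recover the sources up to permutation, sign and scale, which is why the recovered rows are compared to the sources by absolute correlation.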
Courage and Data Literacy Required to Deploy an AI Model and Exploring Design Patterns for AI

Posted July 10, 2021 by Gowri Shankar ‐ 18 min read

Have you ever come across a situation where your dataset is closely linked with human beings and you are expected to optimize certain operations or processes? Did it make you feel anxious? You are not alone. Operational optimizations of industrial and business processes often focus on minimizing human error to maximize productivity and profitability, and most likely depend on machines (for support) rather than relying fully on humans in decision making. If AI is done wrongly, these decisions might exacerbate the basic livelihood of certain sections of people involved in the process, often the ones at the bottom of the value chain.
Eigenvalue, Eigenvector, Eigenspace and Implementation of Google's PageRank Algorithm

Posted July 3, 2021 by Gowri Shankar ‐ 8 min read

Feature extraction techniques like Principal Component Analysis use eigenvalues and eigenvectors for dimensionality reduction in a machine learning model through a density estimation process grounded in eigentheory. Eigenvalues depict the variance of the distribution of the data in a certain direction; the vector with the highest eigenvalue is the principal component of the feature set. In simple terms, eigenvalues help us find patterns inside noisy data. By the way, Eigen is a German word meaning particular or proper; combined with value, it means the proper value.
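PageRank itself is an eigenvector computation: the rank vector is the dominant eigenvector of the damped transition matrix, found here by power iteration. The four-page link graph and the damping factor 0.85 are illustrative assumptions, not figures from the post:

```python
import numpy as np

# Toy web of 4 pages; links[i][j] = 1 means page i links to page j
links = np.array([
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

# Column-stochastic transition matrix: column j spreads page j's rank
M = (links / links.sum(axis=1, keepdims=True)).T

damping = 0.85
n = M.shape[0]
rank = np.full(n, 1.0 / n)
for _ in range(100):
    # Power iteration on the damped matrix converges to its dominant eigenvector
    rank = (1 - damping) / n + damping * M @ rank
```

Page 2, with three in-links, ends up with the highest rank, and the vector stays a probability distribution throughout.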
Need For Understanding Brain Functions, Introducing Medical Images - Brain, Heart and Hippocampus

Posted June 26, 2021 by Gowri Shankar ‐ 11 min read

Inspiration for an idea often comes to its creator through divine influences; the great mathematician Srinivasa Ramanujan credited his family deity Namagiri for his mathematical genius. I believe the structure and functions of the human brain are significant influences on the design of today's vision, speech and NLP systems. Understanding, and in-silico reconstruction of, neuronal circuits, behaviors and responses, both at the level of individual neurons and at the level of brain regions, is critical for achieving superior intelligence.
Attribution and Counterfactuals - SHAP, LIME and DiCE

Posted June 19, 2021 by Gowri Shankar ‐ 10 min read

Why a machine learning model makes certain predictions or recommendations, and what the efficacy of those predicted outcomes is with respect to the real world, is a deep topic of research; i.e. what causes a model to predict a certain outcome? There are two popular families of methods researchers have devised for model explanation: attribution-based and counterfactual (CF) based schemes. Attribution-based methods provide scores for features, while CFs generate examples from an alternate universe by tweaking a few parameters of the input features.
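The counterfactual idea can be sketched on a toy model. Assuming a hypothetical, already-trained logistic model (the weights below are made up), gradient steps nudge the input until the prediction flips, loosely in the spirit of DiCE; this is not the DiCE API:

```python
import numpy as np

# Hypothetical trained logistic model: weights and bias are illustrative
w = np.array([2.0, -1.0])
b = -0.5
predict = lambda x: 1.0 / (1.0 + np.exp(-(x @ w + b)))

x0 = np.array([0.0, 0.5])  # original input, predicted class 0

# Gradient ascent on the model score until the predicted class flips
x = x0.copy()
for _ in range(200):
    p = predict(x)
    if p > 0.5:
        break
    grad = (p * (1 - p)) * w  # d p / d x for the sigmoid
    x = x + 0.05 * grad
counterfactual = x
```

The resulting point is a nearby input from the "alternate universe" where the model's decision is reversed; real CF methods add constraints such as sparsity and plausibility.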
Is Covid Crisis Lead to Prosperity - Causal Inference from a Counterfactual World Using Facebook Prophet

Posted June 12, 2021 by Gowri Shankar ‐ 13 min read

Identifying one causal reason is more powerful than identifying dozens of correlational patterns in the data. Causal inference is a branch of statistics concerned with effects that are consequences of actions. In traditional machine learning, we infer from observations of the past, asking how something happened by characterizing the association between variables. On the contrary, causal inference addresses why an event happened, through randomized experiments.
La Memoire, C'est Poser Son Attention Sur Le Temps

Posted June 5, 2021 by Gowri Shankar ‐ 10 min read

Powerful DNN architectures (MLPs, CNNs, etc.) fail to capture the temporal dependencies of real-world events. They are limited to classifying data by learning from the probability distribution of fixed-length vectors (images). However, real-world problems are functions of time, where past events have significant impact on current and future outcomes. Hence come the simple but most powerful mechanisms of attention and memory, inspired by the human cognitive system.
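The attention mechanism alluded to here can be sketched as scaled dot-product attention over a small memory; the shapes below are chosen arbitrarily for illustration:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d))
    return weights @ V, weights

rng = np.random.default_rng(1)
Q = rng.normal(size=(3, 4))  # 3 queries
K = rng.normal(size=(5, 4))  # 5 memory slots (keys)
V = rng.normal(size=(5, 8))  # values stored at each slot
out, weights = attention(Q, K, V)
```

Each query reads the memory as a convex combination of the stored values, with the weights expressing where the model "attends".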
Normalizing Flows - A Practical Guide Using Tensorflow Probability

Posted May 29, 2021 by Gowri Shankar ‐ 9 min read

There are so many amazing blogs and papers on normalizing flows that lead to solving density estimation problems; this is yet another one. In this post, I attempt to implement a flow-based density transformation scheme that can be used in a generative model, with a hands-on coding session and supporting math. The most fascinating thing about flow-based models is their ability to explicitly learn the data distribution through a sequence of invertible transformations. Let us build a set of sophisticated transformations using TensorFlow Probability.
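The change-of-variables rule at the heart of every flow can be sketched with a single affine bijector in plain numpy (TFP's bijectors generalize exactly this; the parameters a and b below are arbitrary):

```python
import numpy as np

def base_log_prob(z):
    """Standard normal log-density."""
    return -0.5 * (z ** 2 + np.log(2 * np.pi))

# A single affine "flow": x = a * z + b, invertible whenever a != 0
a, b = 2.0, 1.0
inverse = lambda x: (x - b) / a

def flow_log_prob(x):
    # Change of variables: log p(x) = log p_base(z) - log |dx/dz|
    return base_log_prob(inverse(x)) - np.log(np.abs(a))

x = np.array([-1.0, 0.0, 2.5])
lp = flow_log_prob(x)
```

Stacking many such invertible maps, each contributing its log-Jacobian term, is all a normalizing flow does; here the result is simply the N(1, 4) density, which the test verifies.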
Why Covariance Matrix Should Be Positive Semi-Definite, Tests Using Breast Cancer Dataset

Posted May 23, 2021 by Gowri Shankar ‐ 8 min read

Do you keep hearing the phrase the covariance matrix is positive semidefinite when you indulge in deep topics of machine learning and deep learning, especially on the optimization front? Is it causing a certain sense of uneasiness and making you feel anxious about the need for your existence? You are not alone. In this post we shall see the properties of a covariance matrix, and also the nature of the eigenvalues of a covariance matrix.
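A quick empirical check of the claim, using random correlated data rather than the breast cancer dataset from the post: every eigenvalue of a sample covariance matrix is non-negative (up to floating-point noise), and so is the variance of any projection.

```python
import numpy as np

rng = np.random.default_rng(7)
# Correlated features: random data pushed through a random linear map
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
cov = np.cov(X, rowvar=False)

# PSD property 1: eigenvalues of a covariance matrix are >= 0
eigvals = np.linalg.eigvalsh(cov)

# PSD property 2: v^T cov v = Var(v^T x) >= 0 for any direction v
v = rng.normal(size=5)
quad = v @ cov @ v
```

Both checks pass because cov = E[(x − μ)(x − μ)ᵀ] is a Gram-style matrix: any quadratic form in it is a variance, and variances cannot be negative.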
Calculus - Gradient Descent Optimization through Jacobian Matrix for a Gaussian Distribution

Posted May 15, 2021 by Gowri Shankar ‐ 12 min read

Back to basics: in machine learning, cost functions determine the error between the predicted outcomes and the observed values. Our goal is to minimize the loss, i.e. the error over each training sample, calculated for the entire dataset iteratively to achieve convergence. It is like descending a mountain by making optimal downward steps to reach the deepest point of the valley, called the global minimum. In this post we shall optimize a non-linear function using calculus, without any sophisticated libraries like TensorFlow or PyTorch.
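A minimal sketch of that descent, fitting the mean and standard deviation of a Gaussian by gradient descent on the negative log-likelihood with hand-derived gradients (synthetic data and an assumed learning rate):

```python
import numpy as np

rng = np.random.default_rng(3)
data = rng.normal(loc=3.0, scale=1.5, size=1000)

mu, sigma = 0.0, 1.0
lr = 0.1
for _ in range(500):
    # Gradients of the mean negative log-likelihood of N(mu, sigma^2)
    r = data - mu
    d_mu = -np.mean(r) / sigma ** 2
    d_sigma = 1.0 / sigma - np.mean(r ** 2) / sigma ** 3
    mu, sigma = mu - lr * d_mu, sigma - lr * d_sigma
```

The descent converges to the maximum-likelihood estimates, which for a Gaussian are simply the sample mean and (population) standard deviation.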
With 20 Watts, We Built Cultures and Civilizations - Story of a Spiking Neuron

Posted May 9, 2021 by Gowri Shankar ‐ 13 min read

Our quest is to build human-like AI systems that take inspiration from the brain and imitate its memory, reasoning, feelings and learning capabilities within a controlled setup. It is a 500-million-year story of evolution and optimization at the cellular level. Today the human brain consumes about 20W of power to run the show; with such an efficient machine, humanity built cultures and civilizations. This evolutionary story shapes the development of deep learning systems, inspiring us to think beyond the horizons of current comprehension.
Causal Reasoning, Trustworthy Models and Model Explainability using Saliency Maps

Posted May 2, 2021 by Gowri Shankar ‐ 9 min read

Correlation does not imply causation. In machine learning, especially with deep neural networks (DNNs), we have not evolved to confidently identify causes and their effects; learning agents learn from probability distributions. In statistics, we accept and reject hypotheses to arrive at tangible decisions; a similar kind of causal inference is key to the success of complex models, to avoid false conclusions and consequences.
Higher Cognition through Inductive Bias, Out-of-Distribution and Biological Inspiration

Posted April 24, 2021 by Gowri Shankar ‐ 12 min read

The fascinating thing about human (animal) intelligence is its ability to systematically generalize outside of the known distribution on which it is presumably trained. If intelligence can be explained with a few principles instead of a huge list of hypotheses and heuristics, understanding intelligence and building intelligent machines will take an inspiring and evolutionary path.
Information Gain, Gini Index - Measuring and Reducing Uncertainty for Decision Trees

Posted April 17, 2021 by Gowri Shankar ‐ 9 min read

This is the 5th post in the series that declutters entropy, the measure of uncertainty. In this post, we explore two key concepts, information gain and Gini impurity, which are used to measure and reduce uncertainty. We take the Heart Disease dataset from the UCI repository to understand information gain through decision trees.
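The two measures, and the gain from a split, can be computed in a few lines; the toy labels below stand in for the UCI data:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a discrete distribution p."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def gini(p):
    """Gini impurity: chance of misclassifying a random draw."""
    p = np.asarray(p, dtype=float)
    return 1.0 - np.sum(p ** 2)

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the children."""
    n = sum(len(c) for c in children)
    child_h = sum(len(c) / n * entropy(np.bincount(c) / len(c)) for c in children)
    return entropy(np.bincount(parent) / len(parent)) - child_h

# A 50/50 node is maximally uncertain: entropy 1 bit, Gini 0.5
h_half = entropy([0.5, 0.5])
g_half = gini([0.5, 0.5])

# A perfect split removes all uncertainty: gain equals the parent entropy
parent = np.array([0, 0, 1, 1])
gain = information_gain(parent, [np.array([0, 0]), np.array([1, 1])])
```

A decision tree simply evaluates this gain (or the Gini analogue) for every candidate split and takes the greediest one.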
KL-Divergence, Relative Entropy in Deep Learning

Posted April 10, 2021 by Gowri Shankar ‐ 5 min read

This is the fourth post on the Bayesian approach to ML models. Earlier we discussed uncertainty, entropy as a measure of uncertainty, maximum likelihood estimation, etc. In this post we explore KL-divergence to calculate the relative entropy between two distributions.
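A minimal sketch of discrete KL-divergence (the two distributions are made up for illustration), showing the three properties worth remembering: it is non-negative, zero between identical distributions, and asymmetric.

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(p || q) in bits for discrete distributions (q > 0 wherever p > 0)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0  # 0 * log(0/q) is taken as 0 by convention
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

p = np.array([0.5, 0.3, 0.2])
q = np.array([1 / 3, 1 / 3, 1 / 3])

forward = kl_divergence(p, q)   # D(p || q)
reverse = kl_divergence(q, p)   # D(q || p), generally different
```

The asymmetry is why KL is a divergence, not a distance, and why the direction of the comparison matters in variational methods.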
Shannon's Entropy, Measure of Uncertainty When Elections are Around

Posted April 3, 2021 by Gowri Shankar ‐ 6 min read

What is the most pressing issue in everyone's life? It is our inability to predict how things will turn out, i.e. uncertainty. How awesome (or depressing) it would be if we could make precise predictions and perform accurate computations to measure uncertainty.
Bayesian and Frequentist Approach to Machine Learning Models

Posted March 27, 2021 by Gowri Shankar ‐ 5 min read

Rev. Thomas Bayes discovered the theorem for conditional probability that bears his name and forms the basis for Bayesian Statistical methods. Sir Ronald Fisher is considered one of the founders of frequentist statistical methods and originally introduced maximum likelihood.
Understanding Uncertainty, Deterministic to Probabilistic Neural Networks

Posted March 19, 2021 by Gowri Shankar ‐ 8 min read

Uncertainty is a condition where there is limited or no knowledge of the existing state and it is impossible to describe future outcomes. The essential nature of existence is driven by constant change, which leads to the quest for knowledge in the mind of the seeker.
Understanding Post-Synaptic Depression through Tsodyks-Markram Model by Solving Ordinary Differential Equation

Posted March 12, 2021 by Gowri Shankar ‐ 9 min read

Understanding the building blocks of the brain and its responsive nature has always been a frontier for conquest and a fascinating area of research. In this post, let us explore the temporal data acquired from somatic recordings that explain how short-term synaptic plasticity strongly affects the neural dynamics of neocortical networks.
Automatic Differentiation Using Gradient Tapes

Posted December 14, 2020 by Gowri Shankar ‐ 9 min read

As a data scientist or deep learning researcher, one must have deep knowledge of various differentiation techniques, because gradient-based optimization techniques like the backpropagation algorithm are critical for model efficiency and convergence.
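The tape idea can be sketched in pure Python; this is a toy, not the tf.GradientTape API. Each operation records its inputs and local gradients, and a backward pass replays the record with the chain rule:

```python
# A minimal reverse-mode "tape": every op records how to push
# gradients back to its inputs, as tf.GradientTape does internally.
class Var:
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (input Var, local gradient)
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

def backward(output):
    """Walk the tape backwards, accumulating chain-rule contributions.
    Sufficient for expression trees like this one; a full implementation
    would process nodes in reverse topological order."""
    output.grad = 1.0
    stack = [output]
    while stack:
        node = stack.pop()
        for parent, local_grad in node.parents:
            parent.grad += node.grad * local_grad
            stack.append(parent)

x, y = Var(3.0), Var(4.0)
z = x * y + x          # z = xy + x, so dz/dx = y + 1 and dz/dy = x
backward(z)
```

The gradients come out as dz/dx = 5 and dz/dy = 3, exactly what the chain rule predicts for x = 3, y = 4.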
Roll up your sleeves! Let us do some partial derivatives.

Posted August 14, 2020 by Gowri Shankar ‐ 3 min read

In this post, we shall explore a shallow neural network with a single hidden layer and the math behind the backpropagation algorithm and gradient descent.
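The backward pass for such a network can be sketched and verified against a numerical gradient; the random toy data and tanh hidden layer below are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))   # 8 samples, 3 features
t = rng.normal(size=(8, 1))   # regression targets
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

def forward(W1):
    h = np.tanh(X @ W1 + b1)          # hidden layer
    y = h @ W2 + b2                   # linear output
    loss = 0.5 * np.mean((y - t) ** 2)
    return loss, h, y

loss, h, y = forward(W1)

# Backpropagation: apply the chain rule layer by layer
dy = (y - t) / len(X)                 # dL/dy for the mean squared error
dW2 = h.T @ dy
dh = dy @ W2.T
dW1 = X.T @ (dh * (1 - h ** 2))       # tanh'(a) = 1 - tanh(a)^2

# Numerical check on one entry of W1 via central differences
eps = 1e-6
W1p = W1.copy(); W1p[0, 0] += eps
W1m = W1.copy(); W1m[0, 0] -= eps
numeric = (forward(W1p)[0] - forward(W1m)[0]) / (2 * eps)
```

The analytic and numerical gradients agree to several decimal places, which is the standard sanity check before trusting a hand-written backward pass.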
GradCAM, Model Interpretability - VGG16 & Xception Networks

Posted July 4, 2020 by Gowri Shankar ‐ 11 min read

The objective of this post is to understand the importance of visual explanations for CNN-based large-scale deep neural network models.
Tensorflow 2: Introduction, Feature Engineering and Metrics

Posted April 4, 2020 by Gowri Shankar ‐ 27 min read

Introducing TF2 through Train, Test, Valid splitting, Imputation, Bias/Overfit handlers, One Hot Encoding, Embeddings, Tensor Slices, Keras APIs, metrics including accuracy, precision and ROC curve
Time and Space Complexity - 5 Governing Rules

Posted February 28, 2020 by Gowri Shankar ‐ 9 min read

How to approach compute complexity, i.e. time and space complexity problems, while designing a software system, so as to avoid obvious bottlenecks in an abstract fashion.
ResNet50 vs InceptionV3 vs Xception vs NASNet - Introduction to Transfer Learning

Posted June 28, 2019 by Gowri Shankar ‐ 23 min read

Transfer learning is an ML methodology that enables reusing a model developed for one task on another task. Its applications are predominantly in deep learning for computer vision and natural language processing.