Evaluating Large Language Model Generated Content with TruEra's TruLens

Posted March 17, 2024 by Gowri Shankar  ‐  12 min read

It's been an eternity since I last endured Dr. Andrew Ng's sermon on evaluation strategies and metrics for scrutinizing the AI-generated content. Particularly, the cacophony about Large Language Models (LLMs), with special mentions of the illustrious OpenAI and Llama models scattered across the globe. How enlightening! It's quite a revelation, considering my acquaintances have relentlessly preached that Human Evaluation is the holy grail for GAI content. Of course, I've always been a skeptic, pondering the statistical insignificance lurking beneath the facade of human judgment. Naturally, I'm plagued with concerns about the looming specter of bias, the elusive trustworthiness of models, the Herculean task of constructing scalable GAI solutions, and the perpetual uncertainty regarding whether we're actually delivering anything of consequence. It's quite amusing how the luminaries and puppeteers orchestrating the GAI spectacle remain blissfully ignorant of the metrics that could potentially illuminate the quality of their creations. But let's not be too harsh; after all, we're merely at the nascent stages of transforming GAI content into a lucrative venture. The metrics and evaluation strategies are often relegated to the murky depths of technical debt, receiving the customary neglect from the business overlords.
The 40 rules of love

Posted December 23, 2023 by Gowri Shankar  ‐  41 min read

At the age of forty, Ella Rubenstein, stuck in an unhappy marriage, decides to work as a reader for a literary agent. Her first task is to read and review `Sweet Blasphemy`, a novel penned by Aziz Zahara. Ella is captivated by the story of Shams's quest for Rumi and the transformative impact the dervish has on the cleric, turning him into a devoted mystic, passionate poet, and love advocate. She is intrigued by Shams's teachings, which reveal an ancient philosophy centered on the unity of people and religions, as well as the presence of love within each individual. As Ella delves into the narrative, she discovers parallels between Rumi's journey and her own life, realizing that Zahara, like Shams, has arrived to liberate her.
Airflow Trigger Rules for Building Complex Data Pipelines Explained, and My Initial Days of Airflow Selection and Experience

Posted May 1, 2022 by Gowri Shankar  ‐  9 min read

Dell acquiring Boomi (circa 2010) was a big topic of discussion among my peers then; I was just starting to shift my career from developing system software and device drivers to building distributed IT products at enterprise scale. I was so ignorant that I questioned, 'why would someone pay so much for a piece of code that connects systems and schedules events?' I argued that those data pipeline processes could easily be built in-house rather than depending on an external product. To understand the value of an integration platform or a workflow management system, one should strive for excellence in maintaining and serving reliable data at large scale. From building in-house data pipelines and using Pentaho Kettle at enterprise scale to enjoying the flexibility of Apache Airflow, this has been one of the most significant parts of my data journey.
Introduction to Contrastive Loss - Similarity Metric as an Objective Function

Posted January 30, 2022 by Gowri Shankar  ‐  6 min read

My first machine learning work was based on calculating the similarity between two arrays of different lengths. The array items represented features of handwritten characters extracted from a 2D vector captured with an electronic pen at a certain sampling frequency, circa 2001. The fundamental idea behind the similarity calculator was the Euclidean distance between feature vectors of the corpus and the observed character strokes. Then came the famous Siamese neural network architecture (~2005), which has two or more identical networks with the same parameters and weights and measures similarity by comparing feature vectors of the input images. Similarity calculation using a distance measure is the way to go when we do not have labeled (or have only partially labeled) data and a very large number of objects to classify or detect. The similarity metrics can also be used to compare and identify unseen categories as the data evolves, i.e., if it walks like a duck and quacks like a duck, we prefer to infer it is a duck even if our training data has never seen a duck.
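
To make the idea concrete, here is a minimal numpy sketch (not the post's exact code) of a Euclidean-distance similarity and the standard contrastive loss built on top of it; the `margin` hyperparameter is an assumed value.

```python
import numpy as np

def euclidean_distance(a, b):
    """Euclidean distance between two feature vectors."""
    return np.sqrt(np.sum((a - b) ** 2))

def contrastive_loss(a, b, y, margin=1.0):
    """Contrastive loss for a pair: y=1 for similar pairs, y=0 for dissimilar.
    Similar pairs are pulled together; dissimilar pairs are pushed apart
    until they are at least `margin` away."""
    d = euclidean_distance(a, b)
    return y * d ** 2 + (1 - y) * max(margin - d, 0.0) ** 2
```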
Temperature is Nothing but Measure of Speed of the Particles at Molecular Scale - Intro 2 Maxwell Boltzmann Distribution

Posted January 23, 2022 by Gowri Shankar  ‐  7 min read

Temperature is defined as the average kinetic energy of the molecules in a space. If you find the cup of coffee your girlfriend graciously gave you this morning is not hot enough, then you can confidently conclude the molecules in the coffee pot are as lazy as you are. When the particles in a space are active, bumping into each other and making a commotion to prove their existence, we can say they are hot. How hot something is depends on how many particles in its sphere of influence traipse from a steady state to a hyperactive one. Often these particles move aimlessly, as we witness while boiling water or cooking food. This phenomenon can be understood quite clearly via the Maxwell-Boltzmann distribution, a concept from statistical physics/mechanics with significant importance in machine learning and cognitive science.
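
For reference (standard statistical mechanics, not derived in this teaser), the Maxwell-Boltzmann speed distribution for particles of mass $m$ at temperature $T$ is

$$ f(v) = 4\pi \left(\frac{m}{2\pi k_B T}\right)^{3/2} v^2 \, e^{-\frac{m v^2}{2 k_B T}}, \qquad \langle E_{kin} \rangle = \tfrac{3}{2} k_B T $$

so a hotter cup simply means the speed distribution shifts towards faster molecules.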
The Best Way to Minimize Uncertainty is NOT Being Informed, Surprised? Let us Measure Surprise

Posted January 14, 2022 by Gowri Shankar  ‐  6 min read

Ignorance is bliss. We all know there is a deeper meaning to this phrase in a philosophical context that points towards a lethargic attitude. I would like to define the word ignorance as a lack of knowledge or information. Often we believe the more information we have, the more certain we are about the past, present, and future events associated with that information. Information theory differs significantly on that belief, thanks to Claude Shannon: the more information an outcome carries, the more it fills the uncertainty bucket that we detest. Is there any fun in knowing that an event is absolutely certain to happen? For example, that the Proteas won the (cricket) series against India. Improbable events bring more information, which is the cause of all the surprises that keep us sitting on the edge of our seats. Test cricket is a game of glorious uncertainties after all! Hence, we shall learn more about surprise, and especially about measuring surprise.
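
The quantity the post builds towards is Shannon's surprisal: for an outcome with probability $p(x)$,

$$ I(x) = -\log_2 p(x) \ \text{bits} $$

so a certain event ($p=1$) carries zero surprise, while a fair coin flip carries exactly one bit.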
3rd Wave in India, Covid Debacle Continues - Let us Use Covid Data to Learn Piecewise LR and Exponential Curve Fitting

Posted January 7, 2022 by Gowri Shankar  ‐  8 min read

Deep neural network models dominate at their job compared to other algorithms like support vector machines or the statistical models that were once celebrated. When it comes to big data, deep learning models are without a doubt the de facto choice for convergence. I often wonder what makes them so efficient; something should be quite obvious and provable. Activation functions: we know activation functions bring non-linearity to the network layers through the neurons, and they do the magic in vogue. ReLU and its sister-in-law gang of piecewise linear functions create the non-linearity in the outcomes, i.e., the activation functions help the neural network slice and dice the input space into finer grains and form locality-sensitive hash tables. A piecewise linear function in the network can be visualized as a polyhedron (or cell) with sharp edges, and it is the fundamental building block for achieving convergence in DNNs.
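
As a tiny illustration (an assumed toy example, not from the post), ReLU is literally two linear pieces glued at the origin, which is what lets a network carve the input space into linear regions:

```python
import numpy as np

def relu(x):
    """ReLU: identity for positive inputs, zero otherwise -- two linear pieces."""
    return np.maximum(0.0, x)

x = np.linspace(-2, 2, 5)      # [-2, -1, 0, 1, 2]
print(relu(x))                 # [0. 0. 0. 1. 2.]
```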
Ever Wondered Whether a Voice is from a Human Being or Created by a Computer? Revisiting Normalizing Flows

Posted December 31, 2021 by Gowri Shankar  ‐  8 min read

We often consider everything to be normal when we see the creations of nature. This is because almost everything looks similar to the instances of the entity it inherits from. For example, a dog looks like its fellow doggies, as do trees, birds, and human beings. However, on closer observation and further inspection, we find the uniqueness of each instance - that is the true beauty of creation. When instances vastly differ from their counterparts we tend to call them differently-abled or specially powered, if not super-powered. i.e., Nature is a huge probabilistic model with a repository of diverse distributions, and we are all samples from a particular distribution - one that is not a normal distribution - and that makes us who we are. Hence, nothing is normal and everything is special. What these distributions are, why they are special, and how we can create them - the concept of normalizing flows sheds some light.
Transformers Everywhere - Patch Encoding Technique for Vision Transformers(ViT) Explained

Posted December 24, 2021 by Gowri Shankar  ‐  8 min read

Today, we are witnessing the transformer architecture making breakthroughs in almost all AI challenges because of its domain-agnostic nature and simplicity. For language models, transformers are the one-stop default solution for convergence, and they have since been applied to time series forecasting (Temporal Fusion Transformers). The key concepts of self-attention and positional encoding form the fundamental building blocks of the transformer architecture, and they can be extended to images as well. Dosovitskiy, Kolesnikov et al. of Google demonstrated that an image can be represented completely with [16 x 16] words in their 2021 paper titled _An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale_. That means the most reliable CNN architecture now has a partner to take computer vision solutions to new heights.
Understanding Self Attention and Positional Encoding Of The Transformer Architecture

Posted December 17, 2021 by Gowri Shankar  ‐  9 min read

The purpose of the transformer architecture in deep learning models is to perform the transduction of one sequence of symbols into another. Transformers are nothing but a clever utilization of matrix multiplication to infer the outcomes. They became popular due to their simplicity and because they are a powerful replacement that answers the vanishing gradient issues of recurrent neural network models like LSTMs (Long Short-Term Memory) and GRUs (Gated Recurrent Units). Often the simplest and most admirable things that nature bestows upon us are the most mysterious to comprehend when we dive deeper. Transformers fall into that category: simple, elegant, and trivial at face value, but requiring superior intuition for complete comprehension. Two components made transformers a SOTA architecture when they first appeared in 2017: first, the idea of self-attention, and second, positional encoding. The attention mechanism is quite clearly inspired by the human cognitive system, while positional encoding is purely a mathematical marvel.
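
For the record, the two building blocks from the 2017 paper are the scaled dot-product attention and the sinusoidal positional encoding:

$$ \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V $$

$$ PE_{(pos,\,2i)} = \sin\!\left(\frac{pos}{10000^{2i/d_{model}}}\right), \qquad PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d_{model}}}\right) $$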
TFRS for DLRMs At Enterprise Scale - A Practical Guide to Understand Deep and Cross Networks

Posted December 10, 2021 by Gowri Shankar  ‐  10 min read

Feature engineering is a non-trivial and critical activity that we perform while designing and building machine learning models meant to recommend outcomes to end-users. Feature engineering is often conducted manually with the support of a few critical statistical techniques or by doing an exhaustive search. The core objective of feature engineering is to identify statistically significant variables and learn their implicit interactions. We celebrate deep learning algorithms because they do an extraordinary job of learning and approximating any continuous function. If they can learn any continuous polynomial function, will they learn interactions between features (higher-degree polynomial terms) and make our lives easier in identifying the statistically significant variables and their combinations? Yes, they do, through a novel technique called Deep and Cross Networks (DCN), which is efficient at learning certain bounded-degree feature interactions.
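
The workhorse of DCN is the cross layer, which explicitly builds higher-degree feature interactions layer by layer; in the notation of the original paper,

$$ x_{l+1} = x_0 \, x_l^{\top} w_l + b_l + x_l $$

where $x_0$ is the input feature vector and $w_l$, $b_l$ are the layer's learned weight and bias.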
Quotient-Remainder Embedding, Dealing With Categorical Data In A DLRM - A Paper Review

Posted December 3, 2021 by Gowri Shankar  ‐  7 min read

The meaning of dequantization in most English dictionaries is the restoration of missing details in an image, text, or any entity that is stored and represented digitally. It is de + quantization, where quantization is the process of mapping continuously infinite values to a smaller set of discrete finite values. In layman's terminology, everything that comes under the laws of nature and physics is continuous, and we quantize it to a smaller set of numbers for the convenience of storage and retrieval. By the way, the term quanta means countable or quantifiable. When discretely represented natural phenomena have no inherent inter-relationships, we call them categorical values. Categorical values are broadly classified into two categories: ordinal and nominal. Dealing with categorical values in deep learning recommendation models (DLRMs) is not straightforward, and the idea of embedding helps us solve this problem.
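
As a rough sketch of the quotient-remainder trick the post reviews (table sizes and the elementwise-product combination below are assumptions for illustration), two small tables replace one huge embedding table:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim, m = 10_000, 16, 100                   # assumed sizes
q_table = rng.normal(size=(vocab_size // m + 1, dim))  # indexed by id // m
r_table = rng.normal(size=(m, dim))                    # indexed by id % m

def qr_embedding(category_id):
    """Compose two small tables instead of one `vocab_size`-row table;
    here the two partial embeddings are combined by elementwise product."""
    return q_table[category_id // m] * r_table[category_id % m]

print(qr_embedding(4217).shape)  # (16,)
```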
That Straight Line Looks a Bit Silly - Let Us Approximate A Sine Wave Using UAT

Posted November 29, 2021 by Gowri Shankar  ‐  4 min read

This is the continuation of my first post on the Universal Approximation Theorem. My previous post took the simple case of approximating a straight line, and in this post we approximate a sine wave, generated using numpy and smoothed with a Gaussian filter.
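
A minimal sketch of the target signal described above (assuming scipy for the smoothing; the noise level and `sigma` are arbitrary choices):

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

x = np.linspace(0, 2 * np.pi, 200)
noisy = np.sin(x) + np.random.normal(scale=0.1, size=x.shape)
target = gaussian_filter1d(noisy, sigma=3)   # smoothed sine wave to approximate
```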
Do You Know We Can Approximate Any Continuous Function With A Single Hidden Layer Neural Network - A Visual Guide

Posted November 27, 2021 by Gowri Shankar  ‐  11 min read

Ok, that is neither true nor false. Theoretically speaking, a single hidden layer neural network can approximate any continuous function in 1-d space, with a few caveats: a. failure to generalize, b. no guarantee of learnability, and c. an impossibly large layer size. However, there is a guarantee that neural networks can approximate any continuous function for every possible input, whether they take a single input or multiple inputs. There is a universality when it comes to neural networks. This universal property of neural networks makes deep learning models work reasonably well for almost any complex problem. We are in the early stages of deep learning development; with the current evolution we are generating text descriptions from image inputs, translating Swahili sentences into their Japanese equivalents, and creating faces that never existed before. In this post, we shall study the nuances of the Universal Approximation Theorem for neural networks, a fundamental property of deep learning systems, in detail.
Deep Learning is Not As Impressive As you Think, It's Mere Interpolation

Posted November 24, 2021 by Gowri Shankar  ‐  14 min read

This post is a mere reproduction (with a few opinions of mine) of one of the interesting discussions on Deep Learning, focusing on interpolation/extrapolation, that took place on Twitter. The whole discussion started because of an interesting reply from Dr. Yann LeCun to Steven Pinker, who had posted an appreciative note about Andre Ye's post titled - You Don't Understand Neural Networks Until You Understand the Universal Approximation Theorem.
Survival Analysis using Lymphoma, Breast Cancer Dataset and A Practical Guide to Kaplan Meier Estimator

Posted November 20, 2021 by Gowri Shankar  ‐  9 min read

I live in the hills of Kumaon in the Himalayas, a region of biodiversity, scenic beauty, and fertile landscapes situated to the west of Nepal. This region is full of fruit orchards that thrive due to the conducive conditions and the fertile soil. An interesting aspect that caught my attention is that the farmers allow frugivores into their orchards to consume the produce, which results in an effective seed-dispersing scheme. Most frugivores have a specialized digestive system that processes the fruit and passes the seeds intact through their gut. The quest is how long the farmers have to leave the fleshy fruits in their orchards so that the animals consume them for an effective seed-dispersing process. Statistics has the answer: survival analysis is a methodology widely used in medical research to measure the probability of patients surviving a certain amount of time after treatment for a disease. It comes under the medical prognosis scheme of healthcare. Using survival analysis we can find how long the fleshy fruits have to remain on the tree for the frugivores to consume them.
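
The estimator at the heart of the post is the Kaplan-Meier product-limit estimate of the survival function,

$$ \hat{S}(t) = \prod_{t_i \le t} \left(1 - \frac{d_i}{n_i}\right) $$

where $d_i$ is the number of events (deaths, or fruits consumed) at time $t_i$ and $n_i$ is the number still at risk just before $t_i$.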
Dequantization for Categorical Data, Categorical NFs via Continuous Transformations - A Paper Review

Posted November 13, 2021 by Gowri Shankar  ‐  9 min read

Of late we handle and store almost all the information that humanity creates in digital format, that is, as in-silico bits of a discrete order. However, every aspect of nature and the laws that govern nature are continuous. When we say all the information, we truly do not mean ALL the information but the information we believe is relevant - and the information we are capable of capturing. If that is confusing: capturing all the confounders of an event would need infinite energy and storage, which we do not have. Someone naively said a butterfly flapping its wings can cause a typhoon, but there is a small shred of wisdom in it: small events do serve as catalysts that act on starting conditions. We cannot capture and store all those high-dimensional events, but we can study the deep distributions caused by them by transforming from discrete space to continuous space. The process of casting encodings of categorical data from discrete space to continuous space is called dequantization; this process allows us to create flexible distributions of high-dimensional data to build robust machine learning models.
Bijectors of Tensorflow Probability - A Guide to Understand the Motivation and Mathematical Intuition Behind Them

Posted November 7, 2021 by Gowri Shankar  ‐  11 min read

A bijector is a function of a tensor, and its utility is to transform one distribution into another. Bijectors bring determinism to the randomness of a distribution, where the distribution by itself is a source of stochasticity. For example, if we want a log-transformed distribution, we can start with a Gaussian distribution and apply the transform using bijector functions. Why do we need such transformations? The real world is full of randomness, and probabilistic machine learning establishes a formalism for reasoning under uncertainty; i.e., a prediction that outputs a single value is not sufficient - it has to quantify the uncertainty to convey model confidence. Then, to sample complex random variables that get closer to the randomness of nature, we seek the help of bijective functions.
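
A minimal TensorFlow Probability sketch of the idea (an Exp bijector pushing a Gaussian forward into a log-normal; the distribution parameters are arbitrary):

```python
import tensorflow_probability as tfp

tfd, tfb = tfp.distributions, tfp.bijectors

base = tfd.Normal(loc=0.0, scale=1.0)
# Transform the base Gaussian with an Exp bijector: the result is log-normal,
# and sampling / log_prob come for free via the change-of-variables machinery.
log_normal = tfd.TransformedDistribution(distribution=base, bijector=tfb.Exp())

samples = log_normal.sample(5)
print(log_normal.log_prob(samples))
```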
Metrics That We Measure - Measuring Efficacy of Your Machine Learning Models

Posted October 30, 2021 by Gowri Shankar  ‐  7 min read

Have we identified the perfect metrics to measure the efficacy of our machine learning models? A perfect metric - does that even exist? A recent LinkedIn feed on measuring metrics caught my attention; it is a somewhat opinionated claim from the author, backed with substantial shreds of evidence and arguments. His post drew attention from many and became a valuable repository of information and views from diverse people. This post summarizes the diverse responses from the participants of that discussion.
Gaussian Process and Related Ideas To Kick Start Bayesian Inference

Posted October 24, 2021 by Gowri Shankar  ‐  7 min read

When we try to fit a curve/line by establishing relationships between variables, we are defining a non-linear function of a polynomial nature. This function describes the variable of our interest, i.e., the prediction, using an underlying parametric approach. The line or curve fitting scheme results in a single point in the space that describes the outcome for a specific relationship among the predictors. This scenario is tangential to the way we approach real-world problems. Human beings often attach an uncertainty attribute to outcomes when the confounders are unknown, but never an absolute value. For example, we reach our destination in 8-10 hours if the traffic is not heavy. When we bring in the uncertainty measure in the form of 8-10 hours and traffic density, instead of quoting an absolute number, confidence and trust in the model increase from the end-user's point of view. Such models employ a Bayesian non-parametric strategy to define the underlying unknown function and are commonly called Gaussian Process models.
Istio Service Mesh, Canary Release Routing Strategies for ML Deployments in a Kubernetes Cluster

Posted October 16, 2021 by Gowri Shankar  ‐  13 min read

Change is the only constant thing in this universe. Our data changes and causes data drift; then our understanding of its nature changes and causes concept drift. However, we believe that building State of the Art (SOA), One of a Kind (OAK), and First of its Time (FOT) in-silico intelligence will achieve a nirvana state and juxtapose us next to the hearts that are liberated from the cycle of life and death. Constructing a model is just the end of the inception; the real trials of difficulty and the excruciating pain of managing changes await us. Shall we plan well ahead by having a conscious focus on a minimum viable product that promises a quicker time to market with a fail-fast approach? Our ego doesn't allow that, because we do not consider software development cool anymore; we believe building intelligence alone makes us worth our salt. Today anyone can claim to be a data scientist, for two reasons. Reason 1: until 2020 we wrote SQL queries for a living; it is 2021, the Covid bug bit and mutated us, and we survived variants and waves that naturally upgraded the SQL developer within into a data scientist (an evolutionary process). Reason 2: with all due respect, one man, Dr. Andrew Ng, with his hard work and perseverance, made us believe we are all data scientists. By the way, they say ignorance is bliss, and we can continue building our SOA, OAK, and FOT models forever at the expense of someone's cash. BTW, has anyone noticed Andrew moving away from model-centric AI to data-centric AI? He is a genius, and he will take us to the place we truly belong.
Atoms and Bonds 2 - ML for Predicting Quantum Mechanical Properties of Organic Molecules

Posted October 8, 2021 by Gowri Shankar  ‐  10 min read

It is enthralling to see machine learning algorithms solve core science problems; it enables us to revisit favorite subjects after years, and for a few of us, even decades. Like any other field, ML had a humble beginning, detecting cats, dogs, and their respective mothers-in-law. Drug discovery is a prolonged and pricey process. Pharmaceutical firms research two kinds of molecules to increase the efficacy of a drug in its entirety: one is the source molecule (the drug); the other is the target molecule the drug has to act upon, and the target molecules in turn have their peripheral molecules to act upon. The quest is to predict the biochemical activity (atomization) between the compounds, quantitatively and qualitatively, for a cure with no side effects. Machine learning algorithms help in the process of investigating a huge library of chemical compounds and testing their biochemical impact on the target molecules.
Atoms and Bonds - Graph Representation of Molecular Data For Drug Detection

Posted October 2, 2021 by Gowri Shankar  ‐  15 min read

In computer science, a graph is a powerful data structure that embodies connections and relationships between nodes via edges. A graph illustration of information strongly derives its inspiration from nature. We find graphs or graph-like formations everywhere in nature, from bubble foams to crushed papers. E.g., the cracked surfaces of a dry riverbed or lake-bed during the dry season are a specialized topological design of the graph data structure called Gilbert tessellations. Soap bubbles and foams form double layers that separate films of water from pockets of air, made up of complex forms of curved surfaces, edges, and vertices. They form bubble clusters, and these clusters are represented as Möbius-invariant power diagrams, another special kind of graph structure. Conventional DL algorithms restrict themselves to tabular or sequential representations of data and lose their efficacy. However, the Message Passing Neural Network (MPNN) architecture is a Graph Neural Network (GNN) scheme where we can input graph information without any transformation. MPNNs are used in the field of drug discovery, and they inspire us to model molecular structures for predicting penetration of the blood-brain barrier membrane.
Graph Convolution Network - A Practical Implementation of a Vertex Classifier and its Mathematical Basis

Posted September 25, 2021 by Gowri Shankar  ‐  10 min read

Traditional deep learning algorithms work in Euclidean space because the dataset is transformed and represented in one or two dimensions. This approach results in a loss of information, especially about the relationship between two entities. For example, the network organization of the brain suggests that information is stored in the neuronal nexus, i.e., neurons that fire together wire together (Hebbian theory). The knowledge of togetherness or relationships can be ascertained strongly in non-Euclidean space in the form of graph networks. Such intricate graph networks have evolved to maximize efficiency and efficacy of information transfer at minimum cost (energy utilization) to accomplish complex tasks. Though graph networks solve the spatial challenges to a certain extent, temporal challenges are yet to be addressed. Extending DNN theories to graphs is the current trend; e.g., an image can be considered a specialized graph where each pixel is related to its adjacent ones, enabling us to perform a graph convolution.
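
The canonical layer behind this idea is the GCN propagation rule of Kipf and Welling,

$$ H^{(l+1)} = \sigma\!\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)}\right), \qquad \tilde{A} = A + I $$

where $A$ is the adjacency matrix, $\tilde{D}$ the degree matrix of $\tilde{A}$, $H^{(l)}$ the node features at layer $l$, and $W^{(l)}$ the learnable weights.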
Introduction to Graph Neural Networks

Posted September 19, 2021 by Gowri Shankar  ‐  9 min read

Information stored and fed to deep learning systems is either in tabular format or in sequential format; this is because of our antiquated way of storing data in relational database designs inspired by pre-medieval accounting systems. Though the name contains the word relation, the actual relationships are established independently of the data (e.g., across tables through primary/foreign keys). This is an unintuitive and inefficient representation that guarantees convenience for a computer programmer's comprehension but not the needs of today's machine-assisted, data-driven lifestyle. The inherent nature of the human cognitive system - the ability to comprehend relationships and store them as relationships (graphically or hierarchically) - ensures supremacy in the creation of ideas, the retrieval of memories, the modification of beliefs, and the removal of dogmas (arguably). On the contrary, current (leading) approaches to data storage are tabular or linear, which could be the cause of inefficiency in achieving convergence despite consuming very high energy (compared to the animal brain) to achieve simple tasks. I spent some time with graphs, graph neural networks (GNNs), and their architecture to arrive at the above intuition. I believe GNNs are bringing us a little closer to building human-like intelligent systems, inspired by the human way of storing information.
Relooking Attention Models, SHA-RNN Overview, gMLP Briefing and Measuring Efficacy of Language Models through Perplexity

Posted September 11, 2021 by Gowri Shankar  ‐  10 min read

Satire and sarcasm are seldom seen in scientific writing, but this is an era of memes and trolls where complex concepts are conveyed through highly comprehensible mediums (videos, animations, etc.). When it comes to being critical (without hurting) about a concept or a character, sarcasm is taken up as a medium in literary renditions, but seldom do we see it in scholarly scriptures. Such sui generis writing is convivial and fervent for its patrons - Stephen Merity's 2019 paper titled Single Headed Attention RNN: Stop Thinking With Your Head (SHA-RNN) is one such scholarly piece, where he is critical of today's (leading) approaches in language modeling, especially our obsession with attention models, without demonstrating outrage or distress. His paper is lucid and takes us back to celebrate the glory of yesteryear's multi-layer perceptrons. A more recent paper (Jun 2021) from Google titled Pay Attention to MLPs (gMLP) periphrastically confirms Stephen's claims with substantial empirical proof.
Gossips and Epicenter of Emotions for our Existence - Study on Anatomy of Limbic System

Posted September 5, 2021 by Gowri Shankar  ‐  9 min read

The cognitive and communicative systems of the human brain did not evolve to build mathematical models or to find philosophical insights during our primitive times - they evolved so that we can gossip. Gossiping is the fundamental attribute of human beings that made us who we are. Gossip enabled us to create languages, cultures, and civilizations - do you know for what? To impress our respective girlfriends (and boyfriends, obviously) and partners - which makes it the epicenter of our existence. Why do I call gossip the epicenter of our existence? Through gossip we conveyed our emotions and feelings, which are seldom spoken openly yet definitely spoken, and which changed the course of our history - from Helen of Troy to Monica Lewinsky, we gossiped and gossiped until the reign was brought down to its knees (pun intended). One single act called gossip designed the destiny of humanity by producing flavors of emotions in the human brain, specifically in the limbic region. That region has to be studied because our quest is to build human-like intelligence on a silicon wafer, and emotions are the prime factor that makes a human human.
Methodus Fluxionum et Serierum Infinitarum - Numerical Methods for Solving ODEs Using Our Favorite Tools

Posted August 28, 2021 by Gowri Shankar  ‐  8 min read

Wikipedia says a differential equation is an equation that relates one or more functions and their derivatives. In layman's terms, the only constant in this life (universe) is change, and any entity capable of adapting to change - especially threats and adversarial ones - thrived and flourished. Hence we are interested in studying change and the rate at which change occurs. Uff, that is too layman-ish a definition of differential equations, even for an unscholarly writer of my kind. Apparently Newton called those functions fluxions, and Gottfried Wilhelm Leibniz independently identified them - that is all history now, but it made differential equations a compelling topic for understanding nature. Further, numerical analysis is a way to solve such equations, and numerical methods are very much the workhorses of convergence in the quest for achieving (artificial) intelligence.
A Practical Guide to Univariate Time Series Models with Seasonality and Exogenous Inputs using Finance Data of FMCG Manufacturers

Posted August 21, 2021 by Gowri Shankar  ‐  10 min read

A univariate time series is a time series that consists of single scalar observations recorded sequentially at equal periodic intervals, i.e., an array of numbers is recorded where time is an implicit dimension represented at constant periodicity. Univariate time series models (UTSMs) are the simplest models that allow us to forecast future values by learning the patterns in the sequence of recorded observations. The key elements of these patterns are seasonality, trends, impact points, and exogenous variables. Three schemes of pattern identification act as building blocks for UTSMs: auto-regression (OLS), moving averages, and seasonality. When they are augmented with external data, the effectiveness of the model improves significantly.
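
A small, self-contained sketch of such a model (toy data; the `order` and `seasonal_order` values are placeholders, not tuned hyperparameters) using statsmodels' SARIMAX:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Toy monthly series with yearly seasonality plus an exogenous driver.
idx = pd.date_range("2018-01-01", periods=48, freq="MS")
exog = pd.DataFrame({"ad_spend": np.random.rand(48)}, index=idx)
sales = pd.Series(10 + 2 * np.sin(np.arange(48) * 2 * np.pi / 12)
                  + 3 * exog["ad_spend"].values
                  + np.random.normal(0, 0.3, 48), index=idx)

model = SARIMAX(sales, exog=exog,
                order=(1, 1, 1),               # AR, differencing, MA terms
                seasonal_order=(1, 0, 1, 12))  # seasonal terms, 12-month period
result = model.fit(disp=False)

future_idx = pd.date_range("2022-01-01", periods=6, freq="MS")
future_exog = pd.DataFrame({"ad_spend": np.random.rand(6)}, index=future_idx)
forecast = result.forecast(steps=6, exog=future_exog)
```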
Trend, Features of Structural Time Series, Mathematical Intuition Behind Trend Analysis for STS

Posted August 14, 2021 by Gowri Shankar  ‐  6 min read

Decomposability is the prime factor in the success of Generalized Additive Models in the quest to forecast future events from an observed dataset. When we design a time series forecasting model, the functional features that we often observe in the data are trend, seasonality, and impact points. The decomposable nature of these features makes the problem conducive to modeling individual features independently, treating it as a curve-fitting exercise; i.e., modeling trend independently from other features makes the outcomes interpretable and subsequently paves the way for advantages like bringing the analyst into the loop in attaining convergence. This approach ignores the explicit temporal dependence structure in the data that is common in generative models like ARIMA.
Fourier Series as a Function of Approximation for Seasonality Modeling - Exploring Facebook Prophet's Architecture

Posted August 8, 2021 by Gowri Shankar  ‐  8 min read

Generalized additive models are among the most powerful structural time series models; they forecast horizons by identifying confounding characteristics in the data. Among those characteristics, seasonality is a common one observed in almost all time series data. Understanding and identifying the periodic (hourly, daily, monthly, or something more esoteric) occurrence of events and actions that impact the outcome is an art that requires domain expertise. A Fourier series - a periodic function composed of harmonically related sinusoids combined by a weighted summation - helps us approximate such an arbitrary periodic function.
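
Concretely, the seasonal component is approximated with a truncated Fourier series of period $P$ (e.g. $P = 365.25$ days for yearly seasonality),

$$ s(t) = \sum_{n=1}^{N} \left[ a_n \cos\!\left(\frac{2\pi n t}{P}\right) + b_n \sin\!\left(\frac{2\pi n t}{P}\right) \right] $$

where the coefficients $a_n, b_n$ are fit from the data and $N$ controls how wiggly the seasonality is allowed to be.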
Structural Time Series Models, Why We Call It Structural? A Bayesian Scheme For Time Series Forecasting

Posted August 1, 2021 by Gowri Shankar  ‐  7 min read

The models we build are machines that have the fundamental capability to learn the underlying patterns in the observed data and store them in the form of weights. Patterns come in different forms, shapes, and sizes - this is ubiquitous because we are interpreting the universe through our observed data. When the observed data exhibit periodic patterns over time, we call them time series. The key challenges with time series data are missing values and the absence of confounders, which make them special. The problem gets even more interesting when we approach time series forecasting in a Bayesian setup. This is a new series of posts I am starting on Structural Time Series (STS), where we explore a wide gamut of problems and approaches to declutter the underlying treasure.
Blind Source Separation using ICA - A Practical Guide to Separate Audio Signals

Posted July 24, 2021 by Gowri Shankar  ‐  6 min read

In this post we shall perform a step-by-step implementation of blind source separation using independent component analysis. This is an end-to-end attempt to demonstrate a solution to the cocktail party problem, where we believe data observed from nature is always a mixture of multiple distinct sources; identifying the source signals is critical for understanding the nature of the observed data. CAUTION: This page plays a 10-second music clip on opening.
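
For a flavour of the implementation (a toy two-source example with scikit-learn's FastICA, standing in for the audio clips used in the post):

```python
import numpy as np
from sklearn.decomposition import FastICA

# Two toy sources (a sine and a square wave) mixed linearly.
t = np.linspace(0, 8, 2000)
S = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t))]
A = np.array([[1.0, 0.5], [0.5, 2.0]])   # mixing matrix
X = S @ A.T                              # observed mixtures ("microphones")

ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)             # recovered sources (up to scale/order)
```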
Cocktail Party Problem - Eigentheory and Blind Source Separation Using ICA

Posted July 18, 2021 by Gowri Shankar  ‐  13 min read

We will never achieve 100% accuracy in predicting real-world events using any AI/ML algorithm, and accuracy is one simple metric that always leads to deception. Why? Data observed from nature is always a mixture of multiple distinct sources, and separating them by their origin is the basis for understanding. The process of separating the signals that make up an observed dataset is called blind source separation. Pondering this, we human beings are creatures of enough grit and competence to come up with techniques like Independent Component Analysis (ICA) in the quest to understand the complex entities of nature.
Courage and Data Literacy Required to Deploy an AI Model and Exploring Design Patterns for AI

Posted July 10, 2021 by Gowri Shankar  ‐  18 min read

Have you ever come across a situation where your dataset is closely linked with human beings and you are expected to optimize certain operations/processes? Did it make you feel anxious? You are not alone; operational optimizations of industrial/business processes are often focused on minimizing human errors to maximize productivity/profitability - most likely depending on machines (to support) rather than fully relying on humans in decision making. If AI is done wrongly, these decisions might jeopardize the basic livelihood of certain sections of people (often the ones at the bottom of the value chain) involved in the process.
Eigenvalue, Eigenvector, Eigenspace and Implementation of Google's PageRank Algorithm

Posted July 3, 2021 by Gowri Shankar  ‐  8 min read

Feature extraction techniques like Principal Component Analysis use eigenvalues and eigenvectors for dimensionality reduction in a machine learning model, through a density estimation process grounded in eigentheory. Eigenvalues depict the variance of the data distribution in a certain direction; the vector with the highest eigenvalue is the principal component of the feature set. In simple terms, eigenvalues help us find patterns inside noisy data. By the way, eigen is a German word meaning particular or proper - when combined with value, it means the proper value.
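
A minimal power-iteration sketch of PageRank on a hypothetical three-page web (the link structure and damping factor are assumptions for illustration); the ranking vector it converges to is the dominant eigenvector of the Google matrix:

```python
import numpy as np

# Column-stochastic link matrix: column j holds where page j links to.
M = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])
d, n = 0.85, M.shape[0]                     # damping factor, page count
G = d * M + (1 - d) / n * np.ones((n, n))   # Google matrix

rank = np.ones(n) / n
for _ in range(100):                        # power iteration towards the
    rank = G @ rank                         # eigenvector with eigenvalue 1
print(rank / rank.sum())
```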
Need For Understanding Brain Functions, Introducing Medical Images - Brain, Heart and Hippocampus

Posted June 26, 2021 by Gowri Shankar  ‐  11 min read

Inspiration for an idea or a piece of information often comes to its creator through divine influences; the great mathematician Srinivasa Ramanujan credited his family deity Namagiri for his mathematical genius. I believe the human brain's structure and functions are significant influences on the design of today's vision, speech, and NLP systems. Understanding and in-silico reconstruction of neuronal circuits, behaviors, and responses - at the level of individual neurons and at the level of brain regions - is critical for achieving superior intelligence.
Attribution and Counterfactuals - SHAP, LIME and DiCE

Posted June 19, 2021 by Gowri Shankar  ‐  10 min read

Why a machine learning model makes certain predictions/recommendations, and what the efficacy of those predicted outcomes is with respect to the real world, is a deep topic of research; i.e., what causes a model to predict a certain outcome? There are two popular families of methods researchers have devised for model explanation: attribution-based and counterfactual (CF)-based schemes. Attribution-based methods provide scores for features, while CFs generate examples from an alternate universe by tweaking a few of the input features.
Did the Covid Crisis Lead to Prosperity - Causal Inference from a Counterfactual World Using Facebook Prophet

Posted June 12, 2021 by Gowri Shankar  ‐  12 min read

Identifying one causal reason is more powerful than identifying dozens of correlational patterns in the data; causal inference is a branch of statistics concerned with effects that are the consequence of actions. In traditional machine learning, we infer from past observations, asking how something happened by characterizing the association between variables. On the contrary, causal inference addresses why an event happened, through randomized experiments.
La Memoire, C'est Poser Son Attention Sur Le Temps

Posted June 5, 2021 by Gowri Shankar  ‐  10 min read

Powerful DNN architectures (MLPs, CNNs, etc.) fail to capture the temporal dependencies of real-world events. They are limited to classifying data by learning from the probability distribution of fixed-length vectors (images). However, real-world problems are functions of time, where past events have a significant impact on current and future outcomes. Hence come the simple but most powerful mechanisms of attention and memory, inspired by the human cognitive system.
Normalizing Flows - A Practical Guide Using Tensorflow Probability

Posted May 29, 2021 by Gowri Shankar  ‐  9 min read

There are so many amazing blogs and papers on normalizing flows that lead to solving density estimation problems; this is yet another one. In this post, I am attempting to implement a flow-based density transformation scheme that can be used in a generative model - a hands-on coding session with supporting math. The most fascinating thing about flow-based models is their ability to explicitly learn the data distribution through a sequence of invertible transformations. Let us build a set of sophisticated transformations using Tensorflow Probability.
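
The identity that every flow relies on is the change-of-variables formula: for an invertible transform $f$ mapping base samples $z \sim p_Z$ to data $x = f(z)$,

$$ \log p_X(x) = \log p_Z\!\left(f^{-1}(x)\right) + \log\left|\det \frac{\partial f^{-1}(x)}{\partial x}\right| $$

and stacking invertible transforms simply adds up their log-determinant terms.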
Why Covariance Matrix Should Be Positive Semi-Definite, Tests Using Breast Cancer Dataset

Posted May 23, 2021 by Gowri Shankar  ‐  8 min read

Do you keep hearing the phrase Covariance Matrix is Positive Semidefinite when you indulge in deep topics of machine learning and deep learning, especially on the optimization front? Is it causing a certain sense of uneasiness and making you feel anxious about the need for your existence? You are not alone. In this post we shall see the properties of a covariance matrix, and also the nature of the eigenvalues of a covariance matrix.
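
The one-line reason, for any vector $v$:

$$ v^{\top} \Sigma \, v \;=\; v^{\top}\,\mathbb{E}\!\left[(x-\mu)(x-\mu)^{\top}\right] v \;=\; \mathbb{E}\!\left[\big(v^{\top}(x-\mu)\big)^{2}\right] \;\ge\; 0 $$

so every covariance matrix is positive semidefinite, and all of its eigenvalues are non-negative.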
Calculus - Gradient Descent Optimization through Jacobian Matrix for a Gaussian Distribution

Posted May 15, 2021 by Gowri Shankar  ‐  12 min read

Back to basics: in machine learning, cost functions determine the error between the predicted outcomes and the observed values. Our goal is to minimize the loss, i.e., the error over a single training sample, calculated iteratively for the entire dataset, to achieve convergence. It is like descending from a mountain by making optimal downward steps to reach the deepest point of the valley, called the global minimum. In this post we shall optimize a non-linear function using calculus, without any sophisticated libraries like tensorflow, pytorch, etc.
With 20 Watts, We Built Cultures and Civilizations - Story of a Spiking Neuron

Posted May 9, 2021 by Gowri Shankar  ‐  13 min read

Our quest is to build human-like AI systems that take inspiration from the brain and imitate its memory, reasoning, feelings, and learning capabilities within a controlled setup. It's a 500-million-year-long story of evolution and optimization at the cellular level. Today the human brain consumes ~20W of power to run the show, and with such an efficient machine humanity built cultures and civilizations. This evolutionary story shapes the development of deep learning systems, inspiring us to think beyond the horizons of current comprehension.
Causal Reasoning, Trustworthy Models and Model Explainability using Saliency Maps

Posted May 2, 2021 by Gowri Shankar  ‐  9 min read

Correlation does not imply causation. In machine learning, especially deep neural networks (DNNs), we have not evolved to confidently identify causes and their effects; learning agents learn from probability distributions. In statistics, we accept and reject hypotheses to arrive at tangible decisions; a similar kind of causal inference is key to the success of complex models, to avoid false conclusions and their consequences.
Higher Cognition through Inductive Bias, Out-of-Distribution and Biological Inspiration

Posted April 24, 2021 by Gowri Shankar  ‐  12 min read

The fascinating thing about human (animal) intelligence is its ability to systematically generalize outside of the known distribution on which it is presumably trained. Instead of having a huge list of hypotheses and heuristics, if intelligence can be explained with a few principles, understanding intelligence and building intelligent machines will take an inspiring and evolutionary path.
Information Gain, Gini Index - Measuring and Reducing Uncertainty for Decision Trees

Posted April 17, 2021 by Gowri Shankar  ‐  9 min read

This is the 5th post in the series that declutters entropy, the measure of uncertainty. In this post, we shall explore two key concepts, Information Gain and Gini Impurity, which are used to measure and reduce uncertainty. We take the Heart Disease dataset from the UCI repository to understand information gain through decision trees.
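
For quick reference, with class proportions $p_k$ at a node,

$$ H = -\sum_{k} p_k \log_2 p_k, \qquad Gini = 1 - \sum_{k} p_k^2, \qquad IG = H(\text{parent}) - \sum_{j} \frac{N_j}{N}\, H(\text{child}_j) $$

where the split with the largest information gain (or the largest drop in Gini impurity) is the one the decision tree picks.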
KL-Divergence, Relative Entropy in Deep Learning

Posted April 10, 2021 by Gowri Shankar  ‐  5 min read

This is the fourth post on the Bayesian approach to ML models. Earlier we discussed uncertainty, entropy (the measure of uncertainty), maximum likelihood estimation, etc. In this post we explore KL-Divergence to calculate the relative entropy between two distributions.
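
The quantity in question, for distributions $P$ and $Q$ over the same support, is

$$ D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)} $$

which is non-negative, zero only when $P = Q$, and not symmetric in its arguments.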
Shannon's Entropy, Measure of Uncertainty When Elections are Around

Posted April 3, 2021 by Gowri Shankar  ‐  6 min read

What is the most pressing issue in everyone's life? It is our inability to predict how things will turn out, i.e., uncertainty. How awesome (or depressing) it would be if we could make precise predictions and perform accurate computations to measure uncertainties.
Bayesian and Frequentist Approach to Machine Learning Models

Posted March 27, 2021 by Gowri Shankar  ‐  5 min read

Rev. Thomas Bayes discovered the theorem for conditional probability that bears his name and forms the basis for Bayesian Statistical methods. Sir Ronald Fisher is considered one of the founders of frequentist statistical methods and originally introduced maximum likelihood.
Understanding Uncertainty, Deterministic to Probabilistic Neural Networks

Posted March 19, 2021 by Gowri Shankar  ‐  8 min read

Uncertainty is a condition where there is limited or no knowledge about the existing state and it is impossible to describe future outcomes. The essential nature of existence is driven by constant change, which leads to the quest for knowledge in the mind of the seeker.
Understanding Post-Synaptic Depression through Tsodyks-Markram Model by Solving Ordinary Differential Equation

Posted March 12, 2021 by Gowri Shankar  ‐  9 min read

Understanding the building blocks of the brain and its responsive nature is always a frontier for conquest and a fascinating area of research. In this post, let us explore the temporal data acquired from somatic recordings, which shows how short-term synaptic plasticity strongly affects the neural dynamics of neocortical networks.
Automatic Differentiation Using Gradient Tapes

Posted December 14, 2020 by Gowri Shankar  ‐  9 min read

As a data scientist or deep learning researcher, one must have deep knowledge of various differentiation techniques, because gradient-based optimization techniques like the backpropagation algorithm are critical for model efficiency and convergence.
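
A minimal illustration of the recording-and-replay idea with TensorFlow's GradientTape (a toy function, not the post's example):

```python
import tensorflow as tf

x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x ** 2 + 2.0 * x          # y = x^2 + 2x, recorded on the tape
dy_dx = tape.gradient(y, x)       # analytical derivative 2x + 2 evaluated at x=3
print(dy_dx.numpy())              # 8.0
```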
Roll up your sleeves! Let us do some partial derivatives.

Posted August 14, 2020 by Gowri Shankar  ‐  3 min read

In this post, we shall explore a shallow neural network with a single hidden layer and the math behind the backpropagation algorithm and gradient descent.
GradCAM, Model Interpretability - VGG16 & Xception Networks

Posted July 4, 2020 by Gowri Shankar  ‐  11 min read

The objective of this post is to understand the importance of visual explanations for CNN-based, large-scale deep neural network models.
Tensorflow 2: Introduction, Feature Engineering and Metrics

Posted April 4, 2020 by Gowri Shankar  ‐  27 min read

Introducing TF2 through Train, Test, Valid splitting, Imputation, Bias/Overfit handlers, One Hot Encoding, Embeddings, Tensor Slices, Keras APIs, metrics including accuracy, precision and ROC curve
Time and Space Complexity - 5 Governing Rules

Posted February 28, 2020 by Gowri Shankar  ‐  9 min read

How to approach compute complexity, i.e., time and space complexity problems, while designing a software system, in order to avoid obvious bottlenecks in an abstract fashion.
ResNet50 vs InceptionV3 vs Xception vs NASNet - Introduction to Transfer Learning

Posted June 28, 2019 by Gowri Shankar  ‐  22 min read

Transfer learning is an ML methodology that enables reusing a model developed for one task for another task. The applications are predominantly in deep learning for computer vision and natural language processing.