Frequentist or Bayesian, Who am I?

I am a Software Architect and an Independent Researcher who has designed and developed data products from Ideation to Go To Market at enterprise scale throughout my career. I am a perpetual learner who learns new things and makes them work. My passion is Programming and Mathematics for Deep Learning and Artificial Intelligence. My focus areas are Computer Vision and Temporal Sequences for Prediction and Forecasting.


Selected Writes - AI, ML, Math

Airflow Trigger Rules for Building Complex Data Pipelines Explained, and My Initial Days of Airflow Selection and Experience

Posted May 1, 2022 ‐ 9 min read

Dell acquiring Boomi (circa 2010) was a big topic of discussion among my peers. Back then, I was just starting to shift my career from developing system software and device drivers to building distributed IT products at enterprise scale. I was so ignorant that I questioned, 'Why would someone pay so much for a piece of code that connects systems and schedules events?' I argued that those data pipeline processes could easily be built in-house rather than depending on an external product. To understand the value of an integration platform or a workflow management system, one should strive for excellence in maintaining and serving reliable data at large scale. Moving from building in-house data pipelines with Pentaho Kettle at enterprise scale to enjoying the flexibility of Apache Airflow is one of the most significant parts of my data journey.

Introduction to Contrastive Loss - Similarity Metric as an Objective Function

Posted January 30, 2022 ‐ 6 min read

My first machine learning work was based on calculating the similarity between two arrays of dissimilar lengths. The array items represented features of handwritten characters extracted from a 2D vector captured using an electronic pen at a certain frequency, circa 2001. The fundamental idea behind the similarity calculator is the measure of Euclidean distance between the feature vectors of the corpus and the observed character strokes. Then came the most famous Siamese neural network architecture (~2005), which has two or more identical networks with the same parameters and weights that measure similarity by comparing the feature vectors of the input images. Semantic similarity calculation using a distance measure is the way to go when we do not have labeled (or have only partially labeled) data with a very large number of objects to classify or detect. The similarity metrics can be used to compare and identify unseen categories when the data evolves, i.e., if it walks like a duck and quacks like a duck, we prefer to infer it is a duck even if our training data had never seen a duck.
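As a minimal sketch of the idea above, the snippet below computes the Euclidean distance between two feature vectors and one common form of the contrastive loss. The function names, the `margin` default, and the assumption that both vectors already have the same length are illustrative choices, not taken from the post.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        # Assumption: inputs were already mapped to a common length.
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(a, b, is_similar, margin=1.0):
    """One common form of the contrastive loss for a single pair.

    Similar pairs are penalised by their squared distance, so they are
    pulled together. Dissimilar pairs are penalised only while they sit
    closer than `margin`, so they are pushed apart up to that margin.
    """
    d = euclidean_distance(a, b)
    if is_similar:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2
```

A dissimilar pair that is already farther apart than the margin contributes no loss, which is what lets the embedding space settle instead of pushing unrelated items apart forever.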

Temperature is Nothing but Measure of Speed of the Particles at Molecular Scale - Intro 2 Maxwell Boltzmann Distribution

Posted January 23, 2022 ‐ 8 min read

Temperature is defined as a measure of the average kinetic energy of the molecules in a space. If you find the cup of coffee your girlfriend graciously gave you this morning is not hot enough, then you can confidently conclude the molecules in the coffee pot are as lazy as you are. When the particles in the space are active, bumping into each other and creating a commotion to prove their existence, we can say they are hot. What makes one hot is directly proportional to the number of particles in their space of influence that traipse from a steady state to a hyperactive one. Often these particles move aimlessly, as we witness while boiling water or cooking food. This phenomenon can be understood quite clearly via the Maxwell-Boltzmann distribution, a concept from Statistical Physics/Mechanics that has significant importance in machine learning and cognitive science.
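To make the link between temperature and molecular speed concrete, here is a small sketch of the Maxwell-Boltzmann speed distribution for an ideal gas. The function names are illustrative, and the example mass in the usage note is an approximate value assumed for a nitrogen molecule.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def speed_pdf(v, mass, temperature):
    """Maxwell-Boltzmann probability density for speed v (m/s)."""
    a = mass / (2.0 * math.pi * K_B * temperature)
    return (4.0 * math.pi * a ** 1.5 * v ** 2
            * math.exp(-mass * v ** 2 / (2.0 * K_B * temperature)))


def most_probable_speed(mass, temperature):
    """Speed at which the density peaks."""
    return math.sqrt(2.0 * K_B * temperature / mass)


def mean_speed(mass, temperature):
    """Average speed over the distribution."""
    return math.sqrt(8.0 * K_B * temperature / (math.pi * mass))


def rms_speed(mass, temperature):
    """Root-mean-square speed; 0.5 * m * v_rms**2 equals 1.5 * k_B * T."""
    return math.sqrt(3.0 * K_B * temperature / mass)
```

For example, calling `rms_speed(4.65e-26, 300.0)` uses an assumed mass of roughly one nitrogen molecule at room temperature. Doubling the temperature raises every characteristic speed by a factor of the square root of two, which is the sense in which hotter means faster.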

The Best Way to Minimize Uncertainty is NOT Being Informed, Surprised? Let us Measure Surprise

Posted January 14, 2022 ‐ 6 min read

Ignorance is bliss. We all know there is a deeper meaning to this phrase in a philosophical context, one that points towards a lethargic attitude. I would like to define the word ignorance as a lack of knowledge or information. Often we believe the more information we have, the more certain we are about the past, present, and future events associated with that information. Information theory differs significantly on that belief, thanks to Claude Shannon, i.e., the more information we have, the more we fill the uncertainty bucket that we detest. Is there any fun in knowing that an event is absolutely certain to happen? For example, the Proteas won the series (cricket) against India. An improbable turn of events brings more information, which is the cause of all the surprises that keep us sitting on the edge of our seats. Test cricket is the game of glorious uncertainties after all! Hence, we shall learn more about surprises, especially about measuring surprise.
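A minimal sketch of how surprise is usually measured in information theory: the surprisal (self-information) of an event with probability p is -log2(p) bits, and Shannon entropy is the expected surprisal over all outcomes. The function names are illustrative.

```python
import math


def surprisal(p):
    """Self-information in bits of an event with probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in the interval (0, 1]")
    return -math.log2(p)


def entropy(probabilities):
    """Shannon entropy in bits: the expected surprisal of a distribution."""
    # Outcomes with zero probability contribute nothing to the sum.
    return sum(p * surprisal(p) for p in probabilities if p > 0.0)
```

A certain event (p = 1) carries zero bits of surprise, while a fair coin toss carries exactly one bit. The less likely the result, the larger the surprisal.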

3rd Wave in India, Covid Debacle Continues - Let us Use Covid Data to Learn Piecewise LR and Exponential Curve Fitting

Posted January 7, 2022 ‐ 8 min read

Deep neural network models are dominant at their job compared to other algorithms, such as support vector machines or statistical models, that were once celebrated. When it comes to big data, deep learning models are without a doubt the de facto choice for convergence. I often wonder what makes them so efficient; something should be quite obvious and provable. Activation functions: we know activation functions bring non-linearity to the network layers through the neurons, and they do the magic in vogue. ReLU and its sister-in-law gang are piecewise linear functions (Sigmoid, by contrast, is smooth) that bring non-linearity to the outcomes, i.e., the activation functions help the neural networks slice and dice the input space into finer grains and form locally sensitive hash tables. A piecewise linear function in the network can be visualized as a polyhedron (or a cell) with sharp edges, which is the fundamental building block for achieving convergence in DNNs.
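The slicing of the input space can be sketched with a toy one-dimensional ReLU network. The weights below are made-up illustrative values, not from any trained model. Each hidden unit switches on or off at one breakpoint, and between breakpoints the on/off pattern is fixed, so the network is exactly linear on each cell.

```python
# Hypothetical weights for a 1-input, 3-hidden-unit, 1-output ReLU network.
W1 = [1.0, -1.0, 2.0]   # input-to-hidden weights
B1 = [0.0, 1.0, -4.0]   # hidden biases
W2 = [1.0, 0.5, -1.0]   # hidden-to-output weights
B2 = 0.0                # output bias


def relu(z):
    return max(0.0, z)


def forward(x):
    """Network output for a scalar input x."""
    return sum(w2 * relu(w1 * x + b1) for w1, b1, w2 in zip(W1, B1, W2)) + B2


def activation_pattern(x):
    """Which hidden units are active at x; constant within one linear cell."""
    return tuple(w1 * x + b1 > 0.0 for w1, b1 in zip(W1, B1))


def breakpoints():
    """Inputs where some hidden unit switches, i.e. the cell boundaries."""
    return sorted(-b1 / w1 for w1, b1 in zip(W1, B1) if w1 != 0.0)
```

With these weights the breakpoints fall at 0, 1 and 2, giving four cells. Two inputs in the same cell share an activation pattern, and the output at their midpoint equals the average of their outputs, which is the defining property of a linear piece.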

Selected Reads - Papers, Articles, Books

Density Estimation using Real NVP - GOOGLE RESEARCH/ICLR

This paper is going to change your perspective on AI research tangentially, if you are stepping into Probabilistic DNNs. Start from here for unsupervised learning of probabilistic models using real-valued non-volume preserving transformations. Model natural images through sampling, log-likelihood and latent variable manipulations read...

The Neural Code between Neocortical Pyramidal Neurons Depends on Neurotransmitter Release Probability - PNAS

This 1997 paper brings biophysics, electrophysiology, neuroscience, differential equations, etc. together in one place. A good starting point to understand neural plasticity, synapses, neurotransmitters, ordinary differential equations read...

Using AI to read Chest X-Rays for Tuberculosis Detection and evaluation of multiple DL systems - NATURE

Deep learning (DL) is used to interpret chest X-rays (CXR) to screen and triage people for pulmonary tuberculosis (TB). This study compares multiple DL systems and populations with a retrospective evaluation of 3 DL systems. read...

Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization - IEEE/ICCV

Grad-CAM uses the gradients of a target class flowing into the final convolutional layer to produce a coarse localization map that highlights the image regions important for the prediction. read...

Evolve Your Brain: The Science of Changing Your Mind by Joe Dispenza - BOOK

Ever wonder why you repeat the same negative thoughts in your head? Why you keep coming back for more from hurtful family members, friends, or significant others? read...

Selected Watch - Social Media/OTT Content

Eureka: Dr. V. Srinivasa Chakravarthy, Prof., CNS Lab, IITM

Interaction with Prof. Chakra, Head of the Computational Neuroscience Lab. Computational neuroscience serves to advance theory in basic brain research as well as psychiatry, and bridge from brains to machines. watch...

Quantum, Manifolds & Symmetries in ML

Conversation with Prof. Max Welling on Deep Learning with non-Euclidean geometric data like graphs/topology or allowing networks to recognize new symmetries watch...

The Lottery Ticket Hypothesis

Yannic's review of The Lottery Ticket Hypothesis, a paper on network optimization through sub-networks. This paper is from an MIT team watch...

Backpropagation through time - RNNs, Attention etc

MIT 6.S191 Introduction to Deep Learning by Alexander Amini and Ava Soleimany. Covers the intuition behind recurrent networks, LSTM, Attention, Gradient Issues, Sequential Modelling, etc. watch...

What is KL-Divergence?

A cool explanation of Kullback-Leibler Divergence by Kapil Sachdeva. It declutters many issues like asymmetry, log-likelihood, cross-entropy and forward/reverse KLDs. watch...

Overfitting and Underfitting in Machine Learning

In this video, two PhD students talk about overfitting and underfitting, two super important concepts for understanding ML models, in an intuitive way. watch...

Attitude ? Explains Chariji - Pearls of Wisdom - @Heartfulness Meditation

Chariji was the third in the line of Raja Yoga Masters in the Sahaj Marg System of Spiritual Practice of Shri Ram Chandra Mission (SRCM). Shri Kamlesh Patel, also known as Daaji, is the current Guide of the Sahaj Marg System (known today as HEARTFULNESS) and is the President of Shri Ram Chandra Mission. watch...