Blind Source Separation using ICA - A Practical Guide to Separate Audio Signals

Posted July 24, 2021 by Gowri Shankar ‐ 6 min read

In this post we shall perform step by step implementation of blind source separation using independent component analysis. This is an end to end attempt to demonstrate a solution for cocktail party problem where we believe data observed from the nature is always a mixture of multiple distinct sources, identifying the source signal is critical for understanding the nature of the observed data. CAUTION: This page plays a music clip for 10 seconds while opening.

This post is a continuation of my post on mathematical intuition behind independent component analysis. Please refer,

I have to thank Jonathan Schlens' paper on mathematical intuition behind ICA and Cory Maklin’s implementation of ICA - Without their scholarly work, this practical guide on signal source separation wouldn’t have been possible.

Note: This topic is still not fully clear but clarifies most of the aspects of ICA.

ARR

Objective

Objective of this post is to translate the math behind Cocktail Party Problem aka ICA into a tangible python code. It should be supported by sufficient media content(audio) of source signals for easy comprehension. We experience practical implementation of the following topics

  • Eigentheory
  • Entropy - Measure of uncertainty
  • Orthogonal Diagonalization
  • Mixing audio signals

Introduction

Our fundamental assumption on ICA is the signals are

  • Statistically independent of each others and
  • A Non-Gaussian distribution

In this post, we shall take two audio sources(.wav files) recorded independently and mix linearly to create a mixed signal. Then the mixed signals are fed into our custom ICA implementation(no libraries, just numpy) algorithm to separate them. Idea is to build the ICA implementation incremantally in tandem with the mathematical intuition that we have explored here.

ICA Implementation

With ICA fundamentals and mathematical intuitions are discussed in detail we shall implement the algorithm with no libraries but numpy. Following are the steps we perform to achieve ICA,

  • Data preparation
    • Load the audio files
    • Mix the music
    • Of course, listen the audio sources and the linear mixture
  • Normalizing by centering to mean
  • Whitening the source data
  • Entropy calculation and
  • Convergence scheme

Data Preparation - Music Mix

Mathematical intuition behind linear mixture of signals is defined as follows,

$$\Large \mathbf{x} = \mathbf{A}\mathbf{s} \tag{Linear Mixture}$$ Where,

  • ${ x_1(t), x_2(t), x_3(t), \cdots, x_n(t) }$ where $t$ is the index of the sample(time)
  • $\mathbf{A}$ is an unknown invertible, square matrix that mixes the components of the sources
  • $\mathbf{s} = { s_1(t), s_2(t), s_3(t), \cdots, s_n(t) }$ are the independent sources

Note: Our ultimate goal is to identify $\mathbf{A}$ in certain form by approximation.

Steps:

  1. Load the audio files using wavfile component from SciKit
  2. Mix the sources and stack them along the time dimension $\mathbf{s} = { s_1(t), s_2(t)}$
s1_file = "./talk.wav"
s2_file = "./music.wav"
import numpy as np
np.random.seed(0)

from scipy.io import wavfile as wf

def mix_sources(sources, apply_noise=False):
    for i in range(len(sources)):
        max_val = np.max(sources[i])
        if(max_val > 1 or np.min(sources[i]) < 1):
            sources[i] = sources[i] / (max_val / 2) - 0.5
            
    mixture = np.c_[[source for source in sources]]
    
    if(apply_noise):
        mixture += 0.02 * np.random.normal(size=X.shape)
        
    return mixture

_, s1 = wf.read(s1_file)

sampling_rate, s2 = wf.read(s2_file)
print(s1.shape, s2.shape)
(220568,) (1323000,)
x = mix_sources([s1, s2[:s1.shape[0]]], False)
print(f'Shape of s1: {s1.shape}, s2: {s1.shape}, Linear Mix: {x.shape}')
wf.write('./talk_and_music.wav', sampling_rate, x.mean(axis=0).astype(np.float32))
Shape of s1: (220568,), s2: (220568,), Linear Mix: (2, 220568)

Normalizing and Whitening

We shall normalize the mix by subtracting the mean

$$\Large \mathbf{x}_{norm} = \mathbf{x} - \bar {\mathbf{x}}$$

Whitening is the process through which we remove the correlations between the soure components to result in $$\mathbf{Cov} = 0 \ and \ \mathbf{Var} = 1$$ $$i.e$$ $$\mathbf{Cov}_{whitened} = \mathbb{I}$$ The process of whitening is done through eigenvalue decomposition of its covariance matrix. More insights here.

Whitened dataset is defined as follows $$\Large \mathbf{x}_w = (\mathbf{E} \mathbf{D}^{-\frac{1}{2}}E^T)\mathbf{x} \tag{Whitened Data}$$

Where,

  • $\mathbf{D}$ is a diagonal matrix of eigenvalues, every diagonal element is an eigenvalue of the covariance matrix
  • $\mathbf{E}$ is an orthogonal matrix of eigenvectors
def center(x):
    x = np.array(x)
    return x - x.mean(axis=1, keepdims=True)

def whiten(x):
    eigen_values, eigen_vectors = np.linalg.eigh(np.cov(x))
    D = np.diag(eigen_values)
    sqrt_inverse_D = np.sqrt(np.linalg.inv(D))
    x_whiten = eigen_vectors @ (sqrt_inverse_D @ (eigen_vectors.T @ x))
    
    print(f'Shape of Eigen Values: {eigen_values.shape}, Eigen Vectors: {eigen_vectors.shape}, Whitened Data: {x_whiten.shape}')
    
    return x_whiten, D, eigen_vectors

X_whiten, D, E = whiten(
    center(x)
)
D, E
Shape of Eigen Values: (2,), Eigen Vectors: (2, 2), Whitened Data: (2, 220568)





(array([[0.04109109, 0.        ],
        [0.        , 0.05854404]]),
 array([[-0.99991728, -0.01286218],
        [-0.01286218,  0.99991728]]))

Negative Entropy and Convergence

We deduced the problem quite cleverly and arrived at only unknown variable $\mathbf{V}$, refer here for the mathematical background. $$\Large \mathbf{W} = \mathbf{VD}^{-\frac{1}{2}}E^T \tag{Reconstructed Mixing Matrix}$$ Where,

  • $\mathbf{D} \ and \ \mathbf{E}$ are eigenvalues and eigenvectors of the covariance of the dataset $\mathbf{x}$
  • $\mathbf{V}$ is the only unknown rotation matrix

We seek information theory to solve for $\mathbf{W}$ through $\mathbf{V}$. refer here

$$I(\mathbf{\hat s}) = \sum_i H[(\mathbf{Vx_w})_i] - H[\mathbf{Vx_w}]$$

Now, we reach convergence by updating the value of $\mathbf{W}$ by attaining the dot product of it and it’s tranpose approximately equal to 1. i.e,

$$\mathbf{w_p^T w_{p+1}} \approx 1$$

$$\mathbf{w_p} = \frac{1}{n} \sum_{i=0}^n \mathbf{X obj(W^TX)} - \frac{1}{n} \sum_{i=0}^n \mathbf{dObj(W^TX)W} \tag{Towards Convergence}$$

Where,

  • $\mathbf{obj(.)} \ and \ \mathbf{dObj(.)}$ are objective function and its derivative respectively.

The methods used for approximating $\mathbf{W}$ are

  • Lagrangian Multiplier Method and
  • Newton Iteration
    from an objective function that aims to non-Gaussian maximization. The process involved including Lagrangian method and Newton iteration are beyond the scope of this post, I will work on a separate article on them in the near future.
def objFunc(x):
    return np.tanh(x)

def dObjFunc(x):
    return 1 - (objFunc(x) ** 2)

def calc_w_hat(W, X):
    # Implementation of the eqn. Towards Convergence
    w_hat = (X * objFunc(W.T @ X)).mean(axis=1) - dObjFunc(W.T @ X).mean() * W
    w_hat /= np.sqrt((w_hat ** 2).sum())
    
    return w_hat

Finally,

  • $\mathbf{W}$ is initialized to a random variable
  • $\mathbf{W^T W} \approx 1$ is the basis of orthogonality and indicates the convergence
  • Once resulting matrix is calculated, dot product of it and the whitened $\mathbf{x_w}$ signal gives the source
def ica(X, iterations, tolerance=1e-5):
    num_components = X.shape[0]
    
    W = np.zeros((num_components, num_components), dtype=X.dtype)
    distances = {i: [] for i in range(num_components)}
    
    for i in np.arange(num_components):
        w = np.random.rand(num_components)
        for j in np.arange(iterations):
            w_new = calc_w_hat(w, X)
            if(i >= 1):
                w_new -= np.dot(np.dot(w_new, W[:i].T), W[:i])
            distance = np.abs(np.abs((w * w_new).sum()) - 1)
            
            w = w_new
            if(distance < tolerance):
                print(f'Convergence attained for the {i+1}/{num_components} component.')
                print(f'Component: {i+1}/{num_components}, Step: {j}/{iterations}, Distance: {distance}\n')
            
                break;
                
            distances[i].append(distance)
            
            if(j % 50 == 0):
                print(f'Component: {i+1}/{num_components}, Step: {j}/{iterations}, Distance: {distance}')
            
            
                
        W[i, :] = w
    S = np.dot(W, X)
    
    return S, distances

S, distances = ica(X_whiten, iterations=100)
wf.write('s1_predicted.wav', sampling_rate, S[0].astype(np.float32))
wf.write('s2_predicted.wav', sampling_rate, S[1].astype(np.float32))
Component: 1/2, Step: 0/100, Distance: 0.23593716152077038
Convergence attained for the 1/2 component.
Component: 1/2, Step: 4/100, Distance: 5.152695734311763e-06

Component: 2/2, Step: 0/100, Distance: 0.8845851799308168
Component: 2/2, Step: 50/100, Distance: 3.80068562735314e-05

Inference

We reached the final part of this practical guide, Let us listen the separated the sources.

This post is a continuation of the previous post on Cocktail Party Problem - I believe we could demonstrate a working solution and the nuances of blind source separation using ICA. This is quite a progress for the following reasons,

  • The power of eigenvalues and eigenvectors is rendered effectively
  • We read/write a lot about entropy - the measure of uncertainty, in this post we could explore them at high level(I agree, there is some level of confusion still exists - more work to do)
  • We separated signals from different sources just like that - how fascinating it is.

I hope you all enjoyed this post - a working code on esoteric concepts is always more exciting than just the mathematical intuition and understanding.

References