Blind Source Separation for Cocktail Party Problem
March 9, 2016
Blind Source Separation (BSS), as the name suggests, aims to recover the original, unknown source signals from their observed mixtures. It does so by estimating the mixing function from the observed mixed signals alone. “Blind” here means that the mixing function relating the sources to the signals recorded by the microphones is unknown. BSS techniques require no prior knowledge of the mixing function or the source signals, and no training data.
To understand BSS further, imagine a person at a party. There are various disturbances: loud music, people shouting, and a lot of hustle and bustle. Yet the person can hold a conversation with someone else by filtering out the unwanted signals and noise. In a similar way, BSS separates and extracts the desired signals from a mixture of unknown signals, which is quite similar in function to the human ear. This is what we term the “Cocktail Party Effect”.
It all started in the early 1980s, when researchers formulated BSS in the framework of neural modelling. It was later adopted in digital signal processing and communications. BSS was first developed and tested on linear mixtures of signals, and was later extended to non-linear and convolutive mixtures.
The underlying principle of BSS comes from linear, noise-free systems theory: a system with multiple inputs (speech sources) and multiple outputs (microphones or sensors) can be inverted, under reasonable assumptions, with appropriately chosen filters (for example, convolutive BSS filters in the case of convolutive mixtures).
As shown in the figure above, BSS mainly involves two steps:
- System Identification: the filter coefficients of the mixing process are calculated. This step is also called mixing matrix factorization.
- Separation or Un-mixing: the sources are separated by filtering with the coefficients calculated in the first step.
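The two steps can be sketched for the simplest case, an instantaneous linear mixture without noise. This is a minimal illustration in Python with NumPy, not a BSS algorithm: the source signals and the mixing matrix `A` are made up for the example, and `A` is treated as known so that only the un-mixing step is shown. In a real blind setting, an estimate of `A` (or of its inverse) would have to come from the observed signals alone.

```python
import numpy as np

# Two hypothetical source signals, 1000 samples each (one row per source).
t = np.linspace(0.0, 1.0, 1000)
s = np.vstack([
    np.sin(2 * np.pi * 5 * t),           # source 1: 5 Hz sine
    np.sign(np.sin(2 * np.pi * 3 * t)),  # source 2: 3 Hz square wave
])

# Assumed 2x2 mixing matrix: each row describes what one microphone picks up.
A = np.array([[1.0, 0.5],
              [0.4, 1.0]])
x = A @ s  # observed microphone signals

# Step 1 (system identification) is skipped: A is known here by construction.
# Step 2 (un-mixing): filter the observations with the inverse of the mixing.
W = np.linalg.inv(A)
y = W @ x

# Largest sample-wise difference between recovered and original sources.
max_error = np.max(np.abs(y - s))
```

With an exact inverse, `max_error` is at the level of floating-point rounding. The quality of a blind method depends on how closely its estimated un-mixing matrix approaches `W`.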
Another consideration for BSS is to have at least as many microphones as there are signal sources to be separated. If there are fewer microphones than sources, the separation task becomes difficult, if not impossible. The paper “Single Microphone Source Separation Using High Resolution Signal Reconstruction” by Trausti Kristjansson, Hagai Attias and John Hershey offers one solution for the single-microphone case.
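A small numerical sketch shows why the microphone count matters in the linear case. The setup below is hypothetical (three random sources, two microphones, an arbitrary mixing matrix `A`). Because `A` has more columns than rows, it has no exact inverse, and even the pseudo-inverse returns signals that explain the recordings without matching the true sources.

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.laplace(size=(3, 500))    # three hypothetical sources

# Assumed mixing matrix: 2 microphones (rows), 3 sources (columns).
A = np.array([[1.0, 0.6, 0.3],
              [0.2, 0.8, 1.0]])
x = A @ s                         # only two observed signals

rank = np.linalg.matrix_rank(A)   # at most 2, so A cannot be inverted exactly
s_hat = np.linalg.pinv(A) @ x     # minimum-norm estimate of the sources

# s_hat reproduces the observations but differs from the true sources.
reproduces_x = np.allclose(A @ s_hat, x)
residual = np.max(np.abs(s_hat - s))
```

Here `reproduces_x` is true while `residual` stays far from zero: many different source combinations produce the same two recordings, so extra assumptions about the sources are needed to pick one.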
Blind source separation can exploit linear, temporal, spatial or sparsity properties of the signal sources. Based on these properties, there are different approaches to BSS:
- Principal Component Analysis
- Independent Component Analysis
- Spatio-Temporal Analysis
- Sparse Component Analysis
Many such algorithms have been developed to solve the BSS problem by assuming some properties of the signal sources or the mixing process that suit particular applications. When such assumptions are made, we call it “Semi-Blind Source Separation”.
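As an illustration of one approach from the list above, here is a compact sketch of Independent Component Analysis written with NumPy only. It follows the general FastICA scheme (PCA-style whitening, then a fixed-point iteration with a tanh nonlinearity and symmetric decorrelation), and it assumes the sources are statistically independent and non-Gaussian. The function names, the mixing matrix and the Laplace-distributed test sources (a common heavy-tailed stand-in for speech amplitudes) are choices made for this example, not part of any particular library.

```python
import numpy as np

def whiten(x):
    """Center the mixtures, then decorrelate them to unit variance (PCA step)."""
    x = x - x.mean(axis=1, keepdims=True)
    eigvals, eigvecs = np.linalg.eigh(np.cov(x))
    return (eigvecs / np.sqrt(eigvals)).T @ x

def sym_decorrelate(W):
    """Make the rows of W orthonormal: W <- (W W^T)^(-1/2) W."""
    eigvals, eigvecs = np.linalg.eigh(W @ W.T)
    return eigvecs @ np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T @ W

def fast_ica(x, n_iter=200, tol=1e-8, seed=0):
    """Estimate independent components from mixtures x (one row per sensor)."""
    z = whiten(x)
    n, m = z.shape
    rng = np.random.default_rng(seed)
    W = sym_decorrelate(rng.standard_normal((n, n)))
    for _ in range(n_iter):
        g = np.tanh(W @ z)            # nonlinearity applied to current outputs
        g_prime = 1.0 - g ** 2        # its derivative
        # Fixed-point update for every row of W at once.
        W_new = (g @ z.T) / m - np.diag(g_prime.mean(axis=1)) @ W
        W_new = sym_decorrelate(W_new)
        # Converged when each row keeps its direction (up to sign).
        converged = np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1.0)) < tol
        W = W_new
        if converged:
            break
    return W @ z

# Hypothetical test: two heavy-tailed sources mixed by an assumed matrix A.
rng = np.random.default_rng(1)
s = rng.laplace(size=(2, 5000))
A = np.array([[1.0, 0.5],
              [0.4, 1.0]])
x = A @ s
y = fast_ica(x)

# Absolute correlation between each true source (rows) and each output (columns).
match = np.abs(np.corrcoef(np.vstack([s, y]))[:2, 2:])
```

ICA can recover the sources only up to order, sign and scale, which is why the check compares absolute correlations instead of raw sample values. Each row of `match` should contain one entry close to 1.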
Applications of BSS
BSS can be used to enhance noisy speech in real-world environments, and its applications are not limited to speech and audio processing: it is also used in image, astronomical, satellite and biomedical signal analysis. The double-talk (DT) issue in Acoustic Echo Cancellation (AEC) can be addressed with a BSS algorithm, without a double-talk detector or step-size controller.
BSS is also used in microphone array processing (multiple microphones), making it useful in diverse applications.