Acoustic Echo Cancellers (AECs) play an important role in voice communication systems. Research on echo cancellers started in the 1960s, especially for voice communication devices. Echo cancellers enable smooth and intelligible conversations in full-duplex voice communication. AECs are present not only in cellphones but also in all kinds of audio calling devices, such as Amazon’s Echo series and Facebook’s Portal devices.
Let's first understand what echo means in voice communication. Suppose you are on a call with someone over a speakerphone. The speech of the person you are talking to, referred to as far-end speech, is played out from the loudspeaker. Your own voice, referred to as near-end speech, is captured by the microphone and sent across the network to the other end of the call.
However, the microphone also captures the far-end speech because of the acoustic coupling between the loudspeaker and the microphone. If this far-end speech is transmitted back, the other person hears their own voice after some delay (network plus processing delay). Echo is this phenomenon of hearing one's own voice after a certain delay, and the amount of delay determines how annoying it is. Conversations with noticeable echo are very annoying and can become unintelligible. Hence, we need a mechanism that blocks the far-end speech from being transmitted back to the other party, which is what an echo canceller in a communication device does. Echo is not only an issue for speakerphones but also for mobile phones and other handsets, because of the mechanical coupling between earpiece speakers and microphones.
Now that we understand echo, the simplest way to avoid it is half-duplex communication. In half-duplex communication, transmission is unidirectional at any moment during a conversation: while one party speaks, the other listens, and vice versa. A typical example is the walkie-talkie. For an uninterrupted, natural conversation, however, full-duplex communication is required, and an echo canceller is a must.
Acoustic Echo Cancellation vs. Line Echo Cancellation
There are two types of echo cancellers: Line Echo Cancellers (LECs) and Acoustic Echo Cancellers (AECs). LECs are used in telephone networks to cancel electrical echo that arises within the network, for example from impedance mismatches at the hybrid circuits that connect two-wire and four-wire lines. AECs are used to cancel the echo resulting from acoustic coupling between microphones and loudspeakers. AECs can be further classified into linear and non-linear. Let's dig deeper into how echo cancellation works.
The basic principle of any linear AEC (LAEC) is to use an adaptive filter to predict the echo from the far-end signal x(n), and then subtract this predicted echo ŷ(n) from the microphone signal d(n) to produce an error signal e(n) that is free of the far-end echo. The error is fed back to the adaptive filter so that the filter can adapt itself and produce, from the far-end signal, a predicted echo that closely matches the echo captured by the microphones.
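In symbols, and assuming an FIR echo-path model of length L (L and the vector notation are our additions for illustration), the scheme reads:

$$
\begin{aligned}
\mathbf{x}(n) &= [\,x(n),\; x(n-1),\; \dots,\; x(n-L+1)\,]^{T} \\
\hat{y}(n) &= \mathbf{w}^{T}(n)\,\mathbf{x}(n) \\
e(n) &= d(n) - \hat{y}(n)
\end{aligned}
$$

Here d(n) holds the echo plus any near-end speech and noise, so when w(n) matches the echo path, e(n) ideally retains only the near-end component.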
A good AEC needs an adaptive filter whose predicted echo closely matches the echo captured in the microphone signal, so that the residual error is small. The echo in the microphone signal is the sum of a direct path (influenced by the distance between the microphone and loudspeaker and their look directions) and multiple reflections from surrounding walls and objects.
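To make the "direct path plus reflections" picture concrete, here is a minimal sketch that builds a hypothetical echo-path impulse response and convolves far-end samples with it. The delays, gains, and the use of white noise in place of speech are illustrative assumptions, not measured data.

```python
import random

def synthetic_echo_path(fs, direct_delay_ms, direct_gain, reflections):
    """Build a hypothetical echo-path impulse response h.

    reflections: list of (delay_ms, gain) pairs standing in for wall/object
    reflections. All values are illustrative assumptions.
    """
    all_taps = [(direct_delay_ms, direct_gain)] + list(reflections)
    length = max(int(round(d * fs / 1000)) for d, _ in all_taps) + 1
    h = [0.0] * length
    for delay_ms, gain in all_taps:
        h[int(round(delay_ms * fs / 1000))] += gain  # one tap per path
    return h

def convolve(x, h):
    """Full linear convolution: echo(n) = sum_k h[k] * x[n - k]."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y

fs = 16000
h = synthetic_echo_path(fs, direct_delay_ms=1.0, direct_gain=0.8,
                        reflections=[(6.0, 0.3), (11.5, -0.15), (20.0, 0.05)])
random.seed(0)
far_end = [random.gauss(0.0, 1.0) for _ in range(1600)]  # 100 ms of noise as a stand-in for speech
echo = convolve(far_end, h)  # echo component of the microphone signal
```

A test microphone signal d(n) could then be formed by adding near-end speech or noise to `echo`.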
The linear adaptive filter mainly adapts its prediction to match the direct-path echo and, in most cases, the other (reflected) echo paths. The remaining residual echo, including non-linearities added by the device (loudspeaker distortion, vibrations, etc.), is removed by Non-Linear Processing (NLP). Because these non-linearities break the linear relation between the far-end signal and the captured echo, an LAEC cannot predict and cancel the non-linear part of the echo. NLP is therefore required after the LAEC stage in a speech processing chain to suppress echo effectively.
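The following sketch illustrates why a purely linear model falls short. It fits the best least-squares linear gain from far-end samples to two echoes: one from an ideal linear loudspeaker and one from a hypothetical saturating loudspeaker modelled with `tanh` (an assumption chosen for illustration). Only the linear echo can be cancelled almost completely; the leftover from the saturated echo is what NLP has to deal with.

```python
import math
import random

def best_linear_gain(x, y):
    """Least-squares gain g minimising sum((y - g*x)^2)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

def residual_ratio_db(x, y):
    """Residual power after the best linear fit, relative to echo power, in dB."""
    g = best_linear_gain(x, y)
    residual = sum((b - g * a) ** 2 for a, b in zip(x, y))
    power = sum(b * b for b in y)
    return 10 * math.log10(max(residual, 1e-300) / power)  # guard against log10(0)

random.seed(1)
x = [random.uniform(-1.0, 1.0) for _ in range(10000)]  # far-end samples
linear_echo = [0.5 * s for s in x]                     # ideal linear loudspeaker
clipped_echo = [math.tanh(3.0 * s) for s in x]         # hypothetical saturating loudspeaker
print(residual_ratio_db(x, linear_echo))   # extremely low: the linear fit is exact
print(residual_ratio_db(x, clipped_echo))  # stays well above that floor
```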
The Echo Effect
Now is a good time to understand how echo at different delays affects the human ear. For speech, delays below about 40 ms are usually harmless and do not annoy the listener, because of the precedence effect: the brain fuses a slightly delayed copy with the original sound. Delays above 40 ms, however, are perceived as a distinct echo. In telephone conversations, the echo delay comes from network and processing delays. Networks can introduce more than 100 ms of delay, and combined with processing delays, this makes an echo canceller necessary for smooth, full-duplex conversations.
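The delay arithmetic can be sketched as below. It assumes, for illustration only, a symmetric network path: the far-end speech travels to the near-end device and its echo travels back, so the one-way network delay counts twice. The 40 ms threshold comes from the text; the delay values are hypothetical.

```python
PRECEDENCE_LIMIT_MS = 40.0  # threshold quoted in the text

def echo_delay_ms(one_way_network_ms, processing_ms):
    """Delay after which the far-end talker hears their own echo.

    Assumption: symmetric network path (counted twice), and processing_ms
    lumps together all buffering and DSP delay on both ends.
    """
    return 2 * one_way_network_ms + processing_ms

def needs_echo_canceller(delay_ms, limit_ms=PRECEDENCE_LIMIT_MS):
    """True if the echo arrives late enough to be heard as a distinct echo."""
    return delay_ms > limit_ms

# Hypothetical numbers: 50 ms one-way network delay, 30 ms of processing.
d = echo_delay_ms(one_way_network_ms=50.0, processing_ms=30.0)
print(d, needs_echo_canceller(d))  # 130.0 True
```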
Since the effectiveness of linear echo suppression depends on how well the adaptive filter predicts the echo signal, LAEC techniques focus on improving the predicted echo and reaching a good prediction quickly, so that no echo leaks through at the start of a call. The speed of adaptation is called the convergence rate and is measured as the time taken to suppress the echo by a certain amount (e.g. 50 dB) relative to its level in the microphone signal. Other metrics used to quantify LAECs are Mean Square Error (MSE) and Echo Return Loss Enhancement (ERLE). MSE is the mean of the squared residual (error) signal over a period, and ERLE is the ratio, usually expressed in dB, of the power of the captured echo to the power of the residual error.
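These metrics can be sketched in a few lines. This is a minimal version under stated assumptions: ERLE is computed during far-end single talk (so d(n) contains only echo), and convergence time is taken as the start of the first 10 ms frame (160 samples at 16 kHz, our choice) whose ERLE reaches the target. The toy signals are made up for illustration.

```python
import math

EPS = 1e-20  # guards against division by zero / log10(0) on silent frames

def mse(e):
    """Mean square of a signal over the analysis window."""
    return sum(v * v for v in e) / len(e)

def erle_db(d, e):
    """ERLE in dB: 10*log10(P_d / P_e), with d(n) the captured echo."""
    return 10 * math.log10(max(mse(d), EPS) / max(mse(e), EPS))

def convergence_time_s(d, e, fs, target_db, frame=160):
    """Time (s) at which frame-wise ERLE first reaches target_db, else None."""
    for start in range(0, len(d) - frame + 1, frame):
        if erle_db(d[start:start + frame], e[start:start + frame]) >= target_db:
            return start / fs
    return None

# Toy signals: constant-power echo; the residual drops by 60 dB after 50 ms.
fs = 16000
d = [1.0 if n % 2 == 0 else -1.0 for n in range(1600)]
e = [v if n < 800 else 0.001 * v for n, v in enumerate(d)]
print(erle_db(d[800:], e[800:]))                   # about 60 dB once converged
print(convergence_time_s(d, e, fs, target_db=50))  # 0.05
```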
Adaptive Filtering and its Applications
Figure: Adaptive AEC scheme
LAEC algorithms can be classified into time-domain and frequency-domain algorithms. As the names suggest, time-domain LAECs operate on time-domain speech samples, whilst frequency-domain LAECs operate on FFT bins. For adaptive filtering, time-domain implementations require costly convolution, whereas frequency-domain implementations replace it with simple per-bin multiplication. Frequency-domain implementations are a better choice if the entire processing chain is already in the frequency domain. Still, it's worth experimenting with both for specific use cases and devices to compare their echo suppression.
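The convolution-versus-multiplication point can be checked with a small sketch: filtering a block by convolution in time gives the same result as multiplying DFT bins and transforming back. For clarity this uses a naive O(N²) DFT rather than an FFT. Note that a bin-wise product corresponds to *circular* convolution, which is why block frequency-domain filters zero-pad (as below) or use overlap-save so that the result matches linear convolution.

```python
import cmath

def dft(x):
    """Naive DFT: X[m] = sum_k x[k] * exp(-2*pi*i*k*m/N)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * k * m / n) for k in range(n))
            for m in range(n)]

def idft(X):
    """Naive inverse DFT."""
    n = len(X)
    return [sum(X[m] * cmath.exp(2j * cmath.pi * k * m / n) for m in range(n)) / n
            for k in range(n)]

def circular_convolve(x, h):
    """Time domain: N multiply-adds per output sample (x and h same length)."""
    n = len(x)
    return [sum(x[(k - j) % n] * h[j] for j in range(n)) for k in range(n)]

def freq_domain_filter(x, h):
    """Frequency domain: one complex multiply per bin, plus the transforms."""
    X, H = dft(x), dft(h)
    return [v.real for v in idft([a * b for a, b in zip(X, H)])]

x = [1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0]  # zero-padded far-end block
h = [0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # zero-padded filter
time_out = circular_convolve(x, h)
freq_out = freq_domain_filter(x, h)
print(time_out)  # [0.5, 1.25, 2.0, 2.75, 1.0, 0.0, 0.0, 0.0]
```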
Least Mean Squares (LMS) is the most basic adaptive filter used for LAEC, and it operates in the time domain. It consists of an FIR filter whose coefficients are adapted based on the error between the predicted echo and the microphone signal. The number of coefficients (taps) determines the largest direct-path echo delay the AEC can suppress: a 1024-tap LMS or normalized LMS (NLMS) filter at a 16 kHz sampling rate spans 1024 / 16000 s = 64 ms, so it can suppress direct-path echo delayed by up to 64 ms. One can't simply add more taps to accommodate very long direct-path delays, though: longer filters converge more slowly and are more prone to divergence.
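The tap-count arithmetic from the paragraph above, as a small helper sketch (the function names are ours, for illustration):

```python
import math

def max_echo_delay_ms(num_taps, fs_hz):
    """Longest echo-path delay an FIR adaptive filter with num_taps can span."""
    return 1000.0 * num_taps / fs_hz

def taps_for_delay(delay_ms, fs_hz):
    """Taps needed so the filter spans delay_ms at fs_hz (rounded up)."""
    return math.ceil(delay_ms * fs_hz / 1000.0)

print(max_echo_delay_ms(1024, 16000))  # 64.0
print(taps_for_delay(128, 16000))      # 2048
```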
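Let’s understand how LMS works in detail. The filter coefficients adapt as follows:

$$\mathbf{w}(n+1) = \mathbf{w}(n) + \mu\, e(n)\, \mathbf{x}(n), \qquad e(n) = d(n) - \mathbf{w}^{T}(n)\,\mathbf{x}(n)$$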
In the above equation, w(n) is the vector of adaptive filter coefficients, x(n) is the vector of the most recent far-end samples, e(n) is the error signal, and μ is a small positive constant called the step size. μ controls how fast the filter adapts, but it must be chosen carefully: too small a value increases the convergence time, and too large a value can make the filter diverge.
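A minimal pure-Python sketch of the LMS update w(n+1) = w(n) + μ e(n) x(n), identifying a hypothetical 8-tap echo path from white-noise far-end data. The echo path, signal length, and step size are illustrative assumptions; a real AEC uses far longer filters and must also handle near-end speech (double talk), which this sketch ignores.

```python
import random

def lms_identify(x, d, num_taps, mu):
    """Run w(n+1) = w(n) + mu * e(n) * x(n) over the whole signal.

    x: far-end samples, d: microphone samples.
    Returns (final weights, error signal e(n) = d(n) - y_hat(n)).
    """
    w = [0.0] * num_taps
    buf = [0.0] * num_taps              # x(n), x(n-1), ..., x(n-L+1)
    errors = []
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]           # shift the newest far-end sample in
        y_hat = sum(wi * bi for wi, bi in zip(w, buf))  # predicted echo
        e = dn - y_hat                  # error fed back to the filter
        w = [wi + mu * e * bi for wi, bi in zip(w, buf)]
        errors.append(e)
    return w, errors

random.seed(0)
true_path = [0.0, 0.0, 0.6, 0.3, -0.2, 0.1, 0.05, 0.0]  # hypothetical echo path
x = [random.gauss(0.0, 1.0) for _ in range(5000)]
d = [sum(true_path[k] * x[n - k] for k in range(len(true_path)) if n - k >= 0)
     for n in range(len(x))]                            # echo-only microphone signal
w, e = lms_identify(x, d, num_taps=8, mu=0.01)
```

The NLMS variant mentioned earlier differs only in scaling the step by the far-end energy in the buffer, i.e. using `mu / (eps + sum(b * b for b in buf))`, which makes the adaptation speed less sensitive to the far-end level. For plain LMS, a commonly used conservative step-size bound, 0 < μ < 2/(L·σx²) with σx² the far-end power, shrinks as the filter length L grows, which is one reason long filters must adapt more cautiously.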
All adaptive filter algorithms for LAEC work on this principle. Most LAEC implementations run the adaptive filter in sub-bands: the full-band NLMS filter can be thought of as split, via a filter bank, into several shorter filters, each adapting over its own range of frequencies. Sub-band filters can therefore adapt faster for the frequencies of interest than a full-band LMS that must adapt across the entire frequency range.
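Why each sub-band filter is shorter can be seen from the tap-count arithmetic. This sketch assumes a hypothetical filter bank that decimates every band by a factor D (the values below are illustrative, not from the original): a band decimated by D runs at fs/D, so covering the same echo-path duration needs roughly L/D taps. In sub-band NLMS, each band also normalizes its step by its own signal power, which is generally credited with faster convergence for coloured signals such as speech.

```python
import math

def subband_taps(fullband_taps, decimation):
    """Taps per sub-band filter needed to span the same echo-path duration."""
    return math.ceil(fullband_taps / decimation)

fs = 16000
fullband_taps = 1024          # spans 64 ms at 16 kHz
decimation = 16               # hypothetical: e.g. 32 bands, oversampled by 2
band_rate_hz = fs / decimation
per_band = subband_taps(fullband_taps, decimation)
print(band_rate_hz, per_band)            # 1000.0 64
print(1000.0 * per_band / band_rate_hz)  # 64.0 (same time span, in ms)
```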
Speakerphones, conferencing devices and wireless audio products are gaining popularity due to their innovative features and immersive sound quality. Acoustic echo cancellation is a key technique behind their smooth, echo-free voice quality: it removes the echo, including its reverberant reflections, caused by acoustic coupling between the microphone and loudspeaker. This blog discussed what echo is, how AEC works, and how echo delay affects listeners.
With over 12 years of experience in multimedia solutions, PathPartner’s custom echo cancellation solutions will provide you the best design for your product development.