Music is a language expressed through many elements, such as pitch, harmony, melody, tempo, and rhythm. Advances in science and technology have given us modern tools for improving music quality, which depends primarily on the type, quality, and tuning of the instrument. A vast number of musical instruments are available in the market, each with its own characteristic sound and tuning. The sound they produce, however, often contains unwanted noise and echo that must be removed to obtain a high-quality result.
Musical instruments must be well tuned to produce a mellifluous sound. Tuning is normally done against a fixed reference pitch of A = 440 Hz. It is difficult to tune an instrument by ear alone because the human ear is not very sensitive to small frequency differences. Differences in the harmonics are what give instruments their different sounds. The following image shows the tuning parts of a flute, one of the best-known acoustic instruments in Indian classical music.
Fig. Tuning parts of a Flute
In this article, let's look at what tuning an instrument actually means and, more specifically, how it can be done using basic signal processing techniques. Frequency is one of the main attributes of the sound generated by a musical instrument, and a balanced frequency range is required for smooth, flawless sound quality. The focus of our program’s design is on developing suitable algorithms to accurately detect the fundamental frequency of any particular note of a flute. The tuning problem is thus reduced to identifying the fundamental frequency of the various notes the instrument generates.
Various methods are available for detecting pitch frequency, as listed below.
- Time Domain
  - Autocorrelation Function (ACF)
  - Square Difference Function (SDF)
  - Average Magnitude Difference Function (AMDF)
- Frequency Domain
  - Spectrum Peak methods via FFT
- Other Methods
  - Constant-Q-Transform (CQT)
  - Wavelets
Audio Signal Pre-processing Techniques
This article demonstrates frequency tuning using the autocorrelation method. The music signal is pre-processed to increase the efficiency of the ACF algorithm. The fundamental frequency, or pitch, of a musical note is then detected by searching for the highest peak in the autocorrelation function, excluding the trivial maximum at zero lag.
Fig. Audio signal processing flow
- Centre Clipping
Center clipping is one of the best-known methods for flattening the spectrum in audio processing. It is described here together with median filtering, another non-linear operation (strictly speaking, the two are distinct techniques). Unlike linear filtering, which tends to blur distinctive audio events, median filtering does not smooth out discontinuities in the signal. This matters because sudden jumps or time discontinuities are essential for detecting changes in fundamental frequency. The non-linear transformation removes the median of the signal over a running window: slowly varying, noise-like components are removed, while sudden sharp changes are retained.
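As a minimal sketch of classic center clipping (not necessarily the exact variant used in the pipeline described here), samples whose magnitude falls below a threshold are zeroed and the remaining samples are shifted toward zero. The function name `center_clip` and the 30% threshold ratio are assumptions chosen for illustration.

```python
def center_clip(samples, ratio=0.3):
    """Center-clip a frame: zero small samples, shift the rest toward zero.

    `ratio` (an assumed value) sets the clipping threshold as a fraction
    of the frame's peak absolute amplitude.
    """
    if not samples:
        return []
    threshold = ratio * max(abs(s) for s in samples)
    clipped = []
    for s in samples:
        if s > threshold:
            clipped.append(s - threshold)   # positive excursion above threshold
        elif s < -threshold:
            clipped.append(s + threshold)   # negative excursion below -threshold
        else:
            clipped.append(0.0)             # low-amplitude region is removed
    return clipped
```

Because only the strongest excursions of each period survive, the autocorrelation of the clipped frame tends to show a cleaner peak at the pitch period.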
- Framing
We are dealing with quasi-stationary signals: notes played over finite time windows. Because the fundamental frequency, or pitch, of a musical note is typically not steady over long durations, localized processing is preferred. The signal under analysis is normally broken into short frames of 5 ms to 20 ms, within which the fundamental frequency is reasonably constant. Frame length is a trade-off: the frame must contain enough samples for accurate autocorrelation, yet be short enough for the signal to be assumed stationary.
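The framing step can be sketched as follows. The function name `split_into_frames`, the 20 ms frame length, and the 10 ms hop (50% overlap) are illustrative assumptions, not values prescribed by the article.

```python
def split_into_frames(samples, sample_rate, frame_ms=20.0, hop_ms=10.0):
    """Split a signal into overlapping fixed-length frames.

    frame_ms and hop_ms are assumed values; a trailing partial frame
    is dropped.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    if frame_len <= 0 or hop_len <= 0:
        raise ValueError("frame and hop must span at least one sample")
    return [
        samples[start:start + frame_len]
        for start in range(0, len(samples) - frame_len + 1, hop_len)
    ]
```

At a sampling rate of 8000 Hz, for example, a 20 ms frame holds 160 samples and consecutive frames start 80 samples apart.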
- Windowing
Applying a window to fixed-duration frames has a lens-like effect on the signal frame: the center portion of the signal is emphasized in the analysis, while both edges are smoothly attenuated. This reins in the frequency distortions caused by abrupt frame endings, often referred to as the Gibbs phenomenon, and makes the transition smoother as the window slides. The most widely used window function in audio processing is the symmetric “Hamming window”.
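A short sketch of the windowing step, using the standard symmetric Hamming coefficients 0.54 and 0.46. The helper names `hamming_window` and `apply_window` are hypothetical.

```python
import math

def hamming_window(length):
    """Return the symmetric Hamming window coefficients for `length` samples."""
    if length < 1:
        return []
    if length == 1:
        return [1.0]
    return [
        0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
        for n in range(length)
    ]

def apply_window(frame):
    """Multiply a frame sample-by-sample by a Hamming window of equal length."""
    window = hamming_window(len(frame))
    return [s * w for s, w in zip(frame, window)]
```

The window equals 1.0 at the center of the frame and tapers to 0.08 at both edges, which is the "lens-like" emphasis described above.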
- Silence Removal
The silent part of the audio is not truly zero; it is usually background noise with considerably lower energy. This silent portion has no computational value to the algorithm and can be safely removed. Removing silence reduces processing requirements and also reduces unwanted noise in the autocorrelation computation.
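One simple way to drop low-energy frames is an energy threshold relative to the loudest frame. This is only a sketch: the helper names and the 5% relative threshold are assumptions, and a practical system may need a threshold tuned to its recording conditions.

```python
def frame_energy(frame):
    """Mean squared amplitude of a frame (0.0 for an empty frame)."""
    return sum(s * s for s in frame) / len(frame) if frame else 0.0

def remove_silence(frames, rel_threshold=0.05):
    """Keep frames whose energy is at least rel_threshold times the peak energy.

    rel_threshold is an assumed value, not one taken from the article.
    """
    energies = [frame_energy(f) for f in frames]
    peak = max(energies, default=0.0)
    if peak == 0.0:
        return []   # everything is digital silence
    return [f for f, e in zip(frames, energies) if e >= rel_threshold * peak]
```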
- Frequency Computation
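The autocorrelation function (ACF) is a special case of the cross-correlation function in which a signal is correlated with itself: each sample is multiplied by the corresponding sample of a shifted copy of the same signal, and the products are summed. For a windowed frame $x(n)$ of $N$ samples, the discrete-time ACF at lag $\tau$ is commonly written as
$$R(\tau) = \sum_{n=0}^{N-1-\tau} x(n)\,x(n+\tau), \qquad 0 \le \tau < N$$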
Consider the ACF of a periodic signal computed over varying lags. In the plot below, the ACF peaks at a lag of around 76. This lag is the period of the signal in samples; dividing the sampling frequency by the lag gives the pitch in Hz.
Fig. Autocorrelation result of windowed recording
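The lag-to-pitch relationship is pitch = sampling frequency / lag. The article does not state the sampling rate of the recording; at an assumed 44.1 kHz, a peak at lag 76 would correspond to about 580 Hz. The sketch below computes the ACF directly in the time domain and picks the strongest peak within a search range. The function names and the 200 Hz to 2000 Hz search bounds are assumptions for illustration.

```python
def autocorrelation(frame, max_lag):
    """ACF values R[0..max_lag] of a frame, computed as a direct sum."""
    n = len(frame)
    return [
        sum(frame[i] * frame[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]

def estimate_pitch(frame, sample_rate, f_min=200.0, f_max=2000.0):
    """Estimate the fundamental frequency (Hz) from the strongest ACF peak.

    The search covers lags from sample_rate / f_max to sample_rate / f_min
    (assumed bounds), which skips the trivial maximum at lag 0.
    Returns None when the frame is too short for that range.
    """
    min_lag = max(1, int(sample_rate / f_max))
    max_lag = min(len(frame) - 1, int(sample_rate / f_min))
    if max_lag < min_lag:
        return None
    acf = autocorrelation(frame, max_lag)
    best_lag = max(range(min_lag, max_lag + 1), key=lambda lag: acf[lag])
    return sample_rate / best_lag
```

The direct sum costs on the order of N × max_lag multiplications per frame, which is acceptable for short frames; an FFT-based ACF is a common alternative for longer ones.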
- Frequency to Pitch Mapping
The detected frequencies are mapped to the actual pitch class by finding the reference note with the minimum absolute frequency difference. This step also tells us whether the played note is flat, sharp, or in tune.
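A minimal sketch of this mapping, assuming twelve-tone equal temperament with A4 = 440 Hz as the reference. The note range (C3 to C7), the 5-cent "in tune" tolerance, and the function names are illustrative assumptions.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def build_note_table(a4=440.0, low=-21, high=27):
    """Equal-tempered note frequencies, keyed by name (e.g. 'A4' -> 440.0).

    `low` and `high` are semitone offsets from A4 (assumed range: C3 to C7).
    """
    table = {}
    for k in range(low, high + 1):
        name = f"{NOTE_NAMES[(k + 9) % 12]}{4 + (k + 9) // 12}"
        table[name] = a4 * 2 ** (k / 12)
    return table

def nearest_note(freq, table, tolerance_cents=5.0):
    """Return (note, cents_off, verdict) for the note closest in frequency."""
    name = min(table, key=lambda n: abs(table[n] - freq))
    cents = 1200 * math.log2(freq / table[name])   # signed deviation in cents
    if abs(cents) <= tolerance_cents:
        verdict = "in tune"
    elif cents > 0:
        verdict = "sharp"
    else:
        verdict = "flat"
    return name, cents, verdict
```

For instance, a detected frequency of 446 Hz maps to A4 and is reported as sharp, since it lies roughly 23 cents above 440 Hz.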
- Smoothing
The smoothing operation acts as a low-pass filter that removes high-frequency noise components, improving the signal-to-noise ratio (SNR).
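The article does not specify which smoothing filter is used or exactly where it is applied. As one possible illustration, assuming the smoothing is applied to the sequence of per-frame pitch estimates, a centered moving average behaves as a simple low-pass filter. The function name and the default width are assumptions.

```python
def moving_average(values, width=5):
    """Smooth a sequence with a centered moving average (a simple low-pass filter).

    `width` should be odd; near the edges the window shrinks to the
    samples that are available.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    half = width // 2
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed
```

With `width=3`, the track `[440, 440, 500, 440, 440]` becomes `[440.0, 460.0, 460.0, 460.0, 440.0]`: the isolated spike is spread out and reduced rather than removed outright.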
Applications Where Frequency Tuning is Needed
Frequency tuning techniques are applied when recording, storing, and transmitting audio content. Here are some of the applications that require these techniques.
- MIDI Converter/ Synthesizer
The MIDI (Musical Instrument Digital Interface) protocol is a technical standard used to send and receive information describing sound. A MIDI converter listens to audio data and outputs the MIDI message of the corresponding note, using the Fast Fourier Transform and peak analysis. A MIDI synthesizer converts MIDI messages to sound and can mimic musical instruments.
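The last stage of such a converter, turning a detected frequency into a MIDI message, can be sketched as below. It relies on the MIDI convention that A4 (440 Hz) is note number 69 and that a Note On message consists of a status byte `0x90` plus the channel, followed by the note number and velocity. The function names and default velocity are assumptions, and the FFT and peak-analysis stages are not shown.

```python
import math

def frequency_to_midi_note(freq, a4=440.0):
    """Nearest MIDI note number for a frequency (A4 = 440 Hz is note 69)."""
    return round(69 + 12 * math.log2(freq / a4))

def note_on_message(note, velocity=64, channel=0):
    """Build the three bytes of a MIDI Note On message."""
    if not (0 <= note <= 127 and 0 <= velocity <= 127 and 0 <= channel <= 15):
        raise ValueError("note and velocity must be 0-127, channel 0-15")
    return bytes([0x90 | channel, note, velocity])
```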
- Humming Composer/ Melody Extraction
Humming a song means making a wordless, nasal tone with the mouth closed while closely following the melody of the song. A humming composer transforms humming into a digital song that can be played by different types of musical instruments. When a melody is hummed into the microphone, the composer captures the audio and translates it into a MIDI sequence.
- Query by Humming/ Music Recognition Systems
Music recognition systems (MRS) allow users to identify a song just by humming part of its tune, without instruments or any prior knowledge of the music. The user can query a large audio database by humming, and the system uses relative pitch changes as the melodic information.
If we capture a few seconds of a song, regardless of whether it is the intro, verse, or chorus, audio processing can create a fingerprint for the recorded excerpt. This fingerprint vector can be used as a hash code and compared against fingerprints extracted from the other songs in the database; with the help of a music recognition algorithm, we can then identify exactly which song is playing. In practice, the song is recorded on, say, a mobile phone, its fingerprint is extracted, and the fingerprint is matched against the database of fingerprint hashes.
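To illustrate why relative pitch changes suit query by humming, the sketch below reduces a pitch sequence to a string of up, down, and repeat steps, similar in spirit to the Parsons code used for melodic contours. It is not the method of any particular recognition system; the function name and the 1% repeat tolerance are assumptions.

```python
def pitch_contour(pitches_hz, tolerance=0.01):
    """Encode a pitch sequence as U (up), D (down), and R (repeat) steps.

    Two successive pitches whose ratio is within `tolerance` of 1.0
    (an assumed value) count as a repeat.
    """
    steps = []
    for prev, cur in zip(pitches_hz, pitches_hz[1:]):
        ratio = cur / prev
        if ratio > 1 + tolerance:
            steps.append("U")
        elif ratio < 1 - tolerance:
            steps.append("D")
        else:
            steps.append("R")
    return "".join(steps)
```

The sequence `[262, 262, 392, 392, 440, 440, 392]` encodes to `"RURURD"`, and the same melody hummed in a higher or lower key yields the same string, because only the direction of each step is kept.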
- Music Transcription
In music signal processing, transcription is an emerging technology used to annotate an existing piece of music with notation. Pitch detection is one of the major elements of the music transcription process.
- Musical Information Retrieval
A large set of pitch-related information can be retrieved from music pieces and used for statistical analysis of music. The extracted features can then be used in applications such as music genre classification and emotion analysis.
- Singing Learning
By comparing the learner’s rendition with that of the original singer, the learner can obtain pitch and timing information such as “sur” and “taal”, which can then be used to improve their own pitch.
Conclusion
Many factors are involved in the tuning of musical instruments, of which pitch frequency is the dominant one. The accuracy of pitch detection depends on factors such as sampling frequency, sampling duration, the nature of the sound signal, and background noise. Several pitch detection algorithms exist; given their differing hardware and software requirements, the best choice among them is very application-specific. Many applications, such as those listed above, use fundamental frequency as a major characterizing element in audio signal processing.
With 14+ years of experience in the field of audio signal processing, PathPartner’s audio source tuning solutions and pitch detection models can provide the best design for your product development. If you’d like to know more about PathPartner, reach out to us for a quick consultation at marcom@pathpartnertech.com