Bat Calls: Analysis & Sonograms
The recorded sound consists of a number of samples. You should bear in mind that the sound the bat made has been changed in many ways since it left the bat. The sound picked up by the detector included the original sound, reflections from the surroundings, and environmental noise. On its way through the detector and recorder many other distortions have been introduced, and finally the signal is sampled to provide the data values we need for analysis - so the sound, instant by instant, is now represented by a sequence of data values. There is therefore little benefit in striving for "perfect" analyses; all we can expect is to gain a little more understanding of the recording. Perhaps it's best to use a multisensory approach, and play it back while you look at the sonogram.
Here you can see the original wave (blue line) which has now been sampled (red line). The more samples we take each second the better the representation of the wave will be. However, too high a sampling rate will just give lots of data for no benefit. Most bat detectors provide an audio output which will be recorded by an audio recorder - so the lowest sampling rate that will do is the same as for digitising audio: twice the highest frequency in the signal, or about 40kHz (the Nyquist-Shannon sampling theorem).
Real-time recording of the ultrasound itself needs much higher sampling rates - up to 240kHz, which by the same rule covers frequencies up to 120kHz.
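The sampling-rate rule above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, the 20kHz figure is the usual upper limit of audio, and the 120kHz figure is simply half of the 240kHz rate quoted above. The second function shows what goes wrong when the rate is too low: the tone "folds" down to a false frequency (aliasing).

```python
def min_sampling_rate(highest_freq_hz):
    """Lowest sampling rate (Hz) that can represent highest_freq_hz."""
    return 2.0 * highest_freq_hz


def apparent_frequency(true_freq_hz, sampling_rate_hz):
    """Frequency a pure tone appears to have after sampling.

    Tones above half the sampling rate fold back into the range
    0 .. sampling_rate_hz / 2 (aliasing).
    """
    nyquist = sampling_rate_hz / 2.0
    folded = true_freq_hz % sampling_rate_hz
    return folded if folded <= nyquist else sampling_rate_hz - folded


# Audio output of a detector: highest frequency about 20kHz (assumed).
audio_rate = min_sampling_rate(20_000)

# Direct ultrasound recording: highest frequency 120kHz (assumed).
ultrasound_rate = min_sampling_rate(120_000)

# A 50kHz bat call recorded directly at an ordinary 44.1kHz audio rate
# would show up at a misleading, much lower frequency.
aliased = apparent_frequency(50_000, 44_100)

print(audio_rate, ultrasound_rate, aliased)
```

The aliased figure is why ultrasound cannot simply be fed into an ordinary audio recorder without a detector (or a much faster sampler) in between.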
Measuring the amount of each different frequency in a signal is called Fourier transformation. Inside the computer it's done by taking a set of data values (often 256 but sometimes more or less) and manipulating them mathematically in a process known as a Discrete Fourier Transform (DFT).
One way of looking at this is that the DFT compares the original signal against sine waves that fit exactly into the window; and the result is a plot of how well the sample matches each of them. A special algorithm called the FFT (Fast Fourier Transform) is normally used; this gives the same results, but is quicker to compute. Without going into details, the FFT in its most common form takes a set of data points whose number must be a power of 2 - i.e. 16, 32, 64, 128, 256, 512, 1024 ... - and performs calculations on them to produce the transform. The number of values chosen is called the "window length". A longer window will give finer frequency resolution, but will take longer to compute and give a lower resolution in time.
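As a rough sketch of the two ideas above, the Python below writes out a plain DFT (matching the window against each sine wave that fits into it) and a simple recursive power-of-2 FFT, then checks that they agree. It uses only the standard library; real analysis software will use a far faster library routine. The 44.1kHz sample rate and the 64-sample window are assumptions chosen for the example.

```python
import cmath
import math


def dft(samples):
    """Plain DFT: match the window against every sine/cosine pair that
    fits a whole number of cycles into it."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
        for k in range(n)
    ]


def fft(samples):
    """Recursive radix-2 FFT: same answer as dft(), far fewer steps."""
    n = len(samples)
    if n == 0 or n & (n - 1):
        raise ValueError("window length must be a power of 2")
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


SAMPLE_RATE = 44_100   # Hz, assumed for the example
WINDOW_LENGTH = 64     # must be a power of 2 for fft()
CYCLES = 5             # test tone: exactly 5 cycles fit into the window

window = [math.sin(2 * math.pi * CYCLES * i / WINDOW_LENGTH)
          for i in range(WINDOW_LENGTH)]
magnitudes = [abs(v) for v in fft(window)]

# The best-matching sine wave is the one with 5 cycles per window.
peak_bin = max(range(WINDOW_LENGTH // 2), key=lambda k: magnitudes[k])

# The trade-off: a longer window narrows each frequency bin but
# covers a longer stretch of time.
bin_width_hz = SAMPLE_RATE / WINDOW_LENGTH
window_duration_s = WINDOW_LENGTH / SAMPLE_RATE

print(peak_bin, bin_width_hz, window_duration_s)
```

Doubling WINDOW_LENGTH halves bin_width_hz and doubles window_duration_s, which is the frequency-versus-time trade-off described above.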
"Window type" is another important parameter. Above you see a continuous (blue) wave sampled by a (red) rectangular window. The sudden changes at the edges of the window introduce distortion to the spectrogram. You can see this distortion in the shape of the green wave.
Differently shaped windows have been introduced to reduce the effect of this distortion. The rectangular or "boxcar" window is still the best for analysing transients (i.e. bat calls) that are shorter than the window.
The Hann window and Hamming window functions, amongst others, provide a mathematically simple way of reducing distortion that can be introduced by the windowing process. Here the central part of the window is given a much higher weighting than the edges.
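To give a feel for what these windows do, here is a small Python sketch using the standard textbook formulas for the Hann and Hamming weights. The test tone deliberately does not fit a whole number of cycles into the window, and "leakage" is measured here, as an assumption made for this example only, as the fraction of energy landing more than three bins away from the peak.

```python
import cmath
import math


def hann(n):
    """Hann weights: zero at the edges, one in the middle."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def hamming(n):
    """Hamming weights: about 0.08 at the edges, one in the middle."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def magnitude_spectrum(samples):
    """Magnitudes of the first half of a plain DFT."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(samples)))
        for k in range(n // 2)
    ]


def leakage_fraction(spectrum, peak_halfwidth=3):
    """Share of the energy lying outside the bins around the peak."""
    energy = [m * m for m in spectrum]
    peak = max(range(len(energy)), key=energy.__getitem__)
    near = sum(energy[max(0, peak - peak_halfwidth):peak + peak_halfwidth + 1])
    return 1.0 - near / sum(energy)


N = 64
# 10.5 cycles per window: the tone is cut off mid-cycle at the edges.
tone = [math.sin(2 * math.pi * 10.5 * i / N) for i in range(N)]

rect_leak = leakage_fraction(magnitude_spectrum(tone))
hann_leak = leakage_fraction(
    magnitude_spectrum([s * w for s, w in zip(tone, hann(N))]))
hamming_leak = leakage_fraction(
    magnitude_spectrum([s * w for s, w in zip(tone, hamming(N))]))

print(rect_leak, hann_leak, hamming_leak)
```

With the rectangular window a noticeable share of the energy is smeared across distant frequencies; with either tapered window that share is much smaller, at the cost of a slightly broader peak.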
Each window is a trade-off between several different factors. Here you will find an excellent paper on the DFT, particularly with reference to these window functions. (Click to get the pdf; it's better to read than the web version.)
A more detailed discussion of windowing functions is available here.
For our purposes a Hann or Hamming window is probably most appropriate.
Let's take a recording and analyse it
Here is the result: a spectrogram from a time-expanded (X20) recording of a Daubenton's bat, produced by Wavesurfer.
Wavesurfer is free, but a bit tricksy to use until you learn to right click on the windows.
The call here is from a Daubenton's bat recorded from a time expansion detector at X20.
The upper pane shows the "envelope" of the call, and you might think that the call is not a continuous whistle but rises and falls in amplitude. However this could just be a side-effect (an "artefact") arising because the call is very short (see below *).
This FFT uses 128 samples and a Hamming window. The bright colours (red, orange, yellow) indicate parts of the call where there is a particularly loud component at that frequency. We can see that this call is falling from 70kHz to about 35kHz. On the time-expanded scale it lasts from about 0.97 to 1.03 sec; dividing by 20 gives a real duration of about 0.003 to 0.0035 sec.
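For readers who want to see roughly how such a picture is built, the Python sketch below makes a synthetic call (a steady sweep from 70kHz to 35kHz lasting 0.0035 sec, slowed down 20 times), cuts it into 128-sample Hamming-windowed slices, and finds the loudest frequency in each slice. It is not how Wavesurfer itself works internally; the 44.1kHz sample rate, the 50% overlap between slices and the perfectly linear sweep are all assumptions made for the example.

```python
import cmath
import math

SAMPLE_RATE = 44_100   # Hz; assumed rate of the audio recorder
EXPANSION = 20         # time-expansion factor of the detector
FFT_LENGTH = 128       # samples per slice, as in the text
HOP = 64               # assumed 50% overlap between slices


def expanded_chirp(real_start_hz, real_end_hz, real_duration_s):
    """Synthetic linear sweep as recorded after time expansion."""
    f0 = real_start_hz / EXPANSION
    f1 = real_end_hz / EXPANSION
    duration = real_duration_s * EXPANSION
    samples = []
    for i in range(int(duration * SAMPLE_RATE)):
        t = i / SAMPLE_RATE
        phase = 2 * math.pi * (f0 * t + (f1 - f0) * t * t / (2 * duration))
        samples.append(math.sin(phase))
    return samples


def hamming(n):
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def spectrogram(samples):
    """One magnitude spectrum per slice (plain DFT, first half only)."""
    weights = hamming(FFT_LENGTH)
    frames = []
    for start in range(0, len(samples) - FFT_LENGTH + 1, HOP):
        chunk = [s * w for s, w in zip(samples[start:start + FFT_LENGTH], weights)]
        frames.append([
            abs(sum(x * cmath.exp(-2j * math.pi * k * i / FFT_LENGTH)
                    for i, x in enumerate(chunk)))
            for k in range(FFT_LENGTH // 2)
        ])
    return frames


call = expanded_chirp(70_000, 35_000, 0.0035)
frames = spectrogram(call)

# Width of one frequency bin on the recording, then scaled back up by 20
# to give the real frequency the bat produced.
bin_hz = SAMPLE_RATE / FFT_LENGTH
peak_track_khz = [
    max(range(len(f)), key=f.__getitem__) * bin_hz * EXPANSION / 1000
    for f in frames
]

print(len(frames), peak_track_khz[0], peak_track_khz[-1])
```

Under these assumptions each frequency bin is nearly 7kHz wide in real terms, so the track comes out as a staircase rather than a smooth line.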
The yellow vertical line is a marker used to choose or identify parts of the wave. Here I'm using it to point out the additional frequencies caused by distortion when the wave is too big and "clipping" at top and bottom. The distortion introduces harmonics, and that's why we get the two pale blobs on the spectrogram.
Remember that the original signal was time expanded by a factor of 20 - so we need to divide the time scale by 20 and multiply the frequency scale by 20.
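The rescaling is simple arithmetic, sketched here in Python. The function names and the sample readings (a span of 0.07 sec and a tone of 3.5kHz on the slowed-down recording) are illustrative only.

```python
EXPANSION_FACTOR = 20  # time expansion used by the detector


def to_real_time(expanded_seconds, factor=EXPANSION_FACTOR):
    """Time read off the expanded recording -> real time at the bat."""
    return expanded_seconds / factor


def to_real_frequency(expanded_hz, factor=EXPANSION_FACTOR):
    """Frequency read off the expanded recording -> real frequency."""
    return expanded_hz * factor


real_duration_s = to_real_time(0.07)     # about 0.0035 sec
real_freq_hz = to_real_frequency(3_500)  # 70000 Hz, i.e. 70kHz

print(real_duration_s, real_freq_hz)
```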
How do these artefacts arise?
* A pure sine-wave is v = A sin(2 pi f t). This is a continuous function of time. A "sine-wave" that is turned on and then off also includes frequency components that are harmonics (multiples) of a fundamental that fits inside that "window".
Here the total duration of the call is 0.0035 sec, which gives a fundamental at
f = 1/t = 1/0.0035 = about 286Hz, or roughly 300Hz, and hence harmonics at roughly 600, 900 etc.; so the amplitude variation we are seeing at about 900Hz is likely due to this effect. A simple way to check without doing the measuring and maths is just to see how many full sine-waves are shown in the amplitude window. If it's a small whole number (here three) it could be down to this, and trying a different windowing function might be appropriate.
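The same check can be done with a few lines of Python. This is only the arithmetic above written out; the function names are made up for the example, and the 900Hz figure is the approximate reading quoted in the text.

```python
def gating_harmonics(duration_s, count=3):
    """Fundamental (1 / duration) and its first few multiples, in Hz."""
    fundamental = 1.0 / duration_s
    return [fundamental * n for n in range(1, count + 1)]


def cycles_in_call(envelope_hz, duration_s):
    """How many full waves of the amplitude ripple fit into the call."""
    return envelope_hz * duration_s


CALL_DURATION_S = 0.0035

harmonics = gating_harmonics(CALL_DURATION_S)
# about 286, 571 and 857 Hz (rounded to 300, 600 and 900 in the text)

ripple_cycles = cycles_in_call(900, CALL_DURATION_S)
# about 3.15, i.e. close to the small whole number 3

print([round(h) for h in harmonics], round(ripple_cycles, 2))
```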
Fourier Analysis and FFT - explains about leakage and why we must be careful in interpreting amplitudes in the transform output.