I’m doing some personal work processing audio files, and while benchmarking against Audacity I was surprised by a very different result. Besides the Y-axes being different, the shape of the curve over frequency is very different. Maybe the plots do not show the same thing. Could you give me some help?
Note: I’m evaluating a music file (*.wav).
Spectrum generated from Octave using the Fast Fourier Transform (FFT)
Spectrum generated from Audacity with Size = 4096
Spectrum generated from Audacity with Size = 65535
- OS: Linux Manjaro KDE Stable
- Octave version: 7.1.0
- Installation method: Flatpak (flathub), version from pamac list
pkg load signal
[x, fs] = audioread('test.wav');
N = length(x); % Number of samples
% Plot time domain
t = linspace(0,N/fs,N); % time vector [s]
figure (1), plot(t,x),
NFFT = 2^nextpow2(N); % Next power of 2 at or above the sample count, for an efficient FFT
f = linspace(-fs/2,fs/2,NFFT); % Frequency vector
xf0 = fft(x,NFFT); % Calculate the Fast Fourier Transform
xf1 = fftshift(xf0); % Shift the spectrum so that frequency 0 is at the center
xf2 = abs(xf1); % Magnitude of each complex FFT bin
xf1=xf2/max(xf2); % Normalize
figure (21), plot(f,xf1)
axis([0 20e3 0 1])
title('Signal Frequency Spectrum - Normalized')
I cannot say definitively, but I notice that your Audacity y-axis is in dB, and therefore logarithmic, while your Octave y-axis does not say whether it is in dB. See if using semilogy makes a difference for you. You can also take the log manually before calling plot.
dB values are very different from linear magnitude, so you should change one of the plots so that both use the same units on the Y axis.
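To make the unit difference concrete, here is a minimal sketch of the amplitude-to-dB conversion. The magnitudes are made-up values, not taken from the file in the thread, and adding eps is just one way to avoid taking the log of zero.

```octave
% Hypothetical magnitudes, already normalized so the largest value is 1
mag = [1 0.5 0.1 0.001];

% Amplitude ratio to decibels; eps avoids log10(0) = -Inf for empty bins
mag_db = 20 * log10(mag + eps);

% Print each magnitude next to its dB value
printf('%8.3f -> %7.2f dB\n', [mag; mag_db]);
```

On a linear axis the last two values are almost indistinguishable from zero, while on a dB axis they sit 20 dB and 60 dB below the peak, which is why the two plots can look so different.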
My plot was normalized so that the maximum amplitude is 1.
See below the plot using semilogy on the normalized values; it still differs from Audacity in both amplitude and shape. The log must be taken from normalized data, otherwise it would exceed the top value of 0 dBFS.
dB = 20 * log10(xf1)
Using the semilogy function or the equation xf = 20*log10(xf1) combined with plot, the result is the same.
But if you compare with Audacity, the maximum there is -26 dB at 1 Hz and -90 dB at 15 kHz, while Octave shows ~0 dB at 1 Hz, and at -90 dB there are still frequencies above 15 kHz.
They are probably not the same kind of plot, and I don’t know why, but one thing that bothers me is NFFT. The audio is 198 s long with a sample rate of 44100 Hz, so I used the equation below to calculate the NFFT that fft takes as input.
NFFT = 2^nextpow2(N);
The equation gives me the value 1.6777e+07.
Audacity has a drop-down menu that lets me choose the “Size”. Even with a size of 4096 (4*2^10) it plots a good graph (the original post shows two graphs with two different sizes), and increasing the value improves the accuracy for high frequencies, but forcing NFFT in Octave to equal the Audacity size gives a very messy result.
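One possible reason for the messy result: when fft(x, n) is called with n shorter than x, only the first n samples are used, so a 4096-point FFT of a 198 s file looks at roughly the first 93 ms. As far as I understand it (an assumption, not checked against Audacity's source), Audacity instead splits the selection into blocks of "Size" samples and averages their spectra. A minimal sketch on a synthetic signal, with all names and values chosen only for illustration:

```octave
fs = 44100;                          % assumed sample rate
t = (0:fs-1)' / fs;                  % one second of synthetic test audio
x = 0.5 * sin(2*pi*1000*t) + 0.01 * randn(size(t));

n = 4096;                            % block size, like Audacity's "Size"

% fft(x, n) with n < length(x) silently drops everything after sample n
X_short = fft(x, n);
X_first = fft(x(1:n));
same = max(abs(X_short - X_first)) < 1e-9;

% Average the power spectrum over all complete blocks instead
nblocks = floor(length(x) / n);
blocks = reshape(x(1:nblocks*n), n, nblocks);   % one block per column
P = mean(abs(fft(blocks)) .^ 2, 2);             % mean power per bin
P = P(1:n/2+1);                                 % keep 0 .. fs/2
f = (0:n/2)' * fs / n;                          % bin frequencies [Hz]

[~, k] = max(P);
printf('blocks: %d, peak near %.1f Hz\n', nblocks, f(k));
```

No window is applied here, to keep the sketch short; the averaging is what smooths the curve, and a window can be multiplied into each column before the FFT.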
Another thing to check is the windowing function. It appears that Audacity applies a Hann window before the FFT. The Octave code you posted has no windowing, which is equivalent to a rectangular window. The hanning function might be useful here.
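As a rough illustration of what the window changes, the sketch below compares a rectangular and a Hann window on a synthetic tone that falls between two FFT bins. The tone frequency and frame length are arbitrary choices, and the window is written out by hand so that the example does not depend on the signal package.

```octave
fs = 44100;                 % assumed sample rate
n  = 4096;                  % frame length
t  = (0:n-1)' / fs;
x  = sin(2*pi*1005*t);      % full-scale tone that falls between two bins

% Hann window written out explicitly so no package is required
w = 0.5 - 0.5 * cos(2*pi*(0:n-1)' / (n-1));

% Single-sided amplitude spectra; dividing by sum(window) corrects the
% window's gain so a tone centred on a bin reads close to its amplitude
A_rect = 2 * abs(fft(x))      / n;
A_hann = 2 * abs(fft(x .* w)) / sum(w);

f = (0:n/2)' * fs / n;
db_rect = 20 * log10(A_rect(1:n/2+1) + eps);
db_hann = 20 * log10(A_hann(1:n/2+1) + eps);

% Leakage far from the tone, e.g. around 10 kHz
[~, k] = min(abs(f - 10000));
printf('at %.0f Hz: rect %.1f dB, hann %.1f dB\n', f(k), db_rect(k), db_hann(k));
```

The Hann spectrum falls away from the tone much faster than the rectangular one, which is consistent with the high-frequency end of a windowed plot bending down.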
Out of curiosity, what do you obtain if you use a pwelch periodogram to compute the power spectral density and then take the square root of the output spectrum? Also, Audacity seems to use a bar plot:
fs = 44100; % overwritten by audioread below
nfft = 2^16; % segment length for each periodogram
[x, fs] = audioread('test.wav');
[spectra, freq] = pwelch (x, hann (nfft), 0.5 , nfft, fs); % Hann window, 50% overlap
bar (freq, 20 * log10 (sqrt (spectra)))
Thanks for the suggestions, but they do not bring the results closer; the window only bends the higher frequencies down slightly. You can also see below that the Y-axis values do not match Audacity.
(Note: I set the y-axis to the same scale as Audacity to make it easier to compare.)
without hanning before fft
with hanning before fft
Out of curiosity I shifted the curve down by 32 dB to try to approximate Audacity, but it is still different.
with hanning before fft and 32dB offset
Thanks everyone for all the support.
For some reason pwelch cannot take a stereo signal (a two-column matrix), so to make the exercise possible I used a single channel (left or right).
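For reference, picking one channel or mixing both down to mono is a one-liner. The small matrix below is a stand-in for what audioread returns for a stereo file, not real audio data.

```octave
% Stand-in for the matrix audioread returns for a stereo file:
% one row per sample, one column per channel
x = [0.2 0.4; -0.1 0.3; 0.5 -0.5];

left = x(:, 1);        % a single channel, as used in the thread
mono = mean(x, 2);     % or a mono downmix: average across channels

disp(size(mono));
```

Either vector can then be passed to pwelch; which one is closer to what Audacity analyses for a stereo track is something I have not verified.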
The bar function crashes my Octave, probably because there is so much data that plotting bars is impossible, so I changed back to the plot function. (Given the spikes, Audacity is not using bars; it looks like a line plot.)
Another thing to mention is that you pass NFFT to the hanning function, while the documentation suggests passing the length of the sample vector.
My sample vector length (N) = 8712576
NFFT = 1.6777e+07
NFFT and the sample vector length are related by the equation NFFT = 2^nextpow2(N), so I will plot both results and also change NFFT to match the two figures from Audacity.
hanning(NFFT) : NFFT = 1.6777e+07
hanning(NFFT) : NFFT = 4096
hanning(NFFT) : NFFT = 65535
Maybe it would be useful to go the other way: start with a data file in Octave and see what Audacity produces for the spectrum. I would use sinetone to generate two tones with an amplitude ratio of 2:1 and then use audiowrite to output a file for Audacity. Compare the FFT of that known file between Octave and Audacity to try and debug what might be happening.
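A minimal sketch of that test file, assuming the argument order sinetone(freq, rate, sec, ampl) and that it returns one column per frequency. The tone frequencies, amplitudes, and output file name are arbitrary choices for illustration.

```octave
fs = 44100;                      % assumed sample rate
% sinetone returns one column per frequency; amplitudes 0.5 and 0.25 (2:1)
tones = sinetone([1000 5000], fs, 1, [0.5 0.25]);
x = sum(tones, 2);               % mix into a single mono signal

outfile = fullfile(tempdir(), 'two_tones.wav');   % hypothetical file name
audiowrite(outfile, x, fs);      % open this file in Audacity

% Reference spectrum in Octave: one second at 44100 Hz gives 1 Hz bins
N = length(x);
A = 2 * abs(fft(x)) / N;         % single-sided amplitude
a1 = A(1000 + 1);                % bin for 1000 Hz (index is 1-based)
a2 = A(5000 + 1);                % bin for 5000 Hz
ratio_db = 20 * log10(a1 / a2);
printf('1 kHz: %.3f, 5 kHz: %.3f, difference: %.2f dB\n', a1, a2, ratio_db);
```

Whatever absolute offset Audacity applies, the two peaks should sit about 6 dB apart there as well; if they do not, the difference points at the window or averaging rather than at the scaling.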
OK, I’m a little busy at the moment, but as soon as I can implement your suggestion I will come back with feedback.