I’m doing some personal work processing audio files, and when I benchmarked my result against Audacity I was surprised by how different it is. Apart from the Y-axes being different, the shape of the curve versus frequency is very different. Maybe the two plots are not showing the same thing. Could you give me some help?

Note: I’m evaluating a music file (*.wav).

Spectrum generated from Octave using the Fast Fourier Transform (FFT)


close all;
clear all;
clc;
pkg load signal
audioinfo('test.wav')
[x, fs] = audioread('test.wav');
N = size(x, 1); % Number of samples per channel
% Plot time domain
t = (0:N-1)'/fs; % Time vector [s]
figure(1), plot(t, x)
title('Time Domain')
xlabel('Time (s)')
ylabel('Amplitude')
NFFT = 2^nextpow2(N); % Smallest power of 2 that is >= N; fft zero-pads x to this length
f = (-NFFT/2:NFFT/2-1)'*fs/NFFT; % Frequency vector [Hz], in the same order as the fftshift output
xf0 = fft(x, NFFT); % Fast Fourier Transform of each channel (column)
xf1 = fftshift(xf0, 1); % Move the 0 Hz bin to the center of each column
xf2 = abs(xf1); % Magnitude of the complex spectrum
xf1 = xf2/max(xf2(:)); % Normalize so the largest bin equals 1
figure(21), plot(f, xf1)
axis([0 20e3 0 1])
% xticks([Wn]) % Wn is not defined anywhere in this script, so this call is disabled
title('Signal Frequency Spectrum - Normalized')
xlabel('Frequency [Hz]')
ylabel('Amplitude')
grid on

I cannot say definitively, but I notice that your Audacity y-axis is in dB, and therefore logarithmic, while your Octave y-axis does not say whether it is in dB or not. See if using semilogy makes a difference for you. You can also take the log manually before calling plot.
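Taking the log manually could look like the sketch below. It is only an illustration: mag is a made-up stand-in for a normalized magnitude spectrum, and the -120 dB floor is an arbitrary choice to avoid log10(0).

```octave
% Synthetic stand-in for a normalized magnitude spectrum (assumption:
% the largest value is 1, as after dividing by the maximum).
mag = [1 0.5 0.1 0.001 0];

% Floor the values first, because log10(0) is -Inf.
floor_lin = 1e-6;                         % arbitrary floor, equal to -120 dB
mag_db = 20*log10(max(mag, floor_lin));   % amplitude ratio -> dB

disp(mag_db)
```

With a normalized input the largest bin lands at 0 dB and everything else is negative, which is the same kind of axis Audacity draws.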

My plot is normalized so that the maximum amplitude is 1.

See below the plot using semilogy on the normalized values. It is still different from Audacity in both amplitude and shape. The log has to be taken from normalized data, otherwise it would exceed the top value of 0 dBFS.

But if you compare with Audacity, its maximum value is -26 dB at 1 Hz and -90 dB at 15 kHz, while Octave shows ~0 dB at 1 Hz, and at an amplitude of -90 dB there are still frequencies above 15 kHz.

Probably they are not the same plot, and I don’t know why. One thing that bothers me is NFFT: the audio is 198 s long with a sample rate of 44100 Hz, so I used the equation below to calculate NFFT properly. It is needed as an input to the FFT.

NFFT = 2^nextpow2(N);

The equation gives me the value 1.6777e+07.

Audacity has a drop-down menu that lets me choose the “size”. Even with a size of 4096 (4*2^10) it plots a good graph (the original post shows two graphs with two different sizes), and increasing the value improves the accuracy at high frequencies. But when I force NFFT in Octave to equal the Audacity size, the result is very messy.

Another thing to check is the windowing function. It appears that Audacity is applying a Hann window before the FFT. The Octave code you posted has no windowing which therefore equates to a rectangular window. The hanning function might be useful here.
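One likely reason for the messy result is that fft(x, n) with n smaller than the signal length simply truncates x to its first n samples, so a 4096-point FFT only sees the first 0.09 s of the file. As far as I understand Audacity's Plot Spectrum, it instead cuts the whole selection into blocks of the chosen size, windows each block, and averages the results. A rough sketch of that idea follows; the 1 kHz test tone, the hand-built Hann window, and the scaling convention are my assumptions, not Audacity's actual code.

```octave
fs = 44100;                      % sample rate (same as in the thread)
L  = 4096;                       % block size, like Audacity's "Size" menu
t  = (0:fs-1)'/fs;               % one second of test signal
x  = 0.5*sin(2*pi*1000*t);       % hypothetical 1 kHz tone, amplitude 0.5

% Periodic Hann window built by hand; hanning(L) from the signal
% package gives a very similar (symmetric) window.
w = 0.5 - 0.5*cos(2*pi*(0:L-1)'/L);

nblocks = floor(length(x)/L);    % whole blocks that fit in the signal
P = zeros(L/2+1, 1);             % accumulated power for 0..fs/2
for k = 1:nblocks
  idx = (k-1)*L + (1:L)';        % sample indices of block k
  X   = fft(x(idx) .* w);        % window the block, then transform
  P   = P + abs(X(1:L/2+1)).^2;  % keep the non-negative frequencies
end
P = P/nblocks;                   % average power over all blocks

% Assumed scaling: a sine of amplitude A reads A at its peak bin.
amp = 2*sqrt(P)/sum(w);
f   = (0:L/2)'*fs/L;             % frequency of each bin [Hz]
[pk, ipk] = max(amp);
printf('peak %.1f dB at %.0f Hz\n', 20*log10(pk), f(ipk));
```

Note that the window has the length of one block (L), not the length of the whole recording. The peak should land close to 1 kHz and close to 20*log10(0.5), with a small error because the tone does not fall exactly on a bin.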

Out of curiosity, what do you obtain if you use a pwelch periodogram to compute the power spectral density and then take the square root of the output spectrum? Also, Audacity seems to use a bar plot:
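For reference, a pwelch call of that kind might look like the sketch below. It assumes the signal package is installed, and the test tone, segment length, and 50 % overlap are arbitrary choices for illustration. In Octave's pwelch the overlap argument is a fraction of the segment length rather than a sample count.

```octave
pkg load signal                  % pwelch and hanning come from the signal package

fs = 44100;                      % sample rate [Hz]
t  = (0:2*fs-1)'/fs;             % two seconds
x  = sin(2*pi*1000*t);           % hypothetical 1 kHz test tone, one channel

L = 4096;                        % segment length, comparable to Audacity's "Size"
% Arguments: signal, window vector, overlap fraction, FFT length, sample rate.
[pxx, f] = pwelch(x, hanning(L), 0.5, L, fs);

amp_density = sqrt(pxx);         % square root of the PSD, as suggested above
[~, ipk] = max(pxx);
printf('peak near %.0f Hz\n', f(ipk));
```

The square root of a PSD is an amplitude density (units per sqrt(Hz)), so its absolute level is not expected to match Audacity's dB scale directly; only the shape is comparable.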

Thanks for the suggestions, but they do not bring the results closer; they only bend the higher frequencies down slightly. You can also see below that the results do not match Audacity on the Y-axis.

(Note: I set the y-axis to the same scale as Audacity to make it easier to compare.)

For some reason pwelch cannot take a stereo signal (a two-column matrix), so to make the exercise possible I used a single channel (left or right).
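Reducing a stereo signal to one column can be done either by picking a channel or by averaging both. A small sketch with made-up sample values, using the same column-per-channel layout that audioread returns:

```octave
% Hypothetical two-channel signal: each column is one channel.
x = [0.2 0.4; -0.6 0.0; 1.0 -1.0];

left = x(:, 1);        % keep only the left channel
mono = mean(x, 2);     % or average the two channels row by row

disp(mono')
```

Which of the two is closer to what Audacity does with a stereo track is something I have not checked.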

The bar function crashes my Octave; probably the amount of data is so large that plotting bars is impossible, so I changed back to the plot function. (Judging by the spikes, Audacity is not using bars; it looks like a line plot.)

Another thing to mention is that you suggest calling the hanning function with NFFT, while the documentation suggests passing the length of the sample vector to hanning.

For reference:

Length of my sample vector (N) = 8712576
NFFT = 1.6777e+07

NFFT and the length of the sample vector are related by the equation below:

NFFT=2^nextpow2(N);
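Worked through with the numbers above, this comes out as follows (a quick check, using only N and the 44100 Hz sample rate quoted in this thread):

```octave
N    = 8712576;            % number of samples quoted above
p    = nextpow2(N);        % smallest p such that 2^p >= N
NFFT = 2^p;                % FFT length after zero padding

printf('p = %d, NFFT = %d, zero padding = %d samples\n', p, NFFT, NFFT - N);
printf('bin spacing at 44100 Hz: %.5f Hz\n', 44100/NFFT);
printf('bin spacing for a 4096-point FFT: %.2f Hz\n', 44100/4096);
```

Since 2^23 = 8388608 is smaller than N, the result is 2^24 = 16777216, which is the 1.6777e+07 shown above. The frequency resolution of this single long FFT is therefore far finer than that of a 4096-point block, which by itself makes the two plots look different.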

So I will plot both results and also change NFFT to match the two figures from Audacity.

Maybe it would be useful to go the other way and start with a data file in Octave and see what Audacity produces for the spectrum. I would use sinetone to generate two tones with an amplitude ratio of 2:1 and then use audiowrite to output a file for Audacity. Compare the FFT of that known file between Octave and Audacity to try and debug what might be happening.
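A sketch of such a test file is below. It builds the tones with sin directly instead of sinetone so that it does not depend on the signal package; the frequencies, amplitudes, duration, and file name are all arbitrary choices.

```octave
fs  = 44100;                         % sample rate [Hz]
dur = 2;                             % duration [s]
t   = (0:dur*fs-1)'/fs;              % time vector [s]

% Hypothetical tone frequencies, with amplitudes in a 2:1 ratio.
f1 = 1000;  a1 = 0.5;
f2 = 5000;  a2 = 0.25;
x  = a1*sin(2*pi*f1*t) + a2*sin(2*pi*f2*t);   % peak stays below 1, so no clipping

audiowrite('two_tones.wav', x, fs);  % file name is an arbitrary choice

% A 2:1 amplitude ratio corresponds to about 6 dB between the two peaks.
expected_db = 20*log10(a1/a2);
printf('expected level difference: %.2f dB\n', expected_db);
```

If both Octave and Audacity show two peaks at the chosen frequencies roughly 6 dB apart, then the remaining differences are down to scaling, windowing, and averaging rather than to the FFT itself.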