Zero padding means extending a signal $x[n]$ with additional zeros to reach a desired length. This is often done before applying an FFT when the input length is not a power of two, for example in an STFT whose window duration is specified in $\mathrm{ms}$. Many FFT and STFT libraries zero-pad automatically.

The following plot shows a Hann window with a length of $N=700$ samples that is zero-padded to a length of $1024$ samples.

[Figure: Hann window of length $N=700$ samples, zero-padded to $1024$ samples]
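The padding itself can be sketched in a few lines of NumPy. The variable names are chosen for illustration only, and the symmetric `np.hanning` window is an assumption, since the text does not say which Hann variant was plotted. The second half shows the implicit padding that `np.fft.rfft` performs when its `n` argument exceeds the input length.

```python
import numpy as np

N = 700        # window length in samples (from the text)
N_pad = 1024   # target length, the next power of two

w = np.hanning(N)                   # symmetric Hann window (assumed variant)
w_pad = np.pad(w, (0, N_pad - N))   # append 324 zeros at the end

# Explicit padding and the implicit padding of rfft give the same spectrum
W_explicit = np.fft.rfft(w_pad)
W_implicit = np.fft.rfft(w, n=N_pad)

print(w_pad.shape, np.allclose(W_explicit, W_implicit))
```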

Effect and Benefit

From the introduction to the Discrete Fourier Transform we know that the bin spacing of the resulting frequency-domain signal depends on the input length $N$ and the sampling rate $f_s$. The distance between neighboring bins is calculated as follows:

$$ \Delta_f = \frac{f_s}{N} $$

Increasing the length $N$ of the signal reduces $\Delta_f$.
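As a quick numerical check of this relation, the following sketch evaluates $\Delta_f$ for the two lengths and the sampling rate used in the example of this section. The helper name `bin_spacing` is made up for illustration.

```python
fs = 8000  # sampling rate in Hz (value from the example in this section)

def bin_spacing(fs, n):
    """Distance between neighboring DFT bins in Hz (hypothetical helper)."""
    return fs / n

for n in (2048, 4096):
    print(n, bin_spacing(fs, n))
# 2048 3.90625
# 4096 1.953125
```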

The plot below shows $x[n]$, a windowed sine wave with a fundamental frequency of $f_0=107.5\ \mathrm{Hz}$ and a length of $N=2048$ samples at a sampling rate of $f_s=8000\ \mathrm{Hz}$.

$x^*[n]$ is a version of the same signal, padded to a length of $N^*=4096$:
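Both signals could be generated roughly as follows. This is a minimal sketch: the text only says "windowed", so the Hann window is an assumption, and the names `x` and `x_pad` are illustrative stand-ins for $x[n]$ and $x^*[n]$.

```python
import numpy as np

fs = 8000      # sampling rate in Hz
f0 = 107.5     # frequency of the sine in Hz
N = 2048       # original length in samples
N_pad = 4096   # padded length in samples

n = np.arange(N)
# Windowed sine; the Hann window is an assumption, not stated in the text
x = np.hanning(N) * np.sin(2 * np.pi * f0 * n / fs)

# Padded version: the original samples followed by N_pad - N zeros
x_pad = np.concatenate([x, np.zeros(N_pad - N)])
```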

DFT

The results of the DFT for the original and the padded signal are shown below.

$|X|$ shows the typical (smeared) peak of the sinusoid.
$|X^*|$ shows additional sidelobes. This is the effect of the windowing: through the padding, the window becomes narrower relative to the total signal length, and the denser bin grid makes the lobe structure of its spectrum visible.

More importantly, the peak appears smoother for $|X^*|$. This is one of the reasons for applying zero padding in many analysis applications: the increased density of frequency bins allows a more accurate estimation of peaks in the magnitude spectrum.

For $|X|$, the highest peak is detected at bin $k=28$:

$$ f_0 = k \frac{f_s}{N} = 28 \frac{8000 \mathrm{Hz}}{2048} = \mathbf{109.375 \mathrm{Hz}} $$

For $|X^*|$, the highest peak is detected at bin $k=55$:

$$ f_0 = k \frac{f_s}{N^*} = 55 \frac{8000 \mathrm{Hz}}{4096} = \mathbf{107.42 \mathrm{Hz}} $$

The latter value is closer to the $107.5\ \mathrm{Hz}$ of the input signal. Note that a similar improvement of the peak estimate would be possible with polynomial interpolation on $|X|$.
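The peak estimates and the interpolation alternative can be sketched as follows. The Hann window is an assumption, and the three-point parabolic fit on the log-magnitude spectrum is one common choice of polynomial interpolation, not necessarily the one the author has in mind.

```python
import numpy as np

fs, f0, N, N_pad = 8000, 107.5, 2048, 4096
n = np.arange(N)
x = np.hanning(N) * np.sin(2 * np.pi * f0 * n / fs)  # Hann window assumed

X = np.abs(np.fft.rfft(x))               # unpadded magnitude spectrum
X_pad = np.abs(np.fft.rfft(x, n=N_pad))  # zero-padded magnitude spectrum

# Peak bins and the corresponding frequency estimates
k = int(np.argmax(X))
k_pad = int(np.argmax(X_pad))
f_peak = k * fs / N
f_peak_pad = k_pad * fs / N_pad

# Parabolic interpolation through the peak bin and its two neighbors,
# carried out on the log-magnitude of the unpadded spectrum
a, b, c = np.log(X[k - 1:k + 2])
delta = 0.5 * (a - c) / (a - 2 * b + c)  # fractional bin offset
f_interp = (k + delta) * fs / N

print(k, f_peak, k_pad, f_peak_pad, f_interp)
```

Under these assumptions the interpolated estimate from the unpadded spectrum lands much closer to $107.5\ \mathrm{Hz}$ than the raw peak bin does.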

Interpretation

Zero padding does not add information and, although it does decrease $\Delta_f$, does not increase the true frequency resolution. What happens can be explained with the DFT equation:

$$ \begin{eqnarray} X[k] & = & \sum\limits_{n=0}^{N-1} x[n] e^{-j 2 \pi k \frac{n}{N}} \\ \end{eqnarray} $$

By zero-padding, we are increasing $N$ to $N^*$. This adds additional complex oscillations to compare the signal with. We are thus adding bins in between the bins of the unpadded DFT. Since $x^*[n] = 0$ for $n \geq N$, only the first $N$ terms of the sum contribute. This can be written as:

$$ \begin{eqnarray} X^*[k] & = & \sum\limits_{n=0}^{N^*-1} x^*[n] e^{-j 2 \pi k \frac{n}{N^*}} \\ \end{eqnarray} $$
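That the padded DFT only adds bins in between can be checked numerically. In this small sketch (random test signal, padding factor of two, all names illustrative), every second bin of the padded DFT coincides with a bin of the unpadded one:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8
x = rng.standard_normal(N)       # arbitrary test signal

X = np.fft.fft(x)                # unpadded DFT, N bins
X_pad = np.fft.fft(x, n=2 * N)   # DFT of the signal zero-padded to 2N

# Even-indexed bins of the padded DFT are the original bins;
# the odd-indexed bins are the new in-between values.
print(np.allclose(X_pad[::2], X))  # True
```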

Fractional DFT

The same result as with zero padding can be achieved by changing how the DFT treats the frequency index $k$. Normally it is assumed that:

$$ k \in \{0,1,2,…,N-1\} $$

We can also use fractional values $\hat{k}$, which is equivalent to upsampling the spectrum. For the ratio $r = \frac{N}{N^*}$ (here $r = 0.5$), this results in:

$$ \hat{k} \in \{0r,1r,2 r,…,(N^*-1) r\} = \{0,0.5,1,…,N-0.5\} $$

$$ \begin{eqnarray} X[\hat{k}] & = & \sum\limits_{n=0}^{N-1} x[n] e^{-j 2 \pi \hat{k} \frac{n}{N}} \\ \end{eqnarray} $$
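A direct evaluation of this sum might look like the sketch below. The function name `fractional_dft` and the small test signal are hypothetical. The kernel is built as an explicit matrix, which is simple but costs on the order of $N \cdot N^*$ operations.

```python
import numpy as np

def fractional_dft(x, k_hat):
    """Evaluate the DFT sum of x at arbitrary (fractional) bin indices k_hat."""
    x = np.asarray(x, dtype=complex)
    n = np.arange(len(x))
    # One row per requested bin: exp(-j 2 pi k_hat n / N)
    kernel = np.exp(-2j * np.pi * np.outer(k_hat, n) / len(x))
    return kernel @ x

N, N_pad = 16, 32
r = N / N_pad                  # ratio r = N / N*
k_hat = np.arange(N_pad) * r   # 0, 0.5, 1, ..., N - 0.5

x = np.hanning(N) * np.sin(2 * np.pi * 3.3 * np.arange(N) / N)
X_frac = fractional_dft(x, k_hat)

# The integer-valued entries of k_hat reproduce the ordinary DFT
print(np.allclose(X_frac[::2], np.fft.fft(x)))
```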

The concept of alternative bin spacings is also used in other transforms, such as the constant-Q transform.

Comparison

The plots below show the results of the zero-padded DFT and the fractional DFT for comparison. Both algorithms result in the same magnitude spectrum. The efficiency benefits of the FFT might not apply in the fractional DFT case.
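The equivalence can also be verified numerically on the example signal. The sketch below assumes a Hann window and evaluates the fractional DFT directly for the lowest 128 fractional bins only (up to roughly $250\ \mathrm{Hz}$), which keeps the kernel matrix small. The band limit is an arbitrary choice for illustration.

```python
import numpy as np

fs, f0, N, N_pad = 8000, 107.5, 2048, 4096
n = np.arange(N)
x = np.hanning(N) * np.sin(2 * np.pi * f0 * n / fs)  # Hann window assumed

# Zero-padded FFT: all bins at once, using the fast algorithm
X_pad = np.fft.rfft(x, n=N_pad)

# Fractional DFT by direct summation, restricted to the first 128 bins
n_bins = 128
k_hat = np.arange(n_bins) * (N / N_pad)   # 0, 0.5, 1, ...
kernel = np.exp(-2j * np.pi * np.outer(k_hat, n) / N)
X_frac = kernel @ x

print(np.allclose(X_frac, X_pad[:n_bins]))
```

The direct summation scales with the number of requested bins times $N$, so it gives up the speed of the FFT for a full spectrum, but it can be evaluated for only the bins of interest.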