Zero padding means extending a signal $x[n]$ with additional zeros to reach a desired signal length. This is often done before applying an FFT when the input length is not a power of two, for example in an STFT whose window duration is specified in $\mathrm{ms}$. Many FFT and STFT libraries and modules zero-pad automatically.
The following plot shows a Hann window with a length of $N=700$ samples that is zero-padded to a length of $1024$ samples.
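This padding step can be sketched in a few lines of NumPy. The sketch assumes NumPy's symmetric Hann window (`np.hanning`) and appends the zeros at the end of the signal. `next_pow2` is a hypothetical helper introduced only for this illustration. The last line shows that `np.fft.fft` with its `n` argument performs the same padding internally.

```python
import numpy as np

def next_pow2(n: int) -> int:
    """Smallest power of two >= n (hypothetical helper for this example)."""
    return 1 << (n - 1).bit_length()

N = 700
w = np.hanning(N)                   # Hann window with N = 700 samples
N_pad = next_pow2(N)                # 1024
w_pad = np.pad(w, (0, N_pad - N))   # append N_pad - N = 324 zeros at the end

print(N_pad, w_pad.shape)           # 1024 (1024,)

# np.fft.fft zero-pads the input itself when n > len(w)
W = np.fft.fft(w, n=N_pad)
```

Padding at the end is the convention `np.fft.fft` uses. Padding elsewhere would only change the phase of the spectrum, not its magnitude.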
Effect and Benefit
From the introduction to the Discrete Fourier Transform, we know that the frequency resolution of the resulting frequency-domain signal depends on the input length $N$ and the sampling rate $f_s$. The distance between neighboring bins is calculated as follows:
$$ \Delta_f = \frac{f_s}{N} $$
Increasing the signal length $N$ reduces $\Delta_f$.
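As a quick check of the formula, the bin spacing for the two lengths used in the following example can be computed directly. NumPy's `np.fft.rfftfreq` reports the same spacing. `bin_spacing` is a hypothetical helper name used only here.

```python
import numpy as np

def bin_spacing(fs: float, N: int) -> float:
    """Distance between neighboring DFT bins: Delta_f = f_s / N."""
    return fs / N

fs = 8000.0  # sampling rate in Hz, as in the example below
for N in (2048, 4096):
    freqs = np.fft.rfftfreq(N, d=1.0 / fs)  # center frequency of each bin in Hz
    # freqs[0] is 0 Hz, so freqs[1] equals the spacing NumPy uses
    print(N, bin_spacing(fs, N), np.isclose(freqs[1], bin_spacing(fs, N)))
# 2048 3.90625 True
# 4096 1.953125 True
```

Doubling $N$ halves $\Delta_f$.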
The plot below shows $x[n]$, a windowed sine wave with a frequency of $f_0=107.5\,\mathrm{Hz}$ and a length of $N=2048$ samples at a sampling rate of $f_s=8000\,\mathrm{Hz}$.
$x^*[n]$ is the same signal, zero-padded to a length of $N^*=4096$ samples:
DFT
The results of the DFT for the original and the padded signal are shown below.
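- $|X|$ shows the typical (smeared) peak of the sinusoid.
- $|X^*|$ additionally shows the sidelobes of the window's spectrum. Relative to the padded frame of $N^*$ samples, the window now spans only half of the frame, so the bins sample its spectrum twice as densely and the individual sidelobes, together with the nulls between them, become visible.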
More importantly, the peak appears smoother for $|X^*|$. This is one of the reasons for applying zero padding in many analysis applications: the denser grid of frequency bins allows a more accurate estimation of peaks in the magnitude spectrum.
For $|X|$, the highest peak is detected at bin $k=28$, which gives the frequency estimate:
$$ \hat{f}_0 = k \frac{f_s}{N} = 28 \cdot \frac{8000\,\mathrm{Hz}}{2048} = \mathbf{109.375\,\mathrm{Hz}} $$
For $|X^*|$, the highest peak is detected at bin $k=55$:
$$ \hat{f}_0 = k \frac{f_s}{N^*} = 55 \cdot \frac{8000\,\mathrm{Hz}}{4096} = \mathbf{107.42\,\mathrm{Hz}} $$
The latter estimate is closer to the $107.5\,\mathrm{Hz}$ of the input signal. Note that a similar improvement of the peak estimate would be possible with polynomial interpolation on $|X|$.
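The peak picking above, and the polynomial interpolation mentioned in the last sentence, can be sketched in NumPy. This rests on several assumptions not stated in the text. The window is taken to be a Hann window (`np.hanning`), since the text only says "windowed". The sine is assumed to start with zero phase. The interpolation is a quadratic (parabolic) fit through the peak bin and its two neighbors on the dB magnitude, which is one common choice among several. `peak_bin` and `parabolic_peak` are hypothetical helper names.

```python
import numpy as np

fs = 8000.0   # sampling rate in Hz
f0 = 107.5    # frequency of the sine in Hz
N = 2048      # original length
N_pad = 4096  # zero-padded length N*

n = np.arange(N)
x = np.hanning(N) * np.sin(2 * np.pi * f0 * n / fs)  # windowed sine (Hann window assumed)
x_pad = np.pad(x, (0, N_pad - N))                    # x*[n]: zeros appended at the end

def peak_bin(signal: np.ndarray) -> int:
    """Index k of the largest bin of the one-sided magnitude spectrum."""
    return int(np.argmax(np.abs(np.fft.rfft(signal))))

k = peak_bin(x)                  # 28
k_pad = peak_bin(x_pad)          # 55
f_est = k * fs / N               # 109.375 Hz
f_est_pad = k_pad * fs / N_pad   # 107.421875 Hz

def parabolic_peak(mag: np.ndarray, k: int) -> float:
    """Fractional peak position from a parabola through bins k-1, k, k+1 (in dB)."""
    a, b, c = 20 * np.log10(mag[k - 1:k + 2])
    return k + 0.5 * (a - c) / (a - 2 * b + c)

# interpolated estimate computed on the un-padded spectrum |X| only
f_interp = parabolic_peak(np.abs(np.fft.rfft(x)), k) * fs / N

print(k, f_est)          # 28 109.375
print(k_pad, f_est_pad)  # 55 107.421875
print(f_interp)
```

Under these assumptions, the interpolated estimate from the un-padded spectrum lands within a fraction of a hertz of $107.5\,\mathrm{Hz}$, without computing a longer DFT.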
Interpretation
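Zero padding does not add information. Although it decreases $\Delta_f$, it does not increase the true frequency resolution, i.e., the ability to separate two closely spaced sinusoids. That ability is still limited by the length and shape of the window applied to the original $N$ samples. What happens can be explained with the DFT equation: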
$$ \begin{eqnarray} X[k] & = & \sum\limits_{n=0}^{N-1} x[n] e^{-j 2 \pi k \frac{n}{N}} \\ \end{eqnarray} $$
Zero padding increases the DFT length from $N$ to $N^*$. This adds further complex oscillations to compare the signal with, i.e., additional bins in between the bins of the un-padded DFT. Since $x^*[n] = 0$ for $n \geq N$, this can be written as:
$$ \begin{eqnarray} X^*[k] & = & \sum\limits_{n=0}^{N^*-1} x^*[n] e^{-j 2 \pi k \frac{n}{N^*}} \\ & = & \sum\limits_{n=0}^{N-1} x[n] e^{-j 2 \pi k \frac{n}{N^*}} \\ \end{eqnarray} $$
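The equivalence can be checked numerically. The sketch below uses small, arbitrarily chosen sizes ($N=16$, $N^*=32$) and a random test signal. It compares NumPy's zero-padded FFT with the explicit sum over only the original $N$ samples, using $N^*$ in the exponent. For $N^* = 2N$, it also confirms that every second padded bin coincides with a bin of the un-padded DFT.

```python
import numpy as np

rng = np.random.default_rng(0)
N, N_pad = 16, 32                 # small sizes so the explicit sum stays cheap
x = rng.standard_normal(N)        # arbitrary test signal

# zero-padded DFT via the FFT (np.fft.fft pads with zeros when n > len(x))
X_pad = np.fft.fft(x, n=N_pad)

# explicit sum over the original N samples only, with N* in the exponent
n = np.arange(N)
k = np.arange(N_pad)[:, None]     # column of bin indices 0 .. N*-1
X_sum = np.sum(x * np.exp(-2j * np.pi * k * n / N_pad), axis=1)

print(np.allclose(X_pad, X_sum))               # True

# with N* = 2N, the even bins equal the un-padded DFT; the odd bins lie in between
print(np.allclose(X_pad[::2], np.fft.fft(x)))  # True
```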
Fractional DFT
The same result as with zero padding can be achieved by changing how the DFT treats the frequency index $k$. Normally it is assumed that:
$$ k \in \{0, 1, 2, \ldots, N-1\} $$
We can also use fractional values $\hat{k}$, which amounts to sampling the spectrum more densely (upsampling in the frequency domain). With the ratio $r = \frac{N}{N^*}$, the frequency indices become:
$$ \hat{k} \in \{0 \cdot r, 1 \cdot r, 2 \cdot r, \ldots, (N^*-1)\, r\} = \{0, 0.5, 1, \ldots, N-0.5\} \quad \text{for } N^* = 2N $$
$$ \begin{eqnarray} X[\hat{k}] & = & \sum\limits_{n=0}^{N-1} x[n] e^{-j 2 \pi \hat{k} \frac{n}{N}} \\ \end{eqnarray} $$
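A direct evaluation of this sum can be sketched as follows. `fractional_dft` is a hypothetical helper written for this illustration. It evaluates the formula above at $\hat{k} = k\,r$ with $r = N/N^*$ and is compared against NumPy's zero-padded FFT. The sizes and the test signal, a Hann-windowed sine, are arbitrary choices.

```python
import numpy as np

def fractional_dft(x: np.ndarray, k_hat: np.ndarray) -> np.ndarray:
    """Evaluate X[k_hat] = sum_n x[n] exp(-j 2 pi k_hat n / N) for arbitrary k_hat.

    Direct evaluation: len(k_hat) * N complex multiply-adds, no FFT speed-up.
    """
    N = len(x)
    n = np.arange(N)
    return np.exp(-2j * np.pi * np.outer(k_hat, n) / N) @ x

N, N_pad = 64, 128
r = N / N_pad                    # r = 0.5
k_hat = np.arange(N_pad) * r     # 0, 0.5, 1, ..., N - 0.5

n = np.arange(N)
x = np.hanning(N) * np.sin(2 * np.pi * 5.3 * n / N)  # arbitrary windowed test sine

X_frac = fractional_dft(x, k_hat)
X_pad = np.fft.fft(x, n=N_pad)   # zero-padded DFT for comparison

print(k_hat[-1])                   # 63.5
print(np.allclose(X_frac, X_pad))  # True
```

With integer values $\hat{k} = 0, \ldots, N-1$, the same helper reduces to the ordinary DFT.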
The concept of alternative bin spacings is also used in other transforms, such as the constant-Q transform.
Comparison
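The plots below show the results of the zero-padded DFT and the fractional DFT for comparison. Both approaches yield the same magnitude spectrum. However, evaluating the fractional DFT directly from its sum takes on the order of $N \cdot N^*$ operations and does not benefit from the efficiency of the FFT, which computes the zero-padded DFT in $\mathcal{O}(N^* \log N^*)$ operations.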