This section demonstrates the phenomenon discussed in Section 4.2 of the paper, using two audio samples. Each sample is shown with three spectrograms and the corresponding audio clips.
For each sample, the first spectrogram is computed from the input audio, the second from the output of a baseline model trained without balanced data-sampling, and the third from the output of a baseline model trained with balanced data-sampling.
For both samples, the second spectrogram shows a cutoff frequency (~20 kHz) above which the model does not reconstruct the original frequency content, whereas the third spectrogram shows no such cutoff.
Input spectrogram.
Spectrogram of reconstructed audio from the baseline model trained without balanced data-sampling.
Spectrogram of reconstructed audio from the baseline model trained with balanced data-sampling.
Input spectrogram.
Spectrogram of reconstructed audio from the baseline model trained without balanced data-sampling.
Spectrogram of reconstructed audio from the baseline model trained with balanced data-sampling.
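The cutoff visible in the spectrograms can also be checked numerically. The following is a minimal sketch, not the procedure used in the paper: it averages a short-time power spectrum over time and reports the highest frequency whose level is within a chosen threshold of the peak. The function names `estimate_cutoff_hz` and `band_limit`, the FFT size, the hop size, the -60 dB threshold, and the 48 kHz sample rate used in the demonstration are all assumptions made for illustration.

```python
import numpy as np


def estimate_cutoff_hz(audio, sample_rate, n_fft=2048, hop=512, floor_db=-60.0):
    """Highest frequency whose time-averaged power is within floor_db of the peak.

    Hypothetical helper: n_fft, hop and floor_db are illustrative defaults.
    """
    audio = np.asarray(audio, dtype=float)
    if len(audio) < n_fft:
        raise ValueError("audio must contain at least n_fft samples")

    # Slice the signal into overlapping Hann-windowed frames.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(audio) - n_fft) // hop
    frames = np.stack(
        [audio[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )

    # Power spectrum per frame, then average over time.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mean_power = power.mean(axis=0)

    # Level of each frequency bin relative to the strongest bin, in dB.
    level_db = 10.0 * np.log10(mean_power / mean_power.max() + 1e-20)

    # The last bin above the threshold marks the estimated cutoff.
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    active = np.nonzero(level_db > floor_db)[0]
    return float(freqs[active[-1]])


def band_limit(audio, sample_rate, cutoff_hz):
    """Zero all frequency content above cutoff_hz (used only to fake a cutoff)."""
    spectrum = np.fft.rfft(audio)
    freqs = np.fft.rfftfreq(len(audio), d=1.0 / sample_rate)
    spectrum[freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=len(audio))


if __name__ == "__main__":
    sample_rate = 48_000  # assumed sample rate, for the synthetic demo only
    rng = np.random.default_rng(0)
    full_band = rng.standard_normal(sample_rate)  # 1 s of white noise
    limited = band_limit(full_band, sample_rate, 20_000.0)

    print("full-band estimate (Hz):", estimate_cutoff_hz(full_band, sample_rate))
    print("band-limited estimate (Hz):", estimate_cutoff_hz(limited, sample_rate))
```

Applied to the clips on this page, an estimate near 20 kHz for the model trained without balanced data-sampling and an estimate close to that of the input for the model trained with it would be consistent with the spectrograms. The result depends on the chosen threshold, so it is best read as a rough indicator rather than a precise measurement.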