This is the sample page for our ICASSP 2024 submission 'Phase reconstruction in single channel speech enhancement based on phase gradients and estimated clean-speech amplitudes'.

Audio samples

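We demonstrate the benefit of the proposed method in two respects, each covered in its own section below: a comparison of the synthesised phase [1] with the proposed reconstructed phase, and the improvement obtained with the proposed phase reconstruction.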

All the noisy samples are normalised to -26 dBov individually based on the active speech level (ASL).
We recommend using neutral-sounding headphones to better discern the differences among the presented methods.
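For readers who want to level-match their own files in a similar way, the following is a minimal sketch of normalising a signal to a target level in dBov. It is not the tool used for this page. It assumes samples scaled to [-1, 1], so that 0 dBov corresponds to an RMS of 1.0, and it replaces a standard ASL measurement with a crude frame-energy activity detector. The function names, the frame length and the 30 dB activity range are all hypothetical choices.

```python
import math


def active_level_dbov(samples, frame_len=320, activity_range_db=30.0):
    """Rough active-level estimate in dBov for samples scaled to [-1, 1].

    Simplification (NOT a standard ASL measurement): a frame counts as
    active when its power lies within `activity_range_db` of the loudest
    frame. Only active frames contribute to the level.
    """
    if len(samples) < frame_len:
        raise ValueError("signal is shorter than one frame")

    # Mean power of each non-overlapping frame.
    frame_powers = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        frame_powers.append(sum(x * x for x in frame) / frame_len)

    peak = max(frame_powers)
    if peak == 0.0:
        raise ValueError("signal is silent")

    # Keep frames whose power is within the activity range of the peak.
    threshold = peak * 10 ** (-activity_range_db / 10)
    active = [p for p in frame_powers if p >= threshold]

    # 0 dBov is taken as a mean power of 1.0 (assumption stated above).
    return 10 * math.log10(sum(active) / len(active))


def normalise_to_dbov(samples, target_dbov=-26.0, frame_len=320,
                      activity_range_db=30.0):
    """Apply one gain to the whole signal so its active level hits the target."""
    current = active_level_dbov(samples, frame_len, activity_range_db)
    gain = 10 ** ((target_dbov - current) / 20)
    return [gain * x for x in samples]
```

Because a single gain is applied to the whole file, the balance between speech and noise within each sample is unchanged. Only the overall playback level is aligned.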

Using synthesised phase [1] vs the proposed reconstructed phase

This comparison shows the effect of using the synthetic phase retrieved by the algorithm proposed in [1].
Using the purely synthesised phase leads to an unnatural output, especially at high SNR, where the underlying signal phase is not heavily distorted by the noise. In comparison, our proposed method provides a more natural-sounding output. The effect of the phase enhancement is most perceivable in the voiced segments, which have fewer "vocoding-like" artefacts. Since our method combines the predicted phase with that of the mixture signal, it helps preserve naturalness when the underlying signal is less corrupted by noise.
Methods: Noisy input, Real CRUSE, Real CRUSE - synthetic, Real CRUSE - agnostic, Clean ref

Improvement by the proposed phase reconstruction

Now, we demonstrate the benefit of using the estimated phase obtained from the proposed method.
The samples below are processed by the Real CRUSE and Complex CRUSE [2] variants listed under "Methods" below. The effect of the phase reconstruction is best perceived as a reduction of "vocoding-like" artefacts, which occur when noise is present between harmonics. This is also visible in the spectra.
If the initial phase estimate is already good, less difference is observed between the speech estimate with the noisy phase and the one with the reconstructed phase.
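The last remark can be made concrete with a small sketch. It is not the proposed reconstruction algorithm. It only shows the final step of pairing an estimated amplitude with a chosen phase, and measures how far apart two such spectra are. All function names are hypothetical. For a single time-frequency bin with amplitude m and phase difference d, the two estimates differ by 2 m |sin(d/2)|, so a small phase difference gives a small change in the output.

```python
import cmath
import math


def combine(magnitudes, phases):
    """Build complex spectral bins from amplitudes and phases (radians)."""
    return [cmath.rect(m, p) for m, p in zip(magnitudes, phases)]


def spectral_distance(magnitudes, phases_a, phases_b):
    """Euclidean distance between two spectra sharing the same amplitudes."""
    spec_a = combine(magnitudes, phases_a)
    spec_b = combine(magnitudes, phases_b)
    return math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(spec_a, spec_b)))


# Illustrative values only: the same estimated amplitudes, paired with a
# reference phase and with phases that deviate from it by different amounts.
amplitudes = [1.0, 0.5, 0.25]
reference_phase = [0.3, -1.2, 2.0]
close_phase = [p + 0.05 for p in reference_phase]   # good initial phase
far_phase = [p + 1.50 for p in reference_phase]     # poor initial phase

small_change = spectral_distance(amplitudes, reference_phase, close_phase)
large_change = spectral_distance(amplitudes, reference_phase, far_phase)
```

Here `small_change` is much smaller than `large_change`, which mirrors the listening impression: the better the starting phase, the less there is for a phase reconstruction to correct.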

For ease of observation, all spectrograms are zoomed to the range [0, 4] kHz.
By moving your mouse over the spectrogram of the proposed method, you can see how it differs from the result obtained with the noisy phase.
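To reproduce a similar [0, 4] kHz view from your own STFT data, the sketch below selects the one-sided frequency bins whose centre frequency falls in that band. The FFT size and sampling rate used for the plots on this page are not stated here, so `n_fft` and `sample_rate` are assumptions to be replaced with your own values. The function names are hypothetical.

```python
def bins_in_band(n_fft, sample_rate, f_lo=0.0, f_hi=4000.0):
    """Indices of one-sided STFT bins with centre frequency in [f_lo, f_hi] Hz."""
    return [k for k in range(n_fft // 2 + 1)
            if f_lo <= k * sample_rate / n_fft <= f_hi]


def crop_spectrogram(frames, n_fft, sample_rate, f_lo=0.0, f_hi=4000.0):
    """Keep only the in-band bins of every frame.

    `frames` is a list of frames, each a list of n_fft // 2 + 1 values.
    """
    keep = bins_in_band(n_fft, sample_rate, f_lo, f_hi)
    return [[frame[k] for k in keep] for frame in frames]
```

For example, with an assumed 512-point FFT at 16 kHz the bin spacing is 31.25 Hz, so `bins_in_band(512, 16000)` keeps bins 0 to 128.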

Methods: Noisy input, Real CRUSE, Real CRUSE - agnostic, Real CRUSE - clean phase, Complex CRUSE, Complex CRUSE - agnostic, Complex CRUSE - clean phase, Clean ref

References

  1. Y. Masuyama, K. Yatabe, K. Nagatomo, and Y. Oikawa, "Online phase reconstruction via DNN-based phase differences estimation," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 163–176, 2022.
  2. S. Braun, H. Gamper, C. K. Reddy, and I. Tashev, "Towards efficient models for real-time deep noise suppression," in ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2021, pp. 656–660.