Supplementary Materials

for "Global Rotation Equivariant Phase Modeling for Speech Enhancement with Deep Magnitude-Phase Interaction"

Chengzhong Wang, Andong Li, Dingding Yao, and Junfeng Li*

Abstract

This page compares the proposed method with SEMamba w/o PCS, ZipEnhancer, MP-SENet-Up, and Universe++ across universal speech degradation scenarios.

Training Logs

The following figures show training convergence and validation PESQ over epochs for the universal speech enhancement task. All models are trained on 300 hours of synthesized DNS2021 data and validated on a 150-utterance subset of WSJ+WHAMR!, with the same loss and optimizer configurations.

Figure 1: Training loss convergence (Pink: Proposed, Blue: ZipEnhancer, Green: MP-SENet-Up, Light Blue: SEMamba w/o PCS).

Figure 2: Validation PESQ (Pink: Proposed, Blue: ZipEnhancer, Green: MP-SENet-Up, Light Blue: SEMamba w/o PCS).

DNS-2020 Large-Scale Denoising (3000h)

The following results are from the large-scale DNS-2020 denoising-only task (3000 hours). All models use the same loss and optimizer configurations as MP-SENet. The grey curve denotes the proposed method, the blue curve denotes SEMamba w/o PCS, and the red curve denotes MP-SENet-Up.

Figure 3: DNS-2020 validation PESQ on the large-scale 3000h denoising task.

Figure 4: DNS-2020 validation phase metric on the large-scale 3000h denoising task.

Phase Retrieval on VoiceBank (Validation)

The orange curve denotes the proposed method, the blue curve denotes MP-SENet-Up, and the red curve denotes SEMamba. The validation phase loss is computed as GD + IF + PD. Because the proposed method, with fewer than 150k training steps, already surpasses the baselines trained for about 250k steps, we did not train it further. Longer training may yield better results. All models are trained with the same loss and optimizer configurations.
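To make the "GD + IF + PD" sum concrete, here is a minimal sketch of one common way such a phase loss is built. It rests on assumptions the page does not confirm: GD is taken to be a group-delay term (phase difference along frequency), IF an instantaneous-frequency term (phase difference along time), and PD a direct phase-difference term, each passed through an anti-wrapping function so that errors of 2π count as zero. All function names are hypothetical, and the authors' actual definitions and weights may differ.

```python
import math

TWO_PI = 2.0 * math.pi


def anti_wrap(x):
    """Fold a phase error onto [0, pi] so that multiples of 2*pi count as zero."""
    return abs(x - TWO_PI * round(x / TWO_PI))


def diff_freq(phase):
    """Difference along the frequency axis; phase is indexed as phase[f][t]."""
    return [[hi - lo for lo, hi in zip(phase[f], phase[f + 1])]
            for f in range(len(phase) - 1)]


def diff_time(phase):
    """Difference along the time axis."""
    return [[row[t + 1] - row[t] for t in range(len(row) - 1)] for row in phase]


def mean_anti_wrapped_error(est, ref):
    """Mean anti-wrapped absolute error between two equally shaped 2-D lists."""
    errs = [anti_wrap(e - r)
            for e_row, r_row in zip(est, ref)
            for e, r in zip(e_row, r_row)]
    return sum(errs) / len(errs)


def phase_loss(est, ref):
    """Assumed GD + IF + PD sum; needs at least 2 frequency bins and 2 frames."""
    pd = mean_anti_wrapped_error(est, ref)                        # direct phase term (assumed PD)
    gd = mean_anti_wrapped_error(diff_freq(est), diff_freq(ref))  # assumed GD
    inst_freq = mean_anti_wrapped_error(diff_time(est), diff_time(ref))  # assumed IF
    return gd + inst_freq + pd
```

Under these assumptions, adding 2π to every estimated phase value leaves the loss at (numerically) zero, while a constant offset such as 0.3 rad changes only the direct phase term, because the two difference-based terms cancel any global offset.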

Figure 5: Phase retrieval validation PESQ on VoiceBank.

Figure 6: Phase retrieval validation phase loss (GD + IF + PD) on VoiceBank.

Acknowledgment of Baseline Methods

We gratefully acknowledge the authors of the compared baseline methods for sharing their open-sourced code. The following papers present the methods used in this comparison:

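Statistical Significance (Universal SE Scenario: Proposed vs. ZipEnhancer)

We report two-sided p-values from significance tests comparing the proposed method with the second-ranked ZipEnhancer. “Significant (+)” indicates a significant increase in the metric value; “Significant (-)” indicates a significant decrease, which is an improvement for lower-is-better metrics marked (↓), such as PD and WOPD. An empty Result cell means the difference is not significant.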

Composite (DN+DR, DN+DR+BWE)

Metric      P-value        Result
PESQ        7.05602e-04    Significant (+)
STOI        2.45487e-01
SI-SDR      2.07514e-05    Significant (+)
COVL        2.61178e-03    Significant (+)
UTMOS       7.55323e-04    Significant (+)
PD (↓)      6.64324e-03    Significant (-)
WOPD (↓)    2.79556e-05    Significant (-)

Overall (DN, DR, BWE, DN+DR, DN+DR+BWE)

Metric      P-value        Result
PESQ        2.05312e-11    Significant (+)
STOI        5.54429e-02
SI-SDR      8.56033e-07    Significant (+)
COVL        4.49094e-12    Significant (+)
UTMOS       1.52879e-10    Significant (+)
PD (↓)      1.17751e-05    Significant (-)
WOPD (↓)    4.83145e-12    Significant (-)
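For readers who want to run a similar comparison on their own per-utterance scores, the sketch below shows one way to obtain a two-sided p-value and a label in the style of the tables above. The page does not state which test or significance threshold was used, so this swaps in a paired sign-flip permutation test and assumes a threshold of 0.05. The function names, the threshold, and the number of permutations are all hypothetical choices, not the authors' procedure.

```python
import random


def paired_permutation_p(scores_a, scores_b, n_perm=10000, seed=0):
    """Two-sided paired sign-flip permutation test on the mean score difference."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    observed = abs(sum(diffs) / n)
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    hits = 0
    for _ in range(n_perm):
        # Under the null hypothesis the sign of each paired difference is arbitrary.
        permuted = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(permuted / n) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


def result_label(scores_a, scores_b, alpha=0.05):
    """Return 'Significant (+)', 'Significant (-)', or '' (alpha is an assumed threshold)."""
    p = paired_permutation_p(scores_a, scores_b)
    mean_diff = sum(a - b for a, b in zip(scores_a, scores_b)) / len(scores_a)
    if p >= alpha or mean_diff == 0:
        return ""
    return "Significant (+)" if mean_diff > 0 else "Significant (-)"
```

Here `scores_a` would hold the proposed method's per-utterance values of one metric and `scores_b` the baseline's values on the same utterances, in the same order. A permutation test makes no normality assumption, but its smallest reachable p-value is limited by `n_perm`, so values as small as those in the tables would need a different test or many more permutations.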

Audio Samples

Please use headphones for the best listening experience.

Supplementary Notes

For readers' convenience, we provide a brief note on the model's Global Rotation Equivariance (GRE) on a dedicated page.
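As a quick intuition, the sketch below checks numerically what rotation equivariance means for a complex spectrum. It assumes GRE is the property F(e^{jθ} X) = e^{jθ} F(X) for any global phase offset θ. The two toy layers are hypothetical illustrations, not parts of the proposed architecture: a magnitude-dependent gain, which satisfies the property, and a ReLU applied separately to the real and imaginary parts, which does not.

```python
import cmath
import math


def rotate(spec, theta):
    """Apply a global phase rotation e^{j*theta} to every complex bin."""
    r = cmath.exp(1j * theta)
    return [r * z for z in spec]


def magnitude_gate(spec):
    """Toy equivariant layer: the gain depends only on |z|, so phase passes through."""
    return [math.tanh(abs(z)) * z for z in spec]


def split_relu(spec):
    """Toy non-equivariant layer: ReLU on real and imaginary parts separately."""
    return [complex(max(z.real, 0.0), max(z.imag, 0.0)) for z in spec]


def max_equivariance_error(layer, spec, theta):
    """Largest |layer(rotate(x)) - rotate(layer(x))| over all bins."""
    rotated_then_layer = layer(rotate(spec, theta))
    layer_then_rotated = rotate(layer(spec), theta)
    return max(abs(a - b) for a, b in zip(rotated_then_layer, layer_then_rotated))


spec = [1 + 2j, -0.5 + 0.3j, 0.2 - 1.5j]
print(max_equivariance_error(magnitude_gate, spec, 1.0))  # close to zero
print(max_equivariance_error(split_relu, spec, 1.0))      # clearly non-zero
```

The first error is at floating-point precision, while the second is of order one, which shows why operations that treat the real and imaginary parts independently generally break the property.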
