Introduction

This is the demo page for our paper, "On the robustness of non-intrusive speech quality model by adversarial examples". The code will be shared here.

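In the paper, we find adversarial examples for DNSMOS, a CNN-based speech quality predictor. This demo shares the original and perturbed WAV files for samples from three datasets: DNS-challenge, TIMIT, and VCTK-Demand. Each sample lists the DNSMOS scores for speech signal quality (SIG), background noise quality (BAK), and overall quality (OVR) before and after perturbation.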
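As a rough illustration of what an adversarial example for a score predictor looks like, the sketch below runs a generic iterative sign-gradient attack against a toy model. It is not the method from the paper and it does not call DNSMOS: `toy_quality_score`, its random `weights`, and the values of `epsilon` and `step` are all hypothetical stand-ins chosen only to show how a small, bounded perturbation can push a predicted score up or down.

```python
import math
import random


def toy_quality_score(wave, weights):
    """Hypothetical stand-in predictor (NOT DNSMOS): waveform -> score in (1, 5)."""
    z = sum(w * x for w, x in zip(weights, wave))
    return 1.0 + 4.0 / (1.0 + math.exp(-z))


def toy_score_gradient(wave, weights):
    """Analytic gradient of the toy score with respect to each waveform sample."""
    z = sum(w * x for w, x in zip(weights, wave))
    s = 1.0 / (1.0 + math.exp(-z))
    return [4.0 * s * (1.0 - s) * w for w in weights]


def sign_gradient_attack(wave, weights, epsilon=0.002, step=0.0005,
                         n_steps=20, raise_score=True):
    """Move each sample along the gradient sign, staying within +/- epsilon."""
    direction = 1.0 if raise_score else -1.0
    perturbed = list(wave)
    for _ in range(n_steps):
        grad = toy_score_gradient(perturbed, weights)
        for i, g in enumerate(grad):
            sign = (g > 0) - (g < 0)
            value = perturbed[i] + direction * step * sign
            # Project back into the epsilon-ball around the original sample.
            value = max(wave[i] - epsilon, min(wave[i] + epsilon, value))
            # Keep the sample in the valid normalized audio range.
            perturbed[i] = max(-1.0, min(1.0, value))
    return perturbed


rng = random.Random(0)
n_samples = 16000  # assumed length, for illustration only
wave = [rng.uniform(-0.1, 0.1) for _ in range(n_samples)]
weights = [rng.gauss(0.0, 0.05) for _ in range(n_samples)]

raised = sign_gradient_attack(wave, weights, raise_score=True)
lowered = sign_gradient_attack(wave, weights, raise_score=False)

print("original score:", round(toy_quality_score(wave, weights), 3))
print("raised score:  ", round(toy_quality_score(raised, weights), 3))
print("lowered score: ", round(toy_quality_score(lowered, weights), 3))
```

The `raise_score` flag mirrors the two directions visible in the samples on this page, where some perturbed files score much higher than the originals and others much lower. A real attack on a neural predictor would obtain the gradient by automatic differentiation rather than from a closed-form expression.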

DNS-challenge

Sample 1

[original: SIG=4.17, BAK=4.49, OVR=3.98], [perturbed: SIG=1.14, BAK=1.22, OVR=0.97]

Sample 2

[original: SIG=4.06, BAK=4.16, OVR=3.73], [perturbed: SIG=0.99, BAK=1.00, OVR=1.00]

Sample 3

[original: SIG=2.37, BAK=2.50, OVR=1.83], [perturbed: SIG=4.89, BAK=4.97, OVR=4.69]

TIMIT

Sample 1

[original: SIG=1.06, BAK=1.05, OVR=1.00], [perturbed: SIG=5.00, BAK=4.24, OVR=4.35]

Sample 2

[original: SIG=3.85, BAK=1.82, OVR=2.22], [perturbed: SIG=4.06, BAK=5.00, OVR=4.17]

Sample 3

[original: SIG=4.05, BAK=3.07, OVR=3.21], [perturbed: SIG=1.00, BAK=1.00, OVR=1.00]

VCTK-Demand

Sample 1

[original: SIG=1.03, BAK=1.05, OVR=0.98], [perturbed: SIG=4.01, BAK=3.27, OVR=3.23]

Sample 2

[original: SIG=1.38, BAK=1.09, OVR=1.09], [perturbed: SIG=4.28, BAK=4.00, OVR=3.70]

Sample 3

[original: SIG=2.33, BAK=1.40, OVR=1.54], [perturbed: SIG=4.00, BAK=4.00, OVR=3.72]