We ran a MUSHRA-inspired listening test with a hidden reference but no low-passed anchor, comparing EnCodec and the proposed method at several bitrates, with the output of the proposed method downsampled to 24 kHz. This comparison is unfair to the proposed model: EnCodec runs natively at 24 kHz, whereas downsampling the output of the proposed model from 44.1 kHz to 24 kHz discards all the capacity and bitrate that the model allocated to the higher frequencies.
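To make the cost of the downsampling concrete, the following minimal sketch works out the resampling ratio and the share of audio bandwidth that disappears when going from 44.1 kHz to 24 kHz. Only the two sample rates come from the text; the helper name `nyquist_hz` is illustrative. The figure is a linear-frequency share, not a statement about how many bits the proposed codec actually spends on that band, which the text does not specify.

```python
from fractions import Fraction


def nyquist_hz(sample_rate_hz):
    """Highest frequency representable at the given sample rate."""
    return sample_rate_hz / 2


native_rate = 44_100  # native rate of the proposed codec (from the text)
target_rate = 24_000  # native rate of EnCodec (from the text)

# Rational resampling ratio: upsample by `up`, then downsample by `down`.
ratio = Fraction(target_rate, native_rate)
up, down = ratio.numerator, ratio.denominator

kept_hz = nyquist_hz(target_rate)
lost_hz = nyquist_hz(native_rate) - kept_hz
lost_fraction = lost_hz / nyquist_hz(native_rate)

print(f"resampling ratio: {up}/{down}")
print(f"bandwidth kept: 0-{kept_hz / 1000:g} kHz")
print(f"bandwidth discarded: {lost_hz / 1000:g} kHz ({lost_fraction:.1%})")
```

In other words, everything between 12 kHz and 22.05 kHz, a little under half of the original linear bandwidth, is removed before listeners hear the proposed codec.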

Each of 9 expert listeners rated 12 randomly selected 10-second samples from our evaluation set, 4 from each domain: speech, music, and environmental sounds.
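A domain-balanced random selection like the one described above could be drawn as in the sketch below. This is an assumption about the procedure, not the script used for the test: the function name `pick_test_items`, the pool contents, and the file names are all hypothetical, and the text does not say whether every listener heard the same 12 clips or a fresh draw.

```python
import random


def pick_test_items(pool_by_domain, per_domain, seed=None):
    """Draw `per_domain` clips from each domain, then shuffle the order."""
    rng = random.Random(seed)
    selected = []
    for domain, items in pool_by_domain.items():
        if len(items) < per_domain:
            raise ValueError(f"not enough clips in domain {domain!r}")
        # rng.sample draws without replacement, so no clip repeats.
        selected.extend((domain, clip) for clip in rng.sample(items, per_domain))
    rng.shuffle(selected)  # avoid presenting the domains in blocks
    return selected


# Hypothetical evaluation pool; the real file names are not given in the text.
pool = {
    "speech": [f"speech_{i:03d}.wav" for i in range(50)],
    "music": [f"music_{i:03d}.wav" for i in range(50)],
    "environmental": [f"env_{i:03d}.wav" for i in range(50)],
}

items = pick_test_items(pool, per_domain=4, seed=0)
print(len(items))  # 12 clips in total, 4 per domain
```

Fixing the seed makes the draw reproducible, which is convenient if the same clip set should be served to every listener.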


Even in this unfair comparison, we find that the proposed codec achieves much higher MUSHRA scores than EnCodec at similar bitrates.

Five test samples from each of the three domains (speech, music, and environmental sounds) are presented below.

<aside> <img src="/icons/exclamation-mark_gray.svg" alt="/icons/exclamation-mark_gray.svg" width="40px" /> Due to a high number of samples, this page may load slowly for some users.

</aside>

Speech


Reconstructed Audio, by codec @ bitrate (kbps)

| Original Audio | Proposed @ 2.67 | Proposed @ 5.33 | Proposed @ 8 | EnCodec @ 3 | EnCodec @ 6 | EnCodec @ 12 |
| --- | --- | --- | --- | --- | --- | --- |
| sample_9.wav | sample_9.wav | sample_9.wav | sample_9.wav | sample_9.wav | sample_9.wav | sample_9.wav |
| sample_21.wav | sample_21.wav | sample_21.wav | sample_21.wav | sample_21.wav | sample_21.wav | sample_21.wav |
| sample_30.wav | sample_30.wav | sample_30.wav | sample_30.wav | sample_30.wav | sample_30.wav | sample_30.wav |
| sample_102.wav | sample_102.wav | sample_102.wav | sample_102.wav | sample_102.wav | sample_102.wav | sample_102.wav |
| sample_165.wav | sample_165.wav | sample_165.wav | sample_165.wav | sample_165.wav | sample_165.wav | sample_165.wav |

Music


Original Audio

sample_10.wav

sample_22.wav

sample_31.wav

sample_103.wav

sample_166.wav

Codec @ bitrate (kbps)

Proposed @ 2.67

Proposed @ 5.33

Proposed @ 8


EnCodec @ 3

EnCodec @ 6

EnCodec @ 12

Environmental Sounds