We ran a MUSHRA-inspired listening test with a hidden reference but no low-passed anchor, comparing EnCodec and the proposed method at various bitrates, with the proposed method's output downsampled to 24 kHz. This comparison is unfair to the proposed model: EnCodec runs natively at 24 kHz, whereas downsampling the proposed model's output from 44.1 kHz to 24 kHz discards all the capacity and bitrate allocated to the higher frequencies.
Each of 9 expert listeners rated 12 randomly selected 10-second samples from our evaluation set, 4 from each domain: speech, music, and environmental sounds.
Even in this unfair comparison, we find that the proposed codec achieves much higher MUSHRA scores than EnCodec at similar bitrates.
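To make the cost of the downsampling step concrete, the following minimal sketch works out the resampling ratio and the frequency band that is lost when going from 44.1 kHz to 24 kHz. The function name `downsampling_summary` and its return keys are hypothetical and chosen only for illustration; the sketch does not reproduce the resampler used in the listening test, and the share of bandwidth discarded is not necessarily the share of bitrate the model spent on that band.

```python
from fractions import Fraction


def downsampling_summary(src_hz: int, dst_hz: int) -> dict:
    """Summarize what a src_hz -> dst_hz downsampling step keeps and discards.

    Hypothetical helper for illustration only.
    """
    if dst_hz >= src_hz:
        raise ValueError("dst_hz must be lower than src_hz for downsampling")

    # Fraction reduces dst/src to lowest terms, giving the integer
    # up/down factors a rational (polyphase) resampler would use.
    ratio = Fraction(dst_hz, src_hz)

    # The Nyquist frequency is half the sample rate: content above the
    # destination Nyquist cannot be represented after downsampling.
    src_nyquist = src_hz / 2
    dst_nyquist = dst_hz / 2

    return {
        "up": ratio.numerator,
        "down": ratio.denominator,
        "kept_band_hz": (0.0, dst_nyquist),
        "discarded_band_hz": (dst_nyquist, src_nyquist),
        # Share of the original (linear) bandwidth that is removed.
        "discarded_fraction": (src_nyquist - dst_nyquist) / src_nyquist,
    }


summary = downsampling_summary(44_100, 24_000)
print(summary)
```

Under these assumptions the ratio reduces to 80/147, and everything between 12 kHz and 22.05 kHz is removed before listeners hear the proposed codec's output, while EnCodec's 24 kHz output is unaffected.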
Five test samples from each of the three domains (speech, music, and environmental sounds) are presented below.
<aside> Due to a high number of samples, this page may load slowly for some users.
</aside>
Original Audio
Codec @ bitrate (kbps)
Proposed @ 2.67
Proposed @ 5.33
Proposed @ 8
EnCodec @ 3
EnCodec @ 6
EnCodec @ 12
Reconstructed Audio
Original Audio
Codec @ bitrate (kbps)
Proposed @ 2.67
Proposed @ 5.33
Proposed @ 8
EnCodec @ 3
EnCodec @ 6
EnCodec @ 12