I don't think Discord claims that the audio is end-to-end encrypted. There are legitimate reasons why Discord might be dropping malformed packets that don't indicate they are spying on you (though they may be doing that too):
1) to improve audio quality.
2) to help prevent RCE attacks on the destination client.
3) re-encoding at lower bitrates for low bandwidth clients.
I don't really see the issue here unless Discord claimed they do not decrypt the audio.
3) is most certainly at play here, as Discord allows clients to set their preferred bitrate (RX&TX), which would not be possible in multi-party calls without re-encoding.
Could they not just drop the quality of the whole call down to the lowest bandwidth any user allows? I feel that would reduce the computational burden on Discord's end while allowing the lowest client-to-client latency.
Keep in mind that a major use case for Discord is open voice chats (e.g., for gaming groups), not just organized person-to-person calls. Having the quality for a whole chat drop just because someone joined from a mobile phone would be a really disappointing user experience.
This is absolutely something that Discord does, though. I've had friends drop the bitrate slider as low as possible in Discord just to make the whole channel sound awful.
That's a channel-specific option though. You can set per-channel bitrate, but it's not something Discord does automatically to accommodate lower throughput clients.
They could, but if you have 4 people in a call and 3 can receive high bandwidth audio, lowering it just for the 1 person with low bandwidth is the best user experience.
Otherwise people with good networks who have their call quality dragged down will just think Discord's voice chat is bad.
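To make the trade-off concrete, here is a minimal sketch of the two policies being discussed. The names and the bitrates (128 and 32 kbps) are made up for illustration; nothing here is Discord's actual logic.

```python
def lowest_common_bitrate(preferred_kbps: dict[str, int]) -> dict[str, int]:
    """Policy A: everyone receives at the lowest bitrate anyone asked for."""
    floor = min(preferred_kbps.values())
    return {name: floor for name in preferred_kbps}


def per_listener_bitrate(source_kbps: int, preferred_kbps: dict[str, int]) -> dict[str, int]:
    """Policy B: each listener gets their own cap, which implies the server
    must produce a separate lower-rate stream for the constrained listeners."""
    return {name: min(source_kbps, kbps) for name, kbps in preferred_kbps.items()}


# Hypothetical 4-person call: three good connections, one constrained.
call = {"alice": 128, "bob": 128, "carol": 128, "dave": 32}

policy_a = lowest_common_bitrate(call)
policy_b = per_listener_bitrate(128, call)
```

Under policy A all four people hear 32 kbps audio; under policy B only dave does, at the cost of extra server-side work for his stream.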
Wouldn’t it be possible to send a stream-cipher-encrypted packet of audio to a hub server if the codec has a “progressive” decoding mode? If I’m not mistaken, Opus can already do something like this. That way a client could set their desired bitrate, and the server would truncate the packets before passing them on, without needing to decrypt or re-encode them. This only works with a stream cipher though.
Maybe an audio engineer or cryptographer could chime in?
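A toy sketch of why truncation and a stream cipher are compatible: each ciphertext byte depends only on the plaintext byte and keystream byte at the same position, so a prefix of the ciphertext decrypts to the same prefix of the plaintext. The keystream below (SHA-256 in counter mode) and the "layered" packet layout are assumptions for illustration only, not a vetted cipher and not how any real codec lays out its packets.

```python
import hashlib


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 in counter mode. Illustration only."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_stream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt by XORing with the keystream (the operation is symmetric)."""
    ks = keystream(key, nonce, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))


def truncate_for_client(ciphertext: bytes, max_bytes: int) -> bytes:
    """Hypothetical hub-side step: drop trailing bytes without knowing the key."""
    return ciphertext[:max_bytes]


key = b"shared-by-clients-only"
nonce = b"packet-0001"
# Stand-in for a hypothetical progressive packet: base layer first, enhancement after.
packet = b"BASE-LAYER-BYTES" + b"ENHANCEMENT-LAYER-BYTES"

ciphertext = xor_stream(key, nonce, packet)
truncated = truncate_for_client(ciphertext, 16)   # hub keeps the first 16 bytes
recovered = xor_stream(key, nonce, truncated)     # receiver decrypts the prefix
```

Here `recovered` equals the 16-byte base layer even though the hub never saw the key. One caveat: a per-packet authentication tag computed over the full packet would no longer verify after truncation, so a real design would have to handle integrity separately.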
It looks like Ogg Vorbis has theoretical support in the spec for something called "Bitrate Peeling"[1], but there is no functional implementation for this yet, and there's been an open bounty on it since 2004.[2]
This is a really neat idea though. Truncating the packet to change the bitrate per client without re-encoding.
Peeling has been a goal for many audio/video codecs in the past. Nobody's been able to make it work acceptably, though -- either the low-bitrate version sounds awful, or the high-bitrate version increases in size to the point that it might as well just have a low-bitrate version alongside it.
3) is tens of megahertz of one CPU core per re-encode, clearly out of the question with today's norm being ~4 GHz 8-core CPUs. Either that, or they are spyi^^^ recording everything for 'metrics/analytics'.
CPU time is beside the point, though: the re-encoding is done for bandwidth reasons. If a client requests 64kbps voice, sending it packets at 128kbps and letting it re-encode once it has them is pointless.
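A back-of-the-envelope version of that estimate, taking "tens of megahertz" to mean 30 MHz per re-encode (an assumed figure, not a measurement) and treating clock cycles across cores as interchangeable:

```python
# Assumed cost of one real-time re-encode; "tens of megahertz" read as 30 MHz.
MHZ_PER_REENCODE = 30

# The "~4 GHz 8-core" machine from the comment above.
CORES = 8
CORE_CLOCK_MHZ = 4000

total_mhz = CORES * CORE_CLOCK_MHZ                  # 32000 MHz of aggregate clock
concurrent_reencodes = total_mhz // MHZ_PER_REENCODE

print(f"~{concurrent_reencodes} concurrent re-encodes per machine")
```

With those assumptions a single machine could handle on the order of a thousand simultaneous re-encodes, which is the point of the sarcasm: CPU cost alone would not rule re-encoding out.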