> At approximately 4% word error rate on FLEURS and $0.003/min Amazon's transcrip...

mdrzn · 2026-02-04T16:09:38

Is it $0.003 per minute of uploaded audio, or per "compute minute"?

For example, fal.ai has a Whisper API endpoint priced at "$0.00125 per compute second", which (at 10-25x realtime) is FAR cheaper than all the competitors.
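Converting a per-compute-second price into per-audio-minute terms makes it directly comparable to a per-minute rate. A minimal sketch, assuming the 10-25x realtime range from the comment holds and that compute time scales linearly with audio length (`cost_per_audio_minute` is a hypothetical helper, not part of any API):

```python
# Hypothetical helper: convert a per-compute-second price into an effective
# price per minute of input audio, given how many times faster than realtime
# the model runs. Assumes compute time scales linearly with audio length.
def cost_per_audio_minute(price_per_compute_second: float, realtime_factor: float) -> float:
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    # One minute (60 s) of audio needs 60 / realtime_factor seconds of compute.
    compute_seconds = 60.0 / realtime_factor
    return price_per_compute_second * compute_seconds


FAL_PRICE = 0.00125  # $ per compute second, as quoted in the comment

for factor in (10, 25):
    rate = cost_per_audio_minute(FAL_PRICE, factor)
    print(f"{factor}x realtime: ${rate:.4f} per audio minute")
```

Under those assumptions the effective rate is about $0.0075 per audio minute at 10x realtime and $0.003 per audio minute at 25x.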

Oras · 2026-02-04T16:30:51

I think the point is having it for real-time; this is for conversations rather than transcribing audio files.

jamilton · 2026-02-04T17:52:51

That quote was for the non-realtime model.

85392_school · 2026-02-05T22:33:32

It can actually go much lower. Last time I checked, Gemini cost around $0.01 per hour of transcription.
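Since one figure is quoted per minute and the other per hour, normalizing both to the same unit shows the size of the gap. A quick sketch; the Gemini number is the commenter's recollection rather than a verified price, and both helper names are hypothetical:

```python
# Normalize the two rates mentioned in this thread to common units.
MINUTES_PER_HOUR = 60


def per_minute_to_per_hour(price_per_minute: float) -> float:
    return price_per_minute * MINUTES_PER_HOUR


def per_hour_to_per_minute(price_per_hour: float) -> float:
    return price_per_hour / MINUTES_PER_HOUR


quoted_per_minute = 0.003  # $/min, from the quoted article
gemini_per_hour = 0.01     # $/hour, as recalled in the comment (unverified)

print(f"$0.003/min = ${per_minute_to_per_hour(quoted_per_minute):.2f}/hour")
print(f"$0.01/hour = ${per_hour_to_per_minute(gemini_per_hour):.6f}/min")
ratio = per_minute_to_per_hour(quoted_per_minute) / gemini_per_hour
print(f"ratio: {ratio:.0f}x")
```

At face value, $0.003/min is $0.18/hour, about 18 times the recalled Gemini rate.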

tgrowazay · 2026-02-05T05:08:45

Both AWS and Mistral prices above are per minute of input audio.

Curiositry · 2026-02-06T19:27:51

If Voxtral can process rapid speech as well as it claims to, an obvious cost optimization would be to speed up normal, unhurried speech to the fastest rate the model can still transcribe accurately.
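With billing per minute of input audio, the saving from speeding audio up is simple to estimate. A rough sketch, assuming per-input-minute billing at the quoted $0.003/min and that accuracy holds at the chosen speed-up (which would need to be measured). `billed_cost` is a hypothetical helper; the time-stretching itself (for example with a pitch-preserving tempo filter such as ffmpeg's `atempo`) is not shown:

```python
# Estimate transcription cost when audio is sped up before upload,
# assuming the provider bills per minute of *input* audio.
def billed_cost(duration_minutes: float, price_per_minute: float, speed: float = 1.0) -> float:
    if speed <= 0:
        raise ValueError("speed must be positive")
    # Audio played `speed` times faster lasts duration / speed minutes.
    billed_minutes = duration_minutes / speed
    return billed_minutes * price_per_minute


PRICE = 0.003  # $ per minute of input audio, the quoted rate

for speed in (1.0, 1.5, 2.0):
    print(f"60 min at {speed}x: ${billed_cost(60, PRICE, speed):.3f}")
```

Cost scales as 1/speed, so a 1.5x speed-up cuts the bill by a third and 2x halves it; the practical ceiling is the speed at which the model's accuracy starts to degrade.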