A comparison of production ASR models: open-weights versus proprietary access, WER benchmarks, pricing, and fine-tuning support. Also includes character error rate (CER) on Trelis AI-Terms (technical AI terminology).

| Model | Creator | Access | License | Params | FLEURS WER (multilingual avg, from research) | AI-Terms (CER, semi-private) | AI-Terms (Entity CER, semi-private) | API Price/min | Trelis Eval | Trelis Training | Key Features |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Voxtral Family (Mistral AI) | |||||||||||
| Voxtral Mini Transcribe V2 | Mistral AI | Proprietary | Proprietary | ~3B | ~4.0% | — | — | $0.003 | — | — | Diarization, word timestamps, context biasing, 13 languages, 3hr audio |
| Voxtral Realtime | Mistral AI | Open Weights | Apache 2.0 | 4B | — | — | — | $0.006 | — | — | Streaming, sub-200ms latency, 13 languages, edge-deployable |
| Voxtral Small (24B) | Mistral AI | Open Weights | Apache 2.0 | 24B | ~5.1% | — | — | $0.003 | — | — | Audio understanding, Q&A, summarization, function calling, 32k context |
| Voxtral Mini Transcribe | Mistral AI | Proprietary | Proprietary | ~3B | ~5.5% | — | — | $0.001 | — | — | Cheapest option, transcription-optimized |
| Voxtral Mini (3B) | Mistral AI | Open Weights | Apache 2.0 | 3B | ~7.1% | 2.7% | 4.4% | Self-hosted | ✓ | ✓ | Audio understanding, Q&A, summarization, edge-friendly, 32k context |
| Whisper Family (OpenAI) | |||||||||||
| Whisper large-v3 | OpenAI | Open Weights | MIT | 1.5B | ~8.3% | 3.1% | 6.2% | Self-hosted | ✓ | ✓ | Word timestamps, 99 languages, mature ecosystem, whisper.cpp, faster-whisper |
| Whisper large-v3-turbo | OpenAI | Open Weights | MIT | 809M | — | 2.9% | 4.8% | Self-hosted | ✓ | ✓ | 2x faster than v3, word timestamps, 99 languages, great for fine-tuning |
| Whisper v3 Large (Fireworks) | OpenAI | Open Weights | MIT | 1.5B | N/A | 3.3% | 7.2% | ~$0.0015 | ✓ | — | Whisper v3 hosted by Fireworks, 99+ languages, lowest cost API. Via Trelis Router. |
| Proprietary APIs | |||||||||||
| GPT-4o mini Transcribe | OpenAI | Proprietary | Proprietary | N/A | ~5.7% | — | — | $0.003 | — | — | OpenAI API, easy integration |
| Gemini 2.5 Pro | Google | Proprietary | Proprietary | N/A | ~6.7% | 2.8% | 5.5% | ~$0.003 | ✓ | — | Multimodal, long context, audio understanding. Via Trelis Router. |
| Gemini 2.5 Flash | Google | Proprietary | Proprietary | N/A | ~7.0% | — | — | ~$0.003 | — | — | Multimodal, long context, audio understanding |
| ElevenLabs Scribe v2 | ElevenLabs | Proprietary | Proprietary | N/A | — | 6.3% | 2.2% | $0.010 | ✓ | — | Diarization, word timestamps, 99 languages. Via Trelis Router. |
| Deepgram Nova 3 | Deepgram | Proprietary | Proprietary | N/A | N/A | 7.8% | 8.2% | ~$0.008 | ✓ | — | Diarization, streaming, custom vocabulary. Via Trelis Router. |
| AssemblyAI Universal 3 Pro | AssemblyAI | Proprietary | Proprietary | N/A | N/A | 2.4% | 3.3% | ~$0.0035 | ✓ | — | Word timestamps, 6 languages, prompting. Via Trelis Router. |
| Speechmatics Ursa 2 Enhanced | Speechmatics | Proprietary | Proprietary | N/A | N/A | 4.9% | 9.0% | ~$0.0125 | ✓ | — | Enterprise-grade, 70 languages, best-in-class accuracy. Via Trelis Router. |
| Qwen Family (Alibaba) | |||||||||||
| Qwen3-ASR-1.7B | Alibaba / Qwen | Open Weights | Apache 2.0 | 1.7B | ~4.9% | 8.2% | 6.3% | Self-hosted | ✓ | ✓ | 52 languages/dialects, language detection, singing/music recognition, streaming |
| Qwen3-ASR-0.6B | Alibaba / Qwen | Open Weights | Apache 2.0 | 0.6B | ~7.6% | 8.9% | 8.4% | Self-hosted | ✓ | ✓ | Lightweight variant, 52 languages/dialects, edge-deployable |
| NVIDIA | |||||||||||
| Parakeet TDT 0.6B v3 | NVIDIA | Open Weights | CC-BY 4.0 | 0.6B | N/A | 3.4% | 5.0% | Self-hosted | ✓ | ✓ | NeMo framework, CTC/TDT decoder, 25 European languages, word timestamps |
| Meta | |||||||||||
| OmniASR-LLM-7B | Meta | Open Weights | Apache 2.0 | 7.8B | N/A | 10.2% | 23.9% | Self-hosted | ✓ | — | LLM-based ASR, 1600+ languages, also available in 300M/1B/3B sizes |
| Microsoft | |||||||||||
| VibeVoice-ASR | Microsoft | Open Weights | MIT | 8B | N/A | 6.4% | 5.3% | Self-hosted | ✓ | — | 50+ languages, code-switching support, word timestamps |
| Moonshine (Useful Sensors) | |||||||||||
| Moonshine Base | Useful Sensors | Open Weights | MIT | 61M | N/A | 5.2% | 9.3% | Self-hosted | ✓ | ✓ | Ultra-lightweight, edge-first, English-only |
| Moonshine Tiny | Useful Sensors | Open Weights | MIT | 27M | N/A | 7.8% | 13.6% | Self-hosted | ✓ | ✓ | Smallest model, IoT/mobile edge, English-only |
| Other Open Models | |||||||||||
| Kyutai STT (1B / 2.6B) | Kyutai | Open Weights | CC-BY 4.0 | 1B / 2.6B | N/A | — | — | Self-hosted | — | — | Streaming, word timestamps, voice prompting, Rust server |
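
The WER and CER columns above are error rates, and both are conventionally computed as an edit distance divided by the reference length: over words for WER, over characters for CER. The following is a minimal sketch of that standard definition, not the scoring code behind this table. The table does not state which text normalization (casing, punctuation) the FLEURS or Trelis AI-Terms numbers use, or how Entity CER selects its spans, so the function names and the absence of normalization here are assumptions for illustration only.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Insertions, deletions, and substitutions each cost 1.
    Works on strings (characters) or lists (words).
    """
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete r from the reference
                curr[j - 1] + 1,         # insert h into the reference
                prev[j - 1] + (r != h),  # substitute, or free match
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character edits / reference characters."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word edits / reference words (whitespace split)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


if __name__ == "__main__":
    # Hypothetical transcripts, not taken from any benchmark.
    print(wer("fine tune the model", "fine tune a model"))  # 0.25
    print(cer("LoRA", "Lora"))                              # 0.5
```

Because no normalization is applied in this sketch, a casing difference such as "LoRA" versus "Lora" counts as two character errors. Real benchmarks usually normalize text before scoring, so numbers produced this way are not directly comparable with the table.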