Trelis Research

ASR Model Comparison

Open weights vs proprietary, WER benchmarks, pricing, and fine-tuning support for production ASR models. Includes CER on Trelis AI-Terms (technical AI terminology).

Open Weights Weights available for download
Proprietary API-only access
Supported on Trelis Studio

Model Comparison

Model Creator Access License Params FLEURS WER
(multilingual avg, from research)
AI-Terms
(CER, semi-private)
AI-Terms
(Entity CER, semi-private)
API Price/min Trelis Eval Trelis Training Key Features
Voxtral Family (Mistral AI)
Voxtral Mini Transcribe V2 Mistral AI Proprietary Proprietary ~3B ~4.0% $0.003 Diarization, word timestamps, context biasing, 13 languages, 3hr audio
Voxtral Realtime Mistral AI Open Weights Apache 2.0 4B $0.006 Streaming, sub-200ms latency, 13 languages, edge-deployable
Voxtral Small (24B) Mistral AI Open Weights Apache 2.0 24B ~5.1% $0.003 Audio understanding, Q&A, summarization, function calling, 32k context
Voxtral Mini Transcribe Mistral AI Proprietary Proprietary ~3B ~5.5% $0.001 Cheapest option, transcription-optimized
Voxtral Mini (3B) Mistral AI Open Weights Apache 2.0 3B ~7.1% 2.7% 4.4% Self-hosted Audio understanding, Q&A, summarization, edge-friendly, 32k context
Whisper Family (OpenAI)
Whisper large-v3 OpenAI Open Weights MIT 1.5B ~8.3% 3.1% 6.2% Self-hosted Word timestamps, 99 languages, mature ecosystem, whisper.cpp, faster-whisper
Whisper large-v3-turbo OpenAI Open Weights MIT 809M 2.9% 4.8% Self-hosted 2x faster than v3, word timestamps, 99 languages, great for fine-tuning
Whisper v3 Large (Fireworks) OpenAI Open Weights MIT 1.5B N/A 3.3% 7.2% ~$0.0015 Whisper v3 hosted by Fireworks, 99+ languages, lowest cost API. Via Trelis Router.
Proprietary APIs
GPT-4o mini Transcribe OpenAI Proprietary Proprietary N/A ~5.7% $0.003 OpenAI API, easy integration
Gemini 2.5 Pro Google Proprietary Proprietary N/A ~6.7% 2.8% 5.5% ~$0.003 Multimodal, long context, audio understanding. Via Trelis Router.
Gemini 2.5 Flash Google Proprietary Proprietary N/A ~7.0% ~$0.003 Multimodal, long context, audio understanding
ElevenLabs Scribe v2 ElevenLabs Proprietary Proprietary N/A 6.3% 2.2% $0.010 Diarization, word timestamps, 99 languages. Via Trelis Router.
Deepgram Nova 3 Deepgram Proprietary Proprietary N/A N/A 7.8% 8.2% ~$0.008 Diarization, streaming, custom vocabulary. Via Trelis Router.
AssemblyAI Universal 3 Pro AssemblyAI Proprietary Proprietary N/A N/A 2.4% 3.3% ~$0.0035 Word timestamps, 6 languages, prompting. Via Trelis Router.
Speechmatics Ursa 2 Enhanced Speechmatics Proprietary Proprietary N/A N/A 4.9% 9.0% ~$0.0125 Enterprise-grade, 70 languages, best-in-class accuracy. Via Trelis Router.
Qwen Family (Alibaba)
Qwen3-ASR-1.7B Alibaba / Qwen Open Weights Apache 2.0 1.7B ~4.9% 8.2% 6.3% Self-hosted 52 languages/dialects, language detection, singing/music recognition, streaming
Qwen3-ASR-0.6B Alibaba / Qwen Open Weights Apache 2.0 0.6B ~7.6% 8.9% 8.4% Self-hosted Lightweight variant, 52 languages/dialects, edge-deployable
NVIDIA
Parakeet TDT 0.6B v3 NVIDIA Open Weights CC-BY 4.0 0.6B N/A 3.4% 5.0% Self-hosted NeMo framework, CTC/TDT decoder, 25 European languages, word timestamps
Meta
OmniASR-LLM-7B Meta Open Weights Apache 2.0 7.8B N/A 10.2% 23.9% Self-hosted LLM-based ASR, 1600+ languages, also available in 300M/1B/3B sizes
Microsoft
VibeVoice-ASR Microsoft Open Weights MIT 8B N/A 6.4% 5.3% Self-hosted 50+ languages, code-switching support, word timestamps
Moonshine (Useful Sensors)
Moonshine Base Useful Sensors Open Weights MIT 61M N/A 5.2% 9.3% Self-hosted Ultra-lightweight, edge-first, English-only
Moonshine Tiny Useful Sensors Open Weights MIT 27M N/A 7.8% 13.6% Self-hosted Smallest model, IoT/mobile edge, English-only
Other Open Models
Kyutai STT (1B / 2.6B) Kyutai Open Weights CC-BY 4.0 1B / 2.6B N/A Self-hosted Streaming, word timestamps, voice prompting, Rust server

Notes