2026 Roadmap
- Frontier Speech AI models — Universal 3.1 and 3.2 Pro ship in Q2 and Q3 across async and realtime, followed by Universal 4 Pro Realtime in Q4 targeting the lowest end-to-end turn latency and best-in-class voice-agent audio handling. Universal TTS 1 ships in Q3, completing the in-house voice AI stack. Universal 1 Duplex (Preview), our native speech-to-speech model, follows in Q1 2027. Native-language coverage jumps from 6 to 30+, with noise cancellation, streaming PII redaction, and self-hosted realtime.
- Voice AI infrastructure platform — One API for every voice AI model: ours and the best open and community ones. Voice Agent API ships in Q2 with Twilio integration, and the Edge Voice Agent Platform in Q3 adds agent management, session storage, and edge functions. Community Models span Async STT, Streaming STT, LLM Gateway, and TTS, so customers get the right model for every language, domain, and price point without ever leaving the platform.
Async
Speech-to-text for pre-recorded audio.
Upcoming
- The next Universal-3 Pro release. Native-language coverage jumps from 6 to 30+, adding Japanese, Korean, Hindi, Arabic, Turkish, and others. Accuracy improves on the six core languages.
  New languages: Japanese, Vietnamese, Arabic, Dutch, Swedish, Hindi, Norwegian, Finnish, Danish, Urdu, Hebrew.
- Universal-3 Pro transcription roughly twice as fast end-to-end. Phase one is in production and has already cut turnaround time by 30–80% across different audio durations since the model’s release.
- Open-source speech-to-text models served directly through our API, giving immediate access to the best available models for the languages and domains they specialize in.
- Significantly better speaker labeling in noisy and multi-speaker audio. Focuses on the two errors customers report most often: mislabeled short replies like “yeah” and “uh-huh”, and speaker turns that don’t line up with punctuation.
- A follow-up to Universal-3.1 Pro. Focused on prompt-following, proper nouns and named entities, and broader language coverage.
- Recognize the same speaker across different recordings, not just within a single file. Useful for meetings, call centers, and cross-session analytics workflows.
- On-premise deployment of universal-3-pro for regulated environments with strict data-residency requirements.
- The next major accuracy and capability release after Universal-3.2 Pro.
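Cross-recording speaker recognition generally works by comparing speaker embeddings between files. As an illustrative sketch only (not our implementation; the toy vectors, names, and threshold below are made up), matching an embedding from a new recording against enrolled speakers can look like:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def match_speaker(embedding, known_speakers, threshold=0.75):
    """Return the enrolled speaker whose embedding is most similar,
    or None if no candidate clears the threshold."""
    best_name, best_score = None, threshold
    for name, ref in known_speakers.items():
        score = cosine_similarity(embedding, ref)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name

# Embeddings from two different recordings (toy 3-D vectors; real speaker
# embeddings come from a dedicated model and have hundreds of dimensions).
known = {"alice": [0.9, 0.1, 0.2], "bob": [0.1, 0.8, 0.5]}
print(match_speaker([0.88, 0.12, 0.21], known))  # closest to alice
```

The threshold trades false matches against missed matches; a production system tunes it per use case.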
Recently shipped
- Universal 3 Pro Async Timestamp Improvements — Major improvement to Universal-3 Pro’s timestamp calculation, delivering median precision gains of 15.3% for English and 8.6% for non-English, with P99 improvements of 15.0% and 58.4% respectively.
- Hebrew & Swedish — Major accuracy gains in Hebrew and Swedish via community-model integrations. Word error rates dropped 37% and 47%.
- Medical Mode — An LLM-powered correction pass for medical terminology (drug names, procedures, clinical entities). On our medical benchmark, it achieves a 4.97% error rate versus 7.32% for the next-best vendor. Available as an add-on to Universal-3 Pro in English, Spanish, German, French, Portuguese, and Italian.
- PII Audio Redaction using Silence — Redact PII with silence instead of a beep. Reduces listener fatigue when redacted audio is replayed at scale in call-center and compliance workflows.
- Universal 3 Pro Async — Promptable speech-to-text with natural-language and custom-vocabulary prompts, mid-sentence language switching across six core languages, and audio tagging.
- Improved Short-Audio Diarization — 19% better speaker-count accuracy and 6% lower speaker-attributed word error rate on audio under two minutes.
- Multichannel Diarization — Per-channel speaker labels for multi-microphone recordings. Eliminates crosstalk ambiguity in call-center and meeting audio.
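To make the silence-redaction behavior concrete: instead of overlaying a beep tone over a detected PII span, the corresponding stretch of audio samples is simply zeroed. A minimal sketch, assuming mono PCM samples and spans given in seconds (the function name and signature are illustrative, not our API):

```python
def redact_with_silence(samples, spans, sample_rate=16000):
    """Replace PII spans (start_sec, end_sec) with silence by zeroing
    the samples, instead of overlaying a beep tone."""
    out = list(samples)
    for start_sec, end_sec in spans:
        start = int(start_sec * sample_rate)
        end = min(int(end_sec * sample_rate), len(out))
        for i in range(start, end):
            out[i] = 0
    return out

# One second of fake audio at 8 samples/sec; redact 0.25 s to 0.75 s.
audio = [100, 100, 100, 100, 100, 100, 100, 100]
print(redact_with_silence(audio, [(0.25, 0.75)], sample_rate=8))
# [100, 100, 0, 0, 0, 0, 100, 100]
```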
Realtime
Low-latency streaming speech-to-text for live audio.
Upcoming
- The next universal-realtime-3-pro release, with better noise handling, higher voice-agent accuracy, continuous speaker-labeling gains, and PII redaction. Early English results already beat universal-realtime-3-pro on numbers, medical terms, accented speech, and alphanumerics. Native-language coverage jumps from 6 to 30+, adding Japanese, Korean, Hindi, Arabic, Turkish, and others.
  New languages: Japanese, Vietnamese, Arabic, Dutch, Swedish, Hindi, Norwegian, Finnish, Danish, Urdu, Hebrew.
- Realtime noise suppression for voice agents and telephony. Delivers clean audio into transcription and downstream LLMs so accuracy holds up in real call-center conditions, with no separate preprocessor required.
- PII detection and redaction in the realtime pipeline for HIPAA, PCI, and other compliance-sensitive workloads. Configurable entity types and substitution modes.
- A follow-up to Universal-3.1 Pro Realtime. Focused on prompt-following, proper nouns and named entities, broader language coverage, and closing remaining accuracy gaps that enterprise customers care about.
- A fast, cost-efficient realtime model for notetaking and meeting intelligence. Optimized for long-form audio where throughput, stable speaker labeling, and sustained accuracy over multi-hour sessions matter more than minimizing latency.
- On-premise deployment of universal-realtime-3-pro for regulated environments with strict data-residency requirements.
- The next-generation realtime model for voice agents. Targets the lowest end-to-end turn latency and the strongest handling of voice-agent audio (noise, interruptions, mid-turn hesitation, accented and non-native speech). Instruction-following is strong enough that a single model replaces today’s speech-to-text + LLM + TTS stacks. Multilingual across 15+ native languages and the foundation for our native speech-to-speech architecture.
- A single native end-to-end model that replaces today’s Voice Agent pipeline (speech-to-text, LLM, text-to-speech) with a unified Realtime Speech LLM. Tighter latency, better prosody control, and more natural interruption handling than orchestrated stacks can deliver.
Recently shipped
- Medical Mode — An LLM-powered correction pass for medical terminology. 4.97% error rate versus 7.32% for the next-best vendor. Available in both async and streaming on universal-realtime-3-pro.
- Streaming Diarization v1.5 — Speaker-aware sentence splitting for cleaner segmentation. 4–5% lower word error rate, 56% fewer phantom speakers, and clear gains on the CallHome and AMI speaker-labeling benchmarks.
- Universal 3 Pro Realtime — Realtime speech-to-text with inline streaming speaker labeling, custom vocabulary prompts up to 1,000 words, audio tagging, filler-word control, mid-sentence language switching, and 99+ language support via Whisper routing for long-tail languages. EU region support.
- Whisper Streaming — The first community model in our streaming API, shipped alongside Universal 3 Pro Realtime.
- Edge Routing and Data Zone Endpoints — Global low-latency routing with US/EU data-residency endpoints. No additional charge.
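Filler-word control in the realtime model is a server-side option, but its effect is easy to picture as a post-processing filter over the transcript. A toy sketch (the filler list and function are illustrative, not the API surface):

```python
import re

FILLERS = {"um", "uh", "erm", "hmm"}

def strip_fillers(transcript):
    """Drop standalone filler words from a transcript and re-join the
    remaining tokens. Punctuation attached to a filler goes with it."""
    tokens = transcript.split()
    kept = [t for t in tokens if re.sub(r"[^\w]", "", t).lower() not in FILLERS]
    return " ".join(kept)

print(strip_fillers("Um, I think, uh, we should ship it"))
# "I think, we should ship it"
```

The hosted option works on the recognition output itself, so it can also avoid emitting the filler tokens in partial transcripts rather than cleaning them up afterward.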
Voice Agents
End-to-end Voice Agent API.
Upcoming
- Production release of the Voice Agent API (formerly Speech-to-Speech API). Built on universal-realtime-3-pro, LLM Gateway, and text-to-speech running on self-hosted LiveKit. PCI-certified.
- Direct Twilio SIP and voice connectivity. Phone integration without customer-side LiveKit plumbing.
- Full programmatic control of voice agents, with session data and tool execution running at the edge. Create, update, and version agents as code through a management API, persist and retrieve conversation sessions (events, transcripts, tool calls) through public endpoints, and run webhooks or tool calls at the edge instead of round-tripping to origin.
- Official client libraries, starting with Python and TypeScript.
- A custom turn detection model trained on Universal 3 Pro streaming outputs for market-leading turn detection performance. Reduces false endpointing and improves handling of pauses, hesitations, and overlapping speech in Voice Agent API calls.
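For context on what a trained turn detection model improves on: the naive baseline ends a turn after a fixed run of silent frames, which mistakes mid-sentence pauses for turn ends. A minimal sketch of that baseline (illustrative only, not the shipped model):

```python
def detect_turn_end(frames, silence_ms=500, frame_ms=20):
    """Naive energy-based endpointer: the turn ends once we see enough
    consecutive silent frames. Returns the index of the frame where the
    turn is considered finished, or None if the speaker is still talking.
    A trained turn-detection model replaces this fixed-threshold rule
    with a prediction that also considers what was said, so hesitations
    and mid-sentence pauses do not trigger false endpoints."""
    needed = silence_ms // frame_ms
    run = 0
    for i, is_speech in enumerate(frames):
        run = 0 if is_speech else run + 1
        if run >= needed:
            return i
    return None

# 20 ms frames: speech, then a 600 ms pause (30 silent frames).
frames = [True] * 10 + [False] * 30
print(detect_turn_end(frames))  # fires at frame index 34
```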
Recently shipped
- Voice Agent Preview — First public release of end-to-end voice AI. Combines universal-realtime-3-pro, LLM Gateway, and text-to-speech on LiveKit.
TTS
Text-to-speech built for voice agents.
Upcoming
- A standalone text-to-speech model for production voice workloads. Low time-to-first-byte, voice prompting and customization, and accurate delivery of phone numbers, email addresses, named entities, and other content today’s TTS systems struggle with.
- Open-source text-to-speech models served directly through our API alongside Universal TTS, giving immediate access to the best available voices for the languages, styles, and domains they specialize in.
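Accurate delivery of phone numbers comes down to text normalization: expanding the string into the words to be spoken before synthesis, rather than letting the model read it as one large integer. A toy sketch of one such rule, digit-by-digit reading (illustrative only, not our implementation):

```python
DIGITS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}

def verbalize_phone_number(number):
    """Expand a phone number into the words a TTS voice should speak,
    one digit at a time, ignoring separators like dashes and spaces."""
    words = [DIGITS[ch] for ch in number if ch.isdigit()]
    return " ".join(words)

print(verbalize_phone_number("415-555-0132"))
# "four one five five five five zero one three two"
```

Real normalization also has to decide when a digit string is a phone number versus a year, price, or ordinal, which is where rule-based systems tend to break down.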
Speech Understanding
Extract meaning, sentiment, and events from audio.
Upcoming
- Summarization via the LLM Gateway, replacing legacy LeMUR summaries. Quality gains come from routing through frontier models with automatic fallbacks.
- Sharper chapter boundaries and titles via the LLM Gateway. Clearly better topic segmentation on long-form content.
- Accuracy and quality improvements to Speaker ID, Translation, and Custom Formatting. Translation covers live streaming and pre-recorded audio. Enables multilingual workflows where the spoken language differs from the output language.
- Automatic LLM-based transcript correction with no user prompts, generalizing the Medical Mode pattern to any domain.
- Better custom-vocabulary (keyterm) prompting across every supported language, via LLM Gateway post-processing. Closes the quality gap with English for Spanish, German, French, Portuguese, and Italian.
- Detect speaker emotions and emotional shifts in input audio. Distinct from Voice Agents Emotion and Style Tagging, which controls TTS output. Useful for therapy, CX scoring, and compliance monitoring.
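The keyterm improvements above run through LLM Gateway post-processing, but the underlying idea can be pictured with a much simpler baseline: snap near-miss transcript words to the closest supplied keyterm by edit distance. An illustrative sketch only, not the production mechanism:

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def correct_keyterms(words, keyterms, max_dist=2):
    """Snap near-miss transcript words to the closest supplied keyterm."""
    out = []
    for w in words:
        best = min(keyterms, key=lambda k: edit_distance(w.lower(), k.lower()))
        out.append(best if edit_distance(w.lower(), best.lower()) <= max_dist else w)
    return out

print(correct_keyterms(["universl", "speech"], ["AssemblyAI", "Universal"]))
# ["Universal", "speech"]
```

An LLM-based pass goes further than this: it uses sentence context to decide whether a substitution is actually warranted, which plain edit distance cannot do.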
LLM Gateway
One API for every major LLM. Built-in fallbacks and audio-first integration.
Upcoming
- Ongoing catalog expansion. The Gateway currently supports 24 models across Anthropic, OpenAI, Google, Qwen, and Kimi. DeepSeek, Mistral, Llama, and Cohere are next, with more open-source models to follow.
- Prompt-cache pass-through for supported providers, so customers keep the cache discount while routing through the Gateway.
- Reasoning (extended-thinking) controls exposed through the Gateway, on models that support them.
- Priority, standard, and flex request tiers for per-request cost and latency control.
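The Gateway's built-in fallback behavior can be pictured as trying models in priority order and moving on when a provider fails. An illustrative sketch (the model names, exception type, and call function are stand-ins, not real Gateway identifiers):

```python
class ModelUnavailable(Exception):
    """Stand-in for a provider outage or rate-limit error."""

def call_with_fallbacks(prompt, models, call_model):
    """Try each model in priority order; return (model, response) for the
    first success. `call_model(model, prompt)` stands in for a provider
    request and is expected to raise ModelUnavailable on failure."""
    last_error = None
    for model in models:
        try:
            return model, call_model(model, prompt)
        except ModelUnavailable as exc:
            last_error = exc
    raise last_error

# Simulate the primary provider being down.
def fake_call(model, prompt):
    if model == "provider-a/large":
        raise ModelUnavailable("rate limited")
    return f"{model}: summary of {prompt!r}"

used, reply = call_with_fallbacks(
    "call transcript", ["provider-a/large", "provider-b/large"], fake_call)
print(used)  # "provider-b/large"
```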
Open Benchmarks
Transparent, reproducible benchmarks across Universal and community models.
Upcoming
- A public version of our internal evaluation dashboard, covering 30+ competitors and the community models we serve across dozens of metrics (word error rate, speaker-labeling accuracy, keyterm accuracy, timestamp precision, mid-sentence language switching). Customers can verify accuracy and pick the right model per language and domain.
- Open-source release of the evaluation dashboard and supporting tooling, so customers and researchers can reproduce our benchmarks across all models on their own data.
- A realistic Voice-AI evaluation set of telephony, meeting, and voice-agent audio with proper ground-truth transcripts. The same dataset is used to grade every model on the leaderboard, Universal and community alike. Not a thirty-second YouTube clip or LibriSpeech.
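Word error rate, the headline metric on the leaderboard, is the word-level Levenshtein distance between reference and hypothesis divided by the reference length. A minimal reference implementation (real benchmarks first apply text normalization, which is omitted here):

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length,
    computed with word-level Levenshtein alignment."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, rw in enumerate(ref, 1):
        cur = [i]
        for j, hw in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (rw != hw)))
        prev = cur
    return prev[-1] / len(ref)

print(word_error_rate("the quick brown fox", "the quick brown socks"))  # 0.25
```

Note that WER can exceed 1.0 when the hypothesis inserts many extra words, which is one reason normalization choices matter when comparing vendors.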
Developer Experience
The dashboard, accounts, and tooling that make AssemblyAI easy to adopt.
Upcoming
- Invite teammates with role-based access. Organization setup, member management, RBAC, full auth flows, MFA enforcement, account switching, and ownership transfer.
- SAML and OIDC, as a fast-follow to multi-user accounts.
- Guided setup for new accounts, including model selection, API key creation, first-request walkthrough, and best-practice defaults.
- Hard spend caps and team budgets, beyond today’s soft alerts.
- Deeper observability in the dashboard. P50 and P95 turnaround time, webhook delivery statistics, uptime, and latency histograms.
- Programmatic access to billing and usage data.
- Configurable thresholds and cadences, including daily alerts and custom trigger conditions.
- Regulatory-compliant card support for EU PSD2 Strong Customer Authentication requirements.
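The P50 and P95 figures planned for the dashboard are percentiles of per-request turnaround time. Using the nearest-rank definition (one common choice; the sample numbers below are made up), they can be computed as:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest value with at least p% of
    samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# Turnaround times in ms; one slow outlier dominates the tail.
latencies_ms = [120, 95, 110, 300, 105, 98, 130, 101, 99, 2500]
print(percentile(latencies_ms, 50), percentile(latencies_ms, 95))
# 105 2500
```

The gap between the two numbers is the point of showing both: P50 describes the typical request, while P95 exposes the tail that users actually complain about.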
Recently shipped
- AssemblyAI Skill for AI Coding Agents — Claude Code, Cursor, and Codex now ship with a native AssemblyAI skill. It gives them accurate knowledge of our API out of the box and cuts hallucinated API usage in agent-generated code.