Gemini 3.5 Live Translate brings natural speech-to-speech translation

June 24, 2026 | June 11, 2026 by Tezeh Collins

Twenty years ago, translation at Google started as one of the company's early machine learning efforts: take the science of language and turn it into real human connection. Since then, that initial experiment has scaled dramatically, and it now translates over a trillion words each month for billions of users across Google products.

Now Google is taking the next step with Gemini 3.5 Live Translate, a new audio model for live, speech-to-speech translation. The goal is straightforward but technically demanding: make real-time translation feel less like a tool and more like a conversation.

What “fluid” translation actually requires

Most people have experienced turn-by-turn voice translation that waits for a speaker to finish, then responds. That structure often produces unnatural pacing: pauses that break conversational flow, delayed reactions, and a robotic cadence that can flatten meaning.

Gemini 3.5 Live Translate approaches the problem differently. It generates translated speech continuously and stays only a few seconds behind the speaker. Under the hood, it balances a key trade-off: waiting long enough to gather context (to improve accuracy and phrasing), while still speaking quickly enough to remain synchronized. That balance is what reduces awkward gaps and makes the output feel more like live interpretation than a sequence of disconnected segments.
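The trade-off between context and delay can be made concrete with a toy simulation. The source does not describe the model's internal algorithm, so the sketch below is not Google's method. It assumes a simple fixed-lag policy (often called "wait-k" in simultaneous translation research), where output begins after k input segments have been heard and then advances one segment at a time. It also assumes, purely for illustration, that input and output segments map one-to-one. The names `wait_k_schedule`, `average_lookahead`, and `k` are hypothetical.

```python
from typing import List, Tuple


def wait_k_schedule(num_segments: int, k: int) -> List[Tuple[int, int]]:
    """Return (output_index, inputs_heard) pairs for a fixed-lag policy.

    Output segment i is emitted once min(i + k, num_segments) input
    segments have been heard. A larger k gives the translator more
    context per segment, at the cost of a longer delay.
    """
    if num_segments < 0 or k < 1:
        raise ValueError("num_segments must be >= 0 and k must be >= 1")
    return [(i, min(i + k, num_segments)) for i in range(num_segments)]


def average_lookahead(schedule: List[Tuple[int, int]]) -> float:
    """Average number of input segments heard beyond the one being output."""
    if not schedule:
        return 0.0
    total = sum(heard - (i + 1) for i, heard in schedule)
    return total / len(schedule)


if __name__ == "__main__":
    # Compare a low-lag policy, a moderate one, and full turn-by-turn waiting.
    for k in (1, 3, 5):
        schedule = wait_k_schedule(5, k)
        print(k, schedule, average_lookahead(schedule))
```

In this toy model, setting k to the total number of segments reproduces turn-by-turn behavior (nothing is spoken until the whole utterance has been heard), while k = 1 speaks with no lookahead at all. A continuous translator sits somewhere in between.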

Preserving prosody across 70+ languages

Translation quality isn’t only about correct words. For spoken language, prosody carries intent: intonation, pacing, and pitch shape whether something sounds like a question, a concern, a joke, or an apology.

Gemini 3.5 Live Translate automatically detects 70+ languages and generates smooth, natural-sounding speech while preserving those vocal cues. This matters for high-stakes communication where tone changes meaning: customer support escalations, multilingual team meetings, live lessons, or travel logistics. The model also supports multilingual inputs without requiring manual language configuration, which reduces friction for real-world conversations that switch languages midstream.

Another practical requirement is resilience to noisy environments. Live translation often happens where audio quality is imperfect: street traffic, crowded pickup areas, busy offices, or echoey rooms. Noise robustness helps keep translation usable when conditions are unpredictable.

Where it’s rolling out and how to build with it

Gemini 3.5 Live Translate is rolling out starting today across Google products. For developers, it's available in public preview through the Gemini Live API and Google AI Studio. For enterprises, it's entering private preview in Google Meet starting this month. For everyone, it's rolling out globally in the Google Translate app on Android and iOS.

Developers can use the Gemini Live API to translate streamed speech as it arrives, enabling live interpretation for calls, meetings, broadcasts, and more. Platforms like Agora, Fishjam, LiveKit, Pipecat, and Vision Agents integrate with the API to handle real-time media streaming infrastructure, letting teams focus on latency, UX, and product fit rather than the plumbing.
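Translating speech "as it arrives" generally means sending audio in small frames rather than as one finished recording. The sketch below shows only that generic chunking step and makes no calls to the Gemini Live API. The audio format (16 kHz, 16-bit mono PCM) and the 20 ms frame size are assumptions chosen for illustration, and `pcm_frames` is a hypothetical helper; check the API documentation for the formats and frame sizes it actually accepts.

```python
from typing import Iterator


def pcm_frames(
    pcm: bytes,
    sample_rate: int = 16_000,  # assumed sample rate in Hz
    frame_ms: int = 20,         # assumed frame duration in milliseconds
    sample_width: int = 2,      # bytes per sample (16-bit audio)
) -> Iterator[bytes]:
    """Yield fixed-duration frames from raw mono PCM audio.

    The final frame may be shorter if the audio length is not an
    exact multiple of the frame size.
    """
    frame_bytes = (sample_rate * frame_ms // 1000) * sample_width
    if frame_bytes <= 0:
        raise ValueError("frame size must be at least one sample")
    for start in range(0, len(pcm), frame_bytes):
        yield pcm[start:start + frame_bytes]


if __name__ == "__main__":
    one_second_of_silence = bytes(16_000 * 2)
    frames = list(pcm_frames(one_second_of_silence))
    # Each frame would be handed to whatever streaming client is in use.
    print(len(frames), len(frames[0]))
```

Smaller frames reduce the delay before the first audio reaches the service but increase the number of messages sent, which is one of the latency decisions the streaming platforms named above are meant to handle.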

A concrete example: Grab is testing the model to support near real-time multilingual communication between drivers and travelers at pickups. Grab users place over 10 million voice calls per month through the app, making even small improvements in flow and clarity meaningful at scale.

Conclusion

For end users, Gemini 3.5 Live Translate is landing in Google Translate on Android and iOS. With Live translate, connecting headphones enables a more seamless experience that mirrors the speaker’s tone across languages. Android is also beginning to roll out a new listening mode that plays translated audio through the phone’s earpiece, useful when you want a more private translation and don’t have headphones available.

For the AI community, the release is a clear signal: the next leap in translation isn’t just broader language coverage or better benchmarks. It’s real-time, prosody-aware speech generation that keeps up with human conversation while sounding like it belongs there.
