Google's Gemini 3.5 Live Translate lands with 70-language real-time speech

by Luis Rijo
Luís Rijo is a seasoned marketing professional with over 10 years of experience in Digital Marketing, Search, Social, Display, Video, and DOOH. Based in Europe. Also writing in the spend. Reach out via luis@ppc.land
June 10, 2026

Gemini 3.5 Live Translate logo on a light blue gradient background with abstract shapes.

Google yesterday released Gemini 3.5 Live Translate, a new audio model that converts spoken language to spoken language in near real-time across more than 70 languages. The model is rolling out simultaneously to developers via the Gemini Live API, to enterprise customers in Google Meet, and to consumers through the Google Translate app on Android and iOS - marking one of the broadest single-day launches Google has staged for a translation product.

The announcement, published June 9, 2026 on The Keyword, Google's official blog, was authored by Anuda Weerasinghe, Product Manager, and Tony Lu, Senior Staff Software Engineer.

What makes 3.5 Live Translate different from earlier systems

Most speech translation systems operate in discrete turns: they wait for a speaker to stop, process the completed utterance, then produce a translation. According to Google, Gemini 3.5 Live Translate departs from that architecture. The model processes speech as it is streamed, generating translated audio continuously rather than waiting for pauses.

Google describes the trade-off the model navigates as balancing "waiting for context to improve quality" against "translating immediately to stay in sync with the speaker." The result, according to the announcement, is that translated audio stays just a few seconds behind the speaker throughout the session, without the awkward gaps that characterize turn-based approaches.
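The latency gap between the two approaches can be illustrated with a toy model. This is a minimal sketch, not Google's method: it borrows the "wait-k" idea from the simultaneous translation literature, where output for a word is emitted once k further words have been heard. The word timings, the value of k, and both function names are hypothetical.

```python
from typing import List


def turn_based_delays(arrival_times: List[float]) -> List[float]:
    """Delay per word when output waits until the whole utterance has been heard."""
    end = arrival_times[-1]
    return [end - t for t in arrival_times]


def wait_k_delays(arrival_times: List[float], k: int) -> List[float]:
    """Delay per word when output for word i is emitted once word i+k has arrived.

    Near the end of the utterance there are fewer than k words left,
    so the remaining output is emitted when the last word arrives.
    """
    last = len(arrival_times) - 1
    return [arrival_times[min(i + k, last)] - t for i, t in enumerate(arrival_times)]


# Hypothetical utterance: one word every 0.5 seconds, 20 words in total.
times = [0.5 * i for i in range(20)]
turn = turn_based_delays(times)
stream = wait_k_delays(times, k=4)

print(max(turn), max(stream))  # worst-case lag in seconds: 9.5 2.0
```

In this toy model the turn-based lag grows with the length of the utterance, while the streaming lag is capped by k. A larger k gives the translator more context at the cost of more delay, which is the trade-off Google describes.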

The model handles more than 70 languages automatically, without requiring users to manually select source or target languages beforehand. That automatic detection is paired with what Google calls noise robustness - the ability to operate in loud, unpredictable environments where background audio would disrupt systems relying on clean signal inputs.

Beyond accuracy, the model is designed to preserve the acoustic character of the original speaker. According to Google, 3.5 Live Translate maintains the speaker's intonation, pacing, and pitch in the translated output. This is a design choice with practical consequences: in a meeting with multiple participants, listeners can still distinguish individual voices and pick up on the emotional tone of what is being said.

Developer access via the Gemini Live API and Google AI Studio

Developers can access Gemini 3.5 Live Translate in public preview through the Gemini Live API and through Google AI Studio starting today. The API is designed for applications requiring real-time media streaming - live interpretation for multilingual calls, meetings, lessons, and broadcasts among them.
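Real-time media streaming generally means sending audio in small fixed-size frames rather than as one finished file. The sketch below shows only that client-side framing step and makes no calls to the Gemini Live API. The sample rate, sample width, and frame duration are assumptions chosen for illustration, since the announcement does not state what audio format the API expects, and the function names are hypothetical.

```python
from typing import Iterator

SAMPLE_RATE_HZ = 16_000  # assumed; not stated in the announcement
BYTES_PER_SAMPLE = 2     # assumed 16-bit mono PCM
FRAME_MS = 20            # hypothetical frame duration


def frame_size_bytes(sample_rate_hz: int = SAMPLE_RATE_HZ,
                     bytes_per_sample: int = BYTES_PER_SAMPLE,
                     frame_ms: int = FRAME_MS) -> int:
    """Number of bytes in one frame of raw PCM audio."""
    return sample_rate_hz * frame_ms // 1000 * bytes_per_sample


def iter_frames(pcm: bytes, size: int) -> Iterator[bytes]:
    """Yield consecutive fixed-size frames; the final frame may be shorter."""
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


one_second = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)  # one second of silence
frames = list(iter_frames(one_second, frame_size_bytes()))
print(len(frames), len(frames[0]))  # 50 640
```

Smaller frames reduce the delay before the first audio reaches the model but increase the number of messages sent. Check the API documentation for the required format before relying on any of these values.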

Several developer platforms have already built integrations. According to Google, Agora, Fishjam, LiveKit, Pipecat, and Vision Agents have all connected to the Gemini Live API, allowing developers building voice translation applications to use these platforms' infrastructure for handling real-time media streams. This separates the media engineering problem from the translation problem: platform providers handle streaming complexity while developers focus on the user experience layer.

Jesse Hall, Staff Developer Advocate at LiveKit, described the model as making multilingual voice effortless. According to Hall, a demo he built on LiveKit Agents allowed everyone in a session to speak their own language and understand each other live.

Google has also published example code in the Gemini Cookbook, and a demo of the Gemini Live API in action - including dubbing and simultaneous multi-language translation - is publicly available.

The Gemini 3.5 series was introduced at Google I/O 2026 on May 19, where the company also upgraded AI Mode to Gemini 3.5 Flash. The translation model released today represents the audio specialization within that generation.

Google Meet gets 70+ languages, ending the English-only constraint

The most significant structural change in today's announcement may be what it does to Google Meet. The video conferencing platform has offered speech translation before, but the prior implementation was narrow. According to Google, the existing system supported only five languages and could only translate to and from English - limiting conversations to a single language pair at a time.

Gemini 3.5 Live Translate changes that in several ways. First, language support expands from five to more than 70. Second, the system will support conversations across more than 2,000 language combinations in a single meeting, compared to the previous English-only constraint. Third, the interface is being updated to provide instant access to speech translation rather than requiring users to navigate through settings.
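The "more than 2,000" figure is consistent with simple pair counting. Google does not say how it counts combinations, so the sketch below assumes exactly 70 languages (the stated lower bound) and shows both ways of counting.

```python
from math import comb, perm

LANGUAGES = 70  # lower bound; Google says "more than 70"

# Unordered pairs: English-Spanish counted once.
unordered_pairs = comb(LANGUAGES, 2)

# Ordered pairs: English->Spanish and Spanish->English counted separately.
ordered_pairs = perm(LANGUAGES, 2)

print(unordered_pairs, ordered_pairs)  # 2415 4830
```

Under the unordered count, 70 languages give 2,415 pairs, which matches the "more than 2,000" claim. Under the same assumption, the previous English-only design allowed at most 69 pairs.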

Google is launching the Google Meet upgrade in private preview for select business Google Workspace customers starting this month, with a broader rollout described as coming later this year.

This rollout follows a progression that has been tracked at PPC Land. Google Meet speech translation became generally available for Google Workspace business customers on January 27, 2026, after an earlier beta that launched at Google I/O 2025. That January release supported five bidirectional language pairs, each pairing English with one of Spanish, French, German, Portuguese, or Italian. The upgrade to Gemini 3.5 Live Translate removes the English-only constraint entirely and multiplies the language count more than tenfold.

Google Translate app adds listening mode for Android

The Google Translate mobile app is also receiving Gemini 3.5 Live Translate today, on both Android and iOS, for the Live translate feature. When connected to any pair of headphones, the model delivers translated audio that mirrors the speaker's tone across more than 70 languages.

For Android specifically, Google is rolling out an additional feature called listening mode. The mode lets users hear translations directly through a phone's earpiece. The user holds the phone to their ear as in a normal call, and translated audio streams to them without requiring headphones. According to Google, this is designed for situations where a user wants to hear a translation quickly and discreetly and has no headphones to hand.

The example Google gives is a guided tour in Spanish being heard as a near real-time English translation directly through the phone's earpiece. That scenario - a tourist, a museum guide, a local shopkeeper speaking a language the user doesn't know - is the kind of interaction where previous approaches required a visible screen, active tapping, or at minimum the appearance of using a device.

Google's translation products have been expanding steadily. Google Translate marked its 20th anniversary in April 2026, serving more than 1 billion monthly users, supporting close to 250 languages, and translating approximately 1 trillion words per month. According to that announcement, translation is also among the most-used capabilities in Circle to Search on Android. PPC Land reported in September 2025 that Google added continuous translation to Circle to Search, allowing users to scroll through content without restarting translation on each screen.

SynthID watermarking on all audio output

All audio generated by Gemini 3.5 Live Translate is watermarked with SynthID, Google's imperceptible AI content marking technology. According to Google, the watermark is woven directly into the audio output rather than added as separate metadata, making it detectable even if the audio is processed or modified.

The watermark is not audible to humans but can be detected algorithmically. Google describes the purpose as ensuring AI-generated content remains identifiable, with the specific goal of helping prevent misinformation that could arise from synthetic audio being mistaken for authentic human speech.

PPC Land covered the broader SynthID system in December 2025, when Google enabled users to verify whether videos were created or edited using Google AI. That system covers both audio and visual tracks, with timestamp-specific feedback that distinguishes fully synthetic content from partially edited media. DeepMind originally unveiled SynthID as a tool for watermarking and identifying AI-generated content.

The application of SynthID to translated speech adds a layer of provenance to audio that flows through communications platforms - a consideration that matters in contexts ranging from legal proceedings to international media.

The Grab use case: 10 million voice calls per month

One of the most concrete deployment examples in today's announcement involves Grab, the Southeast Asian ride-hailing and super-app platform. According to Google, Grab is testing Gemini 3.5 Live Translate to enable multilingual communication in near real-time between drivers and travelers at pickups. The Grab user base makes more than 10 million voice calls per month through the platform.

The driver-traveler pickup scenario involves two parties who may not share a language trying to coordinate a specific physical location - a scenario where miscommunication has direct operational consequences. Near real-time speech translation addresses that without requiring either party to type or switch apps.

The Grab partnership is described as a test rather than a completed deployment, suggesting the integration is still being evaluated ahead of a potential broader rollout across Grab's markets in Southeast Asia.

Context: twenty years of machine translation at Google

Google notes that its translation work began twenty years ago as a machine learning experiment. What was described as an attempt to "turn the science of language into the magic of human connection" now processes approximately 1 trillion words per month for billions of users across Google's products.

The jump from text-based translation to continuous speech-to-speech translation - preserving voice, intonation, and pitch in real time - represents a qualitative shift in what machine translation can do. Text translation operates on completed utterances. Speech-to-speech translation at the capability level Google describes today requires the model to make decisions about how to render language while the speaker is still mid-sentence, without waiting for semantic closure.

Google launched Gemini 3 in November 2025 with generative UI for dynamic search experiences, and has since extended Gemini capabilities progressively across products. At Google I/O 2026, the company's technical leaders described an architecture centered on agentic capabilities and model specialization. The Live Translate model is a specialized audio model rather than a general-purpose language model - a design decision that reflects the computational and latency constraints of real-time speech.

What this means for international marketing and communications

For organizations running international communications - whether multinational businesses using Google Workspace, developers building customer-facing voice applications, or platforms like Grab serving multilingual user bases - the practical effect of today's release is access to a speech translation capability that was previously limited to professional interpreting services or to narrow language pairs in enterprise software.

The expansion of Google Meet from 5 languages to 70+, and from English-only pairs to more than 2,000 language combinations, is the most operationally significant change for enterprise users. A sales call, a support conversation, or an internal all-hands meeting that previously required a human interpreter or was restricted to a small set of supported languages can now proceed in any of the 70+ supported languages without additional infrastructure.

For developers, the Gemini Live API and the existing partner integrations from LiveKit, Agora, and others mean that building a voice translation application no longer requires solving the real-time streaming infrastructure problem independently.

The private preview for Google Meet starts this month for select Workspace customers. Broader availability is expected later in 2026.

Timeline

April 2006 - Google Translate launches as a statistical machine translation service.
May 2025 - Google introduces speech translation in Google Meet at Google I/O 2025, initially supporting English and Spanish in beta for Google One AI Premium subscribers.
September 4, 2025 - Google adds continuous translation to Circle to Search on Android, launching initially on Samsung Galaxy devices.
November 18, 2025 - Google launches Gemini 3 with generative UI for dynamic search experiences.
January 27, 2026 - Google Meet speech translation becomes generally available for Google Workspace business customers, supporting five bidirectional language pairs, each pairing English with one of Spanish, French, German, Portuguese, or Italian.
April 28, 2026 - Google Translate marks 20th anniversary with 1 billion monthly users and close to 250 languages supported.
May 19, 2026 - Gemini 3.5 series announced at Google I/O 2026, including Gemini 3.5 Flash upgrade to AI Mode.
June 9, 2026 - Google releases Gemini 3.5 Live Translate, making it available to developers in public preview via the Gemini Live API and Google AI Studio, to enterprises in private preview via Google Meet starting this month, and to consumers via the Google Translate app on Android and iOS.

Summary

Who: Google, announced by Anuda Weerasinghe (Product Manager) and Tony Lu (Senior Staff Software Engineer), with early testing conducted by Grab, LiveKit, CJ ENM, and other developer partners.

What: The release of Gemini 3.5 Live Translate, a speech-to-speech audio model that performs near real-time translation across more than 70 languages, preserving speaker intonation, pacing, and pitch. It is available to developers via the Gemini Live API and Google AI Studio in public preview, to Google Workspace enterprise customers in Google Meet in private preview, and to consumers in the Google Translate app on Android and iOS.

When: Announced and beginning to roll out on June 9, 2026. The Google Meet private preview starts in June 2026 for select business Workspace customers, with a broader rollout planned for later in 2026.

Where: Available globally via the Google Translate app on Android and iOS; via the Gemini Live API for developers; and in private preview for select Google Workspace business customers in Google Meet.

Why: Google has operated translation services for twenty years, but existing speech translation in Google Meet was limited to five languages and only translated to and from English. Gemini 3.5 Live Translate extends support to more than 70 languages, enables more than 2,000 language combinations in a single meeting, and removes the English-only constraint - addressing a significant gap for international enterprise users, multilingual consumer applications, and developer platforms building voice communication tools.
