Google's Gemini 1.5 Pro AI model expands globally, adds audio understanding
Google this week announced a major expansion of its Gemini 1.5 Pro AI model. The language model, known for its 1-million-token context window, is now available via the Gemini API in over 180 countries. Key updates include native audio understanding, improved developer controls, and a new File API.
Key Advancements
- Audio Understanding: Gemini 1.5 Pro can now directly process and understand audio inputs, both in the Gemini API and in Google AI Studio. This opens up a wide range of use cases, such as transcribing lectures or analyzing customer service calls (see the first sketch after this list).
- Video Capability: In Google AI Studio, Gemini 1.5 Pro can reason across both image and audio components of videos. API support for this feature is coming soon.
- System Instructions: Developers can now provide detailed instructions that tailor Gemini 1.5 Pro's responses to specific needs, producing more focused and accurate output (used in the first sketch after this list).
- JSON Mode: Gemini 1.5 Pro can output responses as structured JSON objects, making it easier to extract and process information.
- File API: A new File API simplifies uploading and managing files such as audio and images for use with the model (also shown in the first sketch after this list).
- New Embedding Model: Google has also released an improved text embedding model that outperforms its previous embedding offerings (see the second sketch after this list).
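To make these features concrete, here is a minimal sketch that combines audio understanding, the File API, and a system instruction. It assumes the google-generativeai Python SDK (`pip install google-generativeai`); the file name lecture.mp3 and the instruction text are hypothetical.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # key obtained via Google AI Studio

# Upload the recording through the File API; the returned handle can be
# passed to the model alongside a text prompt.
audio_file = genai.upload_file(path="lecture.mp3")

# A system instruction tailors the model's behavior for the whole session.
model = genai.GenerativeModel(
    model_name="gemini-1.5-pro-latest",
    system_instruction=(
        "You are a precise transcriptionist. Transcribe audio verbatim "
        "and note speaker changes."
    ),
)

response = model.generate_content([audio_file, "Transcribe this lecture."])
print(response.text)
```

Uploading through the File API keeps large media files out of the request itself, which matters when long recordings are filling the 1-million-token context window.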
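The new embedding model is callable through the same SDK; a minimal sketch follows. The identifier text-embedding-004 is the model name Google announced alongside this release, but check the current documentation in case it has been superseded.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Embed a document for retrieval; the result contains a list of floats.
result = genai.embed_content(
    model="models/text-embedding-004",
    content="Gemini 1.5 Pro now understands audio natively.",
    task_type="retrieval_document",
)
print(len(result["embedding"]))  # embedding dimensionality
```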
Transforming Use Cases
These significant updates enable innovative use cases for Gemini 1.5 Pro. For example:
- Long-form Content Analysis: Audio understanding and the expanded context window together make detailed analysis of lengthy lectures, podcasts, or videos possible in a single request.
- Enhanced Data Extraction: JSON mode streamlines information extraction from text and images (see the sketch after this list).
- Customer Service Automation: Audio understanding can streamline support operations by transcribing calls and identifying key issues automatically.
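As a sketch of that extraction workflow, the snippet below requests a structured reply in JSON mode via the response_mime_type setting; the support-email text and the ticket fields are hypothetical.

```python
import json

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro-latest")

prompt = (
    "Extract the customer name, product, and issue from this support email "
    'as a JSON object with keys "name", "product", and "issue".\n\n'
    "Hi, this is Ada Park. My Pixel Tablet stopped charging yesterday."
)

# response_mime_type switches the model into JSON mode, so the reply text
# is a parseable JSON object rather than free-form prose.
response = model.generate_content(
    prompt,
    generation_config={"response_mime_type": "application/json"},
)
ticket = json.loads(response.text)
print(ticket["issue"])
```

Because the reply parses as JSON, it can feed downstream pipelines directly, without brittle regex scraping of free-form text.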
Developers are encouraged to obtain an API key through Google AI Studio and explore the new Gemini API Cookbook.