Meta unveils Audiobox, an AI text-to-audio generator

Audiobox significantly expands the capabilities of generative AI for audio, enabling users to create custom audio content with greater ease and control.

Luis Espada

Nov 30, 2023 - 2 min read

Meta has today unveiled Audiobox, its latest foundation research model for audio generation. Building upon the success of its predecessor, Voicebox, Audiobox significantly expands the capabilities of generative AI for audio, enabling users to create custom audio content with greater ease and control.

Audiobox introduces several key advancements, including:

Describe-and-generate sound: Users can provide a natural language prompt describing the desired sound, and Audiobox will generate the corresponding audio. For instance, a prompt like "a running river and birds chirping" will produce a soundscape with those elements.
Describe-and-generate speech: Users can input a short description of the desired voice along with the transcript to be narrated, and Audiobox will generate speech in that voice.
Dual-input vocal restyling: Audiobox allows users to combine an audio voice input with a text style prompt to synthesize speech in that voice, placed in any environment or delivered with any emotion. This enables users to change how a voice sounds without losing the speaker's identity.
State-of-the-art controllability: Audiobox demonstrates superior controllability compared to previous models, allowing users to precisely specify the desired audio content.

Meta's commitment to responsible AI development is evident in Audiobox's design. The model incorporates automatic audio watermarking so that audio created with Audiobox can be traced back to its origin, a safeguard against potential misuse. Additionally, a voice authentication feature is designed to prevent impersonation attempts.

Audiobox is currently being released to a hand-selected group of researchers and academic institutions with a proven track record in speech research. Meta seeks to foster collaboration within the research community to further develop Audiobox's capabilities and address potential ethical considerations responsibly.

In the long term, Meta envisions a future where Audiobox's capabilities empower anyone to create personalized audio content with ease. This technology holds immense potential for content creators, narrators, sound editors, game developers, and AI chatbot creators.

Meta's Audiobox marks a significant step forward in generative AI for audio, paving the way for a more accessible and creative audio landscape. With its emphasis on responsible development and collaboration, Meta says it is aiming to ensure that Audiobox's transformative power is harnessed for the benefit of all.