OpenAI Teases New Voice Engine, Stops Short of Full Release

Clips showcasing the text-to-voice AI model, Voice Engine, demonstrated "emotive and realistic voices"

Ben Wodecki, Jr. Editor

April 5, 2024

OpenAI has provided a glimpse of its new text-to-audio AI model Voice Engine, but has stopped short of releasing it.

Voice Engine can turn text inputs into natural-sounding speech. The model can also take input in one language, such as English, and return audio in another language, such as Spanish.

OpenAI showcased 15-second generations in a blog post, touting that its model can “create emotive and realistic voices.”

The company has been working on Voice Engine since late 2022 and has used it to power preset voices available in its text-to-speech API as well as ChatGPT’s Voice functionality.

OpenAI is not yet releasing the model, saying it is exercising caution because of concerns about potential misuse.

“We hope to start a dialogue on the responsible deployment of synthetic voices, and how society can adapt to these new capabilities,” according to a company statement. “Based on these conversations and the results of these small-scale tests, we will make a more informed decision about whether and how to deploy this technology at scale.”

OpenAI hinted at Voice Engine last week when it filed a trademark application for a service mark, a designation that covers services rather than tangible goods. The filing suggested a new voice-related product was in the works.

A small group of partner firms has been given access to Voice Engine. In test deployments, the model has been used to provide reading assistance to non-readers and children, translate video and podcast content, and provide voices for avatars in sales demonstrations.

Those given access to Voice Engine are barred from using it to impersonate people. Users are also not allowed to create their own voices with the model, and OpenAI has created a “no-go voice list” that detects when the model is used to generate audio of voices too similar to those of prominent figures.

Ahead of a wider release, OpenAI suggested banks should phase out voice-based security authentication. Voice AI systems have already been used to circumvent voice authentication. In 2021, scammers duped an Emirati bank manager out of $35 million after cloning customer voices.

Voice Engine generations can be traced because OpenAI has implemented watermarking to detect content generated by the model. OpenAI said it hopes more techniques like its watermarking system can be developed to trace the origin of audio content.

“We recognize that generating speech that resembles people's voices has serious risks, which are especially top of mind in an election year,” according to the company. “We are engaging with U.S. and international partners from across government, media, entertainment, education, civil society and beyond to ensure we are incorporating their feedback as we build.”

OpenAI has a dedicated team tasked with vetting models for safety before deployment. The company’s board also has the power to reverse decisions on systems over potential safety concerns.

About the Author(s)

Ben Wodecki is the Jr. Editor of AI Business, covering a wide range of AI content. Ben joined the team in March 2021 as assistant editor and was promoted to Jr. Editor. He has written for The New Statesman, Intellectual Property Magazine, and The Telegraph India, among others. He holds an MSc in Digital Journalism from Middlesex University.
