Delays and Concerns Over Misuse
The company’s hesitation to launch Voice Engine at scale suggests concerns over potential misuse, particularly deepfakes. OpenAI may also be trying to avoid regulatory scrutiny; the company has historically been criticized for prioritizing rapid product releases over safety measures in a bid to outpace competitors.
An OpenAI spokesperson stated that the company is still testing Voice Engine with a select group of "trusted partners" to better understand its use cases and improve its safety and functionality.
“We are excited to see how this technology is being used across different fields—from speech therapy and language learning to customer support, video game characters, and AI avatars,” the spokesperson said.
What is Voice Engine?
Voice Engine, which powers OpenAI’s text-to-speech API and some voices in ChatGPT's voice mode, generates highly realistic speech that closely resembles the original speaker. However, its launch has been repeatedly delayed due to concerns surrounding security and ethical risks.
According to OpenAI’s June 2024 blog post, Voice Engine predicts the most likely-sounding speech from text input, accounting for different accents, speech patterns, and tones. This allows it not only to generate spoken versions of text but also to create "spoken-style responses" that simulate how different speakers would naturally read aloud.
Initially, OpenAI planned to integrate Voice Engine (previously called Custom Voices) into its API on March 7, 2024. The goal was to grant access to 100 trusted developers before a broader launch, prioritizing socially beneficial and responsible applications. OpenAI even trademarked the tool and set a pricing model at $15 per million characters for standard voices and $30 per million characters for HD-quality voices.
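At those rates, per-job costs are easy to estimate. A minimal sketch, using only the prices cited above; the function and tier names are illustrative, not part of any OpenAI API:

```python
# Illustrative cost estimate based on the pricing reported in the article:
# $15 per million characters (standard) and $30 per million (HD).
# These names are hypothetical helpers, not an OpenAI SDK.

PRICE_PER_MILLION_USD = {"standard": 15.00, "hd": 30.00}

def voice_engine_cost(num_chars: int, tier: str = "standard") -> float:
    """Estimated synthesis cost in USD for num_chars characters."""
    return num_chars / 1_000_000 * PRICE_PER_MILLION_USD[tier]

# A short audiobook of roughly 500,000 characters:
print(voice_engine_cost(500_000, "standard"))  # → 7.5
print(voice_engine_cost(500_000, "hd"))        # → 15.0
```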
However, at the last minute, OpenAI postponed the rollout. A few weeks later, it showcased Voice Engine without providing a signup option, limiting access to around 10 developers who had been testing it since late 2023.
OpenAI’s Position on Ethical and Security Risks
In its March 2024 blog post, OpenAI emphasized that it wants to "initiate a conversation" about the responsible deployment of synthetic voices before making a decision on whether to release the technology at a larger scale.
Voice Engine has reportedly been in development since 2022 and was demonstrated to global policymakers in summer 2023 to highlight both its capabilities and risks. Some partners, including startups like Livox, have tested the tool for assistive communication solutions.
Livox CEO Carlos Pereira found the technology "incredibly impressive," particularly for disabled individuals who struggle with speech. However, because the tool requires an internet connection, Livox could not integrate it into its products, as many of its users lack internet access.
"The voice quality and ability to generate speech in multiple languages is unique—especially for disabled users," Pereira said."We hope OpenAI will develop an offline version soon."
Pereira also noted that OpenAI has given no guidance on a potential launch date or whether it plans to charge for the service in the future.
Election-Related Risks and Deepfake Concerns
OpenAI hinted in June 2024 that one reason for the delayed release was the risk of misuse during the U.S. elections. After discussions with stakeholders, OpenAI introduced several safeguards, including watermarking AI-generated audio to trace its origin.
To prevent fraudulent voice cloning, OpenAI requires developers to obtain explicit consent from the original speaker and provide clear disclosures that the voice is AI-generated. However, the company has not clarified how it enforces these policies, which could be difficult to implement at scale, even with OpenAI’s resources.
The company also aims to develop "voice authentication" technology and create a "no-go" list to prevent the cloning of voices that closely resemble public figures. However, implementing these safeguards remains a technical challenge, and failure to do so could further damage OpenAI’s reputation, as the company has already been criticized for rushing product launches without adequate safety measures.
AI Voice Cloning: A Growing Threat
Voice cloning technology has rapidly evolved, but so have its risks. According to reports, AI voice cloning was the third fastest-growing scam in 2024, leading to an increase in fraud, financial scams, and deepfake-related misinformation.
Malicious actors have used voice cloning to generate fake audio clips of celebrities and politicians, spreading false information that goes viral on social media. These concerns have pushed regulators and tech companies to tighten security measures on AI-generated content.
The Uncertain Future of Voice Engine
OpenAI could release Voice Engine next week—or never at all. The company has repeatedly suggested that it might keep the service limited in scope due to ethical and safety concerns.
However, one thing is clear: whether due to regulatory concerns, safety risks, or both, Voice Engine remains OpenAI’s longest-running limited preview, with no confirmed launch date in sight.