Launched in May 2021, Veritone Voice is a lifelike text-to-speech (TTS) and speech-to-speech (STS) voice synthesis solution powered by Veritone's Enterprise AI platform, aiWARE™. Voice work is time-consuming and costly, and when audio is created for different audiences, content may lose its signature voice persona. Using proprietary and third-party AI engines, including Custom Neural Voice (CNV) from Azure Cognitive Services, Veritone is working with The Bert Show and iHeartMedia podcasts to use talent-approved synthetic voices to save time and scale their audio content to new audiences.
“We're letting AI do 80 to 90 percent of the heavy lifting and moving people from the loop to a supervisory role.”
Daniel Wong, Director of Marketing, Veritone
Voice synthesis
Veritone is a leading Enterprise AI platform that uses its software, services, and applications to help companies solve the current complexities of digital information. The company offers a voice cloning solution called Veritone Voice, which uses Azure’s custom neural TTS voice service to create high-quality, professional-sounding, synthetic voices that can be translated across different languages, dialects, accents, genders, and more.
Saving time and keeping a consistent persona
Voice work can be a tedious effort. Bert Weiss, the voice of The Bert Show, knows this all too well. His syndicated morning show reaches over one million listeners every week and appears on 27 different radio stations. Bert often reads similar scripts and liners for each station, which means less time for his other commitments. With line repetitions, recordings can lose the voice actor’s signature style and become forced.
Another challenge of voice work is scaling content to new audiences. When expanding audio content to different audiences and languages, companies need to use translation software or hire a fluent speaker, which often means losing the signature voice of the brand. That was the case with iHeartMedia as they looked to translate their English-language podcasts into other languages.
The Veritone and Azure partnership
A partnership between Veritone and Microsoft is far from unexpected. “Believe it or not, we've been a customer of Azure since the early days of indexer,” says Daniel Wong, Director of Marketing. While Veritone hosts on Azure and Azure for the US government, the partnership for Veritone Voice comes from the use of Custom Neural Voice through Azure Cognitive Services to create synthetic brand voices, including different languages.
With Azure CNV, Veritone can help its clients keep their voice persona consistent across languages and reach a diverse audience across the globe using cross-lingual adaptation. Additionally, human review can keep audio accurate, especially when needed for localization. Synthetic voices can be used across industries, including advertisements and endorsements, audiobooks, audio descriptions, training materials, radio productions, and more.
Creating a human-like synthetic voice
Custom Neural Voice is a text-to-speech feature on Azure used to create human-like AI voices from audio-based training data. Synthetic voices are created by using voice models trained with recording samples approved by the voice talent. From these synthetic voices, audio can be created in different styles. In addition to Custom Neural Voice Pro (Professional), Microsoft offers Custom Neural Voice Lite, a public preview that allows users to record a small set of pre-defined sentences on their own computer instead of going through professional services, such as a recording studio.
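For readers curious about what a request to a synthetic voice can look like, text-to-speech services such as Azure's accept input as Speech Synthesis Markup Language (SSML), where a `voice` element names the voice that should read the text. The following is a minimal sketch in Python that only builds such a document. The helper `build_ssml` and the voice name `HypotheticalBrandVoiceNeural` are illustrative assumptions, not names used by Veritone or Microsoft, and sending the document to a deployed voice model is outside the scope of the sketch.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_ssml(text, voice_name, lang="en-US"):
    """Return an SSML string asking `voice_name` to read `text`.

    Hypothetical helper for illustration; it only builds the markup
    and does not call any speech service.
    """
    # Serialize the SSML namespace as the default (prefix-free) namespace.
    ET.register_namespace("", SSML_NS)
    speak = ET.Element(
        f"{{{SSML_NS}}}speak",
        {"version": "1.0", f"{{{XML_NS}}}lang": lang},
    )
    voice = ET.SubElement(speak, f"{{{SSML_NS}}}voice", {"name": voice_name})
    # ElementTree escapes characters such as "&" when serializing.
    voice.text = text
    return ET.tostring(speak, encoding="unicode")


# The voice name below is an assumption, not a real deployed model.
ssml = build_ssml("Good morning & welcome to the show!", "HypotheticalBrandVoiceNeural")
print(ssml)
```

Building the markup with an XML library rather than string formatting means that script text containing characters like `&` or `<` is escaped automatically, which matters when the same liner is generated for many stations.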
A secure and supportive experience
With Veritone Voice, voice talent and companies can save the time and money used for additional recording sessions and hiring other talent for additional languages. “A voice actor can name every city over and over again, but it becomes very repetitive and very manual. So, you add AI and let the talent’s synthetic voice do the heavy lifting,” says Ashley Bailey, Director of Product Marketing, Synthetics and Metaverse, Veritone.
Another key value is Veritone’s commitment to clear, compliant, and consenting use and application of voice clone technology. Veritone prohibits the generation and learning of unauthorized voice models, provides controlled access to voice models, and enables audible and inaudible fingerprint verification.
Both written and verbal consent are crucial. As a member of the Interactive Advertising Bureau and the Open Voice Network, Veritone works to uphold the best standards for global synthetic voice use. “We choose who we work with very carefully, and synthetic voice in particular is an area where we are very hyper on getting consent,” says Bailey.
The voice of the future
With CNV, Veritone Voice will continue to offer ethical voice cloning services as a managed and self-service application. The Bert Show and iHeartMedia will serve as examples of how AI can streamline and improve the process of voice work.