Microsoft just made an AI voice generator so convincing it's too dangerous to release (2024)

Speech and voice are clearly the next big battleground for generative AI, and a number of companies are working hard to produce models that can understand and replicate natural voice patterns. And while the likes of ChatGPT Voice could change storytelling forever, Microsoft claims it has hit the apex of speech generation: human parity.

In fact, the company's researchers say their VALL-E 2 text-to-speech (TTS) generator is so advanced, it would be irresponsible and dangerous to release publicly. According to a research paper (spotted by our sister title, LiveScience) the generator needs just a few seconds of audio to reproduce a voice that's indistinguishable from a human.

To put that in perspective, the scientists at Microsoft believe the speech generated by VALL-E 2 matches or exceeds the quality of a human voice when compared to the audio samples from speech libraries LibriSpeech and VCTK.

"VALL-E 2 is the latest advancement in neural codec language models that marks a milestone in zero-shot text-to-speech synthesis (TTS), achieving human parity for the first time," the researchers wrote. "Moreover, VALL-E 2 consistently synthesizes high-quality speech, even for sentences that are traditionally challenging due to their complexity or repetitive phrases."

Although the researchers aren't releasing the model publicly (more on that later), they have made several audio samples available to listen to in a blog post about the project. You can hear a speaker prompt sourced from LibriSpeech and then the resulting generation of an entirely new (complex) sentence from both the VALL-E and VALL-E 2 generators.

And while the first generation model sounds stilted, there's no denying VALL-E 2 does an exceptional job of copying the resonance and articulation of the speaker.

How does it work?

Microsoft's VALL-E 2 TTS generator uses two specific features to achieve its impressive result: "Repetition Aware Sampling" and "Grouped Code Modeling."

The first is designed to make the output sound more fluid by handling repetitions of tokens (the small units of speech the model predicts one at a time) that can trip up an AI and send it into a loop. Think of an alliteration-heavy sentence, for example.
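
To make the first idea concrete, here is a minimal Python sketch of repetition-aware sampling, assuming the scheme as we understand it from the research paper: draw a candidate token with nucleus (top-p) sampling, and if that token already makes up too large a share of the recent output, fall back to sampling from the full probability distribution instead. The function names, the 10-token window, the 0.1 threshold and the top-p value of 0.8 are our own illustrative assumptions, not Microsoft's code or settings.

```python
import random
from typing import List, Sequence


def nucleus_sample(probs: Sequence[float], top_p: float, rng: random.Random) -> int:
    """Sample from the smallest set of most-likely tokens whose total probability >= top_p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, total = [], 0.0
    for i in ranked:
        kept.append(i)
        total += probs[i]
        if total >= top_p:
            break
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


def random_sample(probs: Sequence[float], rng: random.Random) -> int:
    """Sample a token id from the full, untruncated distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def repetition_aware_sample(
    probs: Sequence[float],
    history: List[int],
    rng: random.Random,
    top_p: float = 0.8,      # assumed value, for illustration only
    window: int = 10,        # assumed size of the "recent output" window
    threshold: float = 0.1,  # assumed repetition ratio that triggers the fallback
) -> int:
    """Hypothetical helper: nucleus-sample, but switch to full random sampling
    when the candidate token is repeating too often in the recent history."""
    candidate = nucleus_sample(probs, top_p, rng)
    recent = history[-window:]
    repetition_ratio = recent.count(candidate) / window
    if repetition_ratio > threshold:
        # The candidate is looping: resample from the whole distribution.
        return random_sample(probs, rng)
    return candidate
```

The fallback is what breaks a loop: nucleus sampling keeps choosing the same high-probability token, while sampling from the full distribution gives the less likely alternatives a chance.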

The second feature boosts efficiency by reducing the number of individual tokens the model has to process in a single input sequence.
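
The second idea can be sketched just as simply. Assuming, as we read the paper, that grouping means packing every few consecutive codec tokens into a single step of the language model, the hypothetical helpers below pack and unpack a token sequence. The group size and the zero padding for a short final group are our own assumptions.

```python
from typing import List, Tuple


def group_codes(codes: List[int], group_size: int, pad_id: int = 0) -> List[Tuple[int, ...]]:
    """Pack consecutive tokens into fixed-size groups; each group becomes one model step."""
    if group_size < 1:
        raise ValueError("group_size must be >= 1")
    remainder = len(codes) % group_size
    if remainder:
        # Pad the last group so every group has the same size (assumed padding scheme).
        codes = codes + [pad_id] * (group_size - remainder)
    return [tuple(codes[i:i + group_size]) for i in range(0, len(codes), group_size)]


def ungroup_codes(groups: List[Tuple[int, ...]], original_length: int) -> List[int]:
    """Flatten groups back into a token sequence and drop any padding."""
    flat = [code for group in groups for code in group]
    return flat[:original_length]
```

With a group size of 2, a 10-token sequence becomes 5 steps, so the model works through half as many positions for the same stretch of audio.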

"VALL-E 2 surpasses previous zero-shot TTS systems in speech robustness, naturalness, and speaker similarity," the researchers wrote in the blog post. "VALL-E 2 can generate accurate, natural speech in the exact voice of the original speaker, comparable to human performance."

Too dangerous?

Although Microsoft maintains there are uses for an AI speech generator capable of this level of output, such as producing speech for individuals with aphasia or people with amyotrophic lateral sclerosis, the company is keeping it research-only at present.

"Currently, we have no plans to incorporate VALL-E 2 into a product or expand access to the public," the scientists wrote. That decision stems in part from the potential for misuse if the model were opened up to the world at large. In an ethics statement at the end of the post, the researchers wrote that their creation "may carry potential risks in the misuse of the model, such as spoofing voice identification or impersonating a specific speaker."

This isn't unique to Microsoft. OpenAI, the creator of ChatGPT, has also placed restrictions on some of its voice tech and has built a deepfake detector to help users identify images created using AI. Whether VALL-E 2 (or its successor) stays closed off remains to be seen. The AI race will intensify over the coming months and years, and companies and scientists will no doubt feel the pressure to push the envelope.

Jeff Parsons

UK Editor In Chief
