How I Cloned My Voice With AI





Voice cloning is one of the more controversial aspects of Artificial Intelligence, with a host of security and ethical concerns about its use. That said, I was curious about how well it would work, and whether a machine could accurately replicate the sound of my voice.


1. How Does AI Voice Cloning Work?

I tested two voice cloning services, Play.ht and Descript. There are others, but these two stood out for ease of use. For both of them, I had to submit samples of my voice for the AI to ‘learn’.
Play.ht has two options, ‘High Fidelity’ and ‘Instant’. High Fidelity requires at least 30 minutes of recording, whereas Instant only needs 30 seconds. I tried both.
Descript needs a minimum of 10 minutes of recording, but recommends 30.
Luckily, I have a lot of recorded content from training courses I developed, so it was as easy as just uploading the files. 
The processing time varied by platform. Play.ht needed 2 hours for High Fidelity, and Instant is, well, instant. Descript took almost 24 hours to develop the voice clone.

2. The Results


The results were surprising. In the audio tracks below, three are AI-cloned voices, and one is my real voice as a reference. See if you can guess which track is which:


  • My real voice
  • Play.ht High Fidelity
  • Play.ht Instant
  • Descript 

I asked a few people, and the feedback was mixed. While a couple of the voices are obviously not quite right, one is relatively convincing.


3. The Summary


The progress of AI voice cloning is both amazing and disconcerting. While it brings numerous possibilities for positive applications, its potential misuse for nefarious purposes cannot be ignored.

It definitely has applications in automation (think content creation, sales processes, and training), but it’s still imperfect.


The Answers

Here is which recording is which:

  1. Play.ht Instant
  2. Play.ht High Fidelity
  3. Descript
  4. My real voice