While Artificial Intelligence (AI) can do incredible things, it’s nowhere near being able to cost-effectively take direction from the client, or pick up nuance from the text and carry the intended meaning. Certainly, if it were agreed in advance how the text should be delivered, even within certain parameters, an AI programme could perform it adequately. However, that would require a specific piece of programming for that particular need, and that costs big money. Even big-budget feature films would think twice.
Here’s how these text-to-speech applications work: they analyse the whole text and, using what the AI has learned, produce the most likely vocalisation. There are few, if any, front-end settings for the user to change the meaning, intonation or delivery. You get what you get. It’s way better than robotic-sounding text to speech. It has a human sound, but it’s missing the human brain, experience, culture and heart. A human-sounding voice, but not a human talking.