Everyday Speech: Examples of TTS out in the wild

3 text-to-speech examples I’ve randomly encountered online

elaineinthebay
Nov 11, 2022
A girl speaking into her phone. The phone screen shows an app that is recording her speech and displaying it as a waveform. On top of the background image, the main text reads “Everyday Speech”.
background photo courtesy of BandLab on Unsplash

Long before Bev Standing’s text-to-speech voice became iconic on TikTok and across the internet, we were hearing computers talk. Most people today have experienced synthetic speech and its eerie non-human-ness. But what exactly is synthetic speech, and why do we keep using it?

Voice branding expert Phoebe Ohayon defines speech as the signal produced by modulating voice into meaningful patterns. Although many people use “speech” interchangeably with “voice”, speech is not always produced by humans. In fact, that’s exactly what synthetic speech refers to: the artificial production of human speech, a.k.a. machine-created speech. As highly communicative creatures, humans are pretty good at telling whether speech is natural or artificial. Many synthetic speech systems stress the wrong words or pause at the “wrong” time, among other quirks that reveal their “unhuman” nature.

The wonkiness explained

Text-to-speech (TTS) is a process that creates “spoken” content from written text; you can think of it as “read aloud” technology. It produces live output built on pre-made input. A TTS application is usually born in a recording studio. Voice actors are hired so that the software can capture all the possible sounds (not words) in a language, which are later “stitched together” to form any combination of words, including words and sentences that were never recorded. A video from Acapela Group does a great job of showing how the word “impressive” can be created by stitching together parts of the words “impossible”, “president”, and “detective”.
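The stitching idea can be sketched in a few lines of Python. This is a toy illustration, not Acapela Group’s actual method: the segmentations of the three recorded words are hypothetical, the unit labels are rough, spelling-like stand-ins for phonetic units, and short lists of numbers stand in for recorded audio.

```python
from typing import Dict, List, Tuple

# Hypothetical studio recordings: each recorded word is pre-segmented into
# sound units. The labels are rough, spelling-like stand-ins for phonetic
# units, and the short float lists stand in for real audio samples.
RECORDINGS: Dict[str, List[Tuple[str, List[float]]]] = {
    "impossible": [("im", [0.1, 0.2]), ("pos", [0.3, 0.1]), ("si", [0.0]), ("ble", [0.2, 0.2])],
    "president": [("pres", [0.4, 0.3, 0.2]), ("i", [0.1]), ("dent", [0.2, 0.1])],
    "detective": [("de", [0.1]), ("tect", [0.3, 0.2]), ("ive", [0.2, 0.1, 0.0])],
}


def build_unit_inventory(
    recordings: Dict[str, List[Tuple[str, List[float]]]]
) -> Dict[str, List[float]]:
    """Index every recorded sound unit by its label, keeping the first copy found."""
    inventory: Dict[str, List[float]] = {}
    for segments in recordings.values():
        for label, samples in segments:
            inventory.setdefault(label, samples)
    return inventory


def synthesize(units: List[str], inventory: Dict[str, List[float]]) -> List[float]:
    """Stitch together the stored audio for each requested unit, in order."""
    missing = [unit for unit in units if unit not in inventory]
    if missing:
        raise KeyError(f"no recording contains these units: {missing}")
    audio: List[float] = []
    for unit in units:
        audio.extend(inventory[unit])  # plain concatenation, no smoothing at the joins
    return audio


inventory = build_unit_inventory(RECORDINGS)
# "impressive" was never recorded, but its pieces were:
# "im" (impossible) + "pres" (president) + "ive" (detective)
audio = synthesize(["im", "pres", "ive"], inventory)
print(len(audio))  # 8
```

Production systems typically store many recorded candidates for each sound and choose among them so the joins sound smooth; this sketch simply takes the first match and glues the pieces end to end, which is one plausible source of audible seams.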

However, not all TTS software is created equal, and some sounds less natural than others. The speech might sound flat (lacking intonation), or the software might ignore punctuation. So the question remains: if the technology sounds so bad, why do we keep relying on synthetic speech?
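To make the punctuation problem concrete, here is a hypothetical sketch of a TTS front end that plans a pause after each word before any sound is produced. The function names and pause lengths are made up for illustration: the naive version throws punctuation away, so every word gets the same short gap, while the punctuation-aware version lengthens the pause at commas and sentence ends.

```python
import re
from typing import List, Tuple

# Hypothetical pause lengths in milliseconds, chosen only for illustration.
DEFAULT_GAP_MS = 50
PAUSE_MS = {",": 250, ";": 350, ":": 350, ".": 600, "?": 600, "!": 600}


def naive_front_end(text: str) -> List[Tuple[str, int]]:
    """Drops punctuation, so every word gets the same short gap after it."""
    words = re.findall(r"[A-Za-z']+", text)
    return [(word, DEFAULT_GAP_MS) for word in words]


def punctuation_aware_front_end(text: str) -> List[Tuple[str, int]]:
    """Gives a word a longer pause when punctuation directly follows it."""
    plan: List[Tuple[str, int]] = []
    for word, mark in re.findall(r"([A-Za-z']+)([,;:.?!]?)", text):
        plan.append((word, PAUSE_MS.get(mark, DEFAULT_GAP_MS)))
    return plan


sentence = "Wait, what? No."
print(naive_front_end(sentence))
# [('Wait', 50), ('what', 50), ('No', 50)]
print(punctuation_aware_front_end(sentence))
# [('Wait', 250), ('what', 600), ('No', 600)]
```

Read aloud with the first plan, “Wait, what? No.” would come out as one evenly spaced run of words, which is exactly the kind of flatness that gives synthetic speech away.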

The authors of the 2005 book Wired for Speech summarized it best:

“Because of limitations of storage space (digital recordings are large), processing speed (finding and combining arbitrary utterances can be slow), bandwidth speed (sound files do not transmit gracefully over a 33 kilobyte phone line), dynamism of content (all of the Web’s content cannot be spoken and recorded in real time), and other technical constraints, much of the speech that is and will be produced by computers, the Web, telephone interfaces, and wireless devices will be ‘synthesized speech’[.]”

Simply put, it’s much easier and more viable to create speech artificially than to have interfaces present “fully recorded words and phrases”, as Clifford Nass and Scott Brave put it in their book. Recording everything would be expensive in terms of both money and computing power.
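To get a feel for the bandwidth point in the quote, here is some back-of-the-envelope arithmetic in Python. It reads the quote’s “33 kilobyte phone line” as a dial-up modem running at about 33.6 kilobits per second and assumes that 10 seconds of speech corresponds to roughly 150 characters of text; these are assumptions, not figures from the book.

```python
# Back-of-the-envelope numbers; every constant here is an assumption.
MODEM_BITS_PER_SEC = 33_600  # a 33.6 kbit/s dial-up modem
SPEECH_SECONDS = 10          # length of the spoken message
TEXT_CHARS = 150             # rough length of 10 seconds of speech as text
BITS_PER_CHAR = 8            # plain ASCII text


def pcm_bitrate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bits per second of uncompressed (PCM) audio."""
    return sample_rate_hz * bits_per_sample * channels


def seconds_to_send(bits: float, link_bps: int = MODEM_BITS_PER_SEC) -> float:
    """Time to push a payload of `bits` through the link."""
    return bits / link_bps


phone_quality = pcm_bitrate(8_000, 8, 1)  # 64,000 bit/s
cd_quality = pcm_bitrate(44_100, 16, 2)   # 1,411,200 bit/s

print(f"{seconds_to_send(SPEECH_SECONDS * phone_quality):.1f} s")  # 19.0 s
print(f"{seconds_to_send(SPEECH_SECONDS * cd_quality):.1f} s")     # 420.0 s
print(f"{seconds_to_send(TEXT_CHARS * BITS_PER_CHAR):.2f} s")      # 0.04 s
```

Under these assumptions, 10 seconds of even telephone-quality audio takes roughly twice as long to download as to play, while the same message sent as text arrives in a fraction of a second for a synthesizer to speak. Compression shrinks audio considerably, so treat this as an illustration of the quote’s point rather than a measurement.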

Examples of TTS and its modern usage

Personally, I’ve loved watching this kind of speech technology evolve, improve, and become more prevalent in everyday life. As someone particularly fond of voice technology, I’ve had a lot of fun following the modern online trend of creating short videos with synthetic speech. Below are a few of my favorite uses of TTS outside of Instagram Reels and TikToks.

TTS to open a music video

BLOSWOM, a music artist from France, released a music video for his song “Rosiana” in which a TTS voice sets the scene and reveals why the main character wakes up on a beach.

TTS for comedic effect in a video essay

In its video commentary on Andrew Dominik’s 2022 film “Blonde”, the Be Kind Rewind channel points out the many potential inaccuracies to look out for in this adaptation of Marilyn Monroe’s life. Along the way, it uses a TTS voice to parody the film’s talking fetus.

TTS to replace human commentary for product reviews

This was an interesting find: a tech product review channel that uses only a TTS voice for its review commentary. There are many reasons someone might choose not to record their own voice for a video (including speech impediments or insecurity about an accent), so it was nice to see a video helping to normalize this use.

Got any favorite examples of synthetic speech in your life? Let me know by commenting on this post!
