Speech-to-text technology has introduced the idea of a hands-free future in which the mouse and keyboard are obsolete. As voice activation grows, reliance on hardware diminishes altogether. Consider technologies such as the Google Home or Amazon Alexa. These devices have made everything from turning on a light to ordering takeout hands-free.
The benefits of these improvements, however, are not limited to the companies that create the technology. Businesses of all backgrounds can take advantage of its ease and accessibility as well. In fact, businesses may need to do so to keep up with consumer demand. According to Advantage Digital Technology, annual sales for voice technology reached over $2 billion in the United States and the United Kingdom in 2019. With more and more people using voice assistants to complete everyday tasks, businesses benefit from integrated e-commerce software that is compatible with speech-to-text technology.
What is speech-to-text technology?
Speech-to-text technology allows compatible devices to convert spoken input into text or commands. The technology has advanced tremendously since its roots in 1877, when Thomas Edison created the phonograph, the first device to record and reproduce sound. Since then, various voice-operated devices have been made, including the Audrey, the Shoebox, the Harpy, and the Tangora, each of which marked a new breakthrough in speech recognition. Today we recognize Amazon’s Alexa, Apple’s Siri, Microsoft’s Cortana, and Google Assistant as common household names. These assistants are shaping speech-to-text technology as we know it.
How speech recognition works
Speech recognition begins when a microphone captures sound waves and sends them to a computer. A speech-to-text program then digitizes those sound waves. In other words, the frequencies in the audio (its differences in pitch) are broken down into individual units of sound.
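To make the digitizing step concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not how any particular product works: a pure tone stands in for the microphone signal, the 8,000-samples-per-second rate is chosen arbitrarily, and the function names are hypothetical. It samples the tone as 16-bit integers, cuts the samples into short frames, and finds the strongest frequency in a frame with a naive Fourier transform.

```python
import math

SAMPLE_RATE = 8000  # samples per second; an assumed value for illustration


def sample_tone(freq_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Simulate a microphone capturing a pure tone as 16-bit integer samples."""
    n = round(duration_s * sample_rate)
    return [
        round(32767 * math.sin(2 * math.pi * freq_hz * i / sample_rate))
        for i in range(n)
    ]


def split_into_frames(samples, frame_size):
    """Cut the sample stream into fixed-size frames, dropping any partial tail."""
    last_start = len(samples) - frame_size
    return [samples[i:i + frame_size] for i in range(0, last_start + 1, frame_size)]


def dominant_frequency(frame, sample_rate=SAMPLE_RATE):
    """Return the strongest frequency in a frame using a naive Fourier transform."""
    n = len(frame)
    best_bin, best_power = 0, 0.0
    for k in range(1, n // 2):  # skip bin 0, which only measures the average level
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        im = sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        power = re * re + im * im
        if power > best_power:
            best_bin, best_power = k, power
    return best_bin * sample_rate / n


samples = sample_tone(440, 0.1)           # a tenth of a second of a 440 Hz tone
frames = split_into_frames(samples, 200)  # 25-millisecond frames
print(len(frames), dominant_frequency(frames[0]))
```

Real recognizers use faster transforms and richer features than a single dominant frequency, but the overall shape is the same: sound becomes numbers, and the numbers are analyzed one short frame at a time.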
From there, each unit of sound is matched to a phoneme, the smallest unit of sound that distinguishes one word from another. The English language has about 44 phonemes. For example, the sounds [s] and [z] are the only difference between the words “bus” and “buzz.” The best software can detect even the slightest difference in sound in a given language.
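The “bus” and “buzz” example can be sketched in a few lines of Python. The tiny pronunciation table below is an assumption made for illustration; it uses ARPAbet-style phoneme symbols with stress marks omitted, and the function names are hypothetical.

```python
# A hand-written pronunciation table, assumed for illustration only.
PRONUNCIATIONS = {
    "bus": ["B", "AH", "S"],
    "buzz": ["B", "AH", "Z"],
    "ate": ["EY", "T"],
    "eight": ["EY", "T"],
}


def phoneme_differences(word_a, word_b):
    """List (position, phoneme_a, phoneme_b) wherever the two words differ."""
    a, b = PRONUNCIATIONS[word_a], PRONUNCIATIONS[word_b]
    if len(a) != len(b):
        raise ValueError("this sketch only compares equal-length pronunciations")
    return [(i, pa, pb) for i, (pa, pb) in enumerate(zip(a, b)) if pa != pb]


def is_minimal_pair(word_a, word_b):
    """True when exactly one phoneme separates the two words."""
    return len(phoneme_differences(word_a, word_b)) == 1


def are_homophones(word_a, word_b):
    """True when the two words share the same phoneme sequence."""
    return PRONUNCIATIONS[word_a] == PRONUNCIATIONS[word_b]


print(phoneme_differences("bus", "buzz"))
print(is_minimal_pair("bus", "buzz"), are_homophones("ate", "eight"))
```

A single differing phoneme is enough to change the word, which is why a recognizer has to tell such close sounds apart reliably.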
Lastly, the program contextualizes each phoneme so that it makes sense in the sentence or phrase that was spoken. Consider two words that sound the same, such as “ate” and “eight.” By running through its library of known words and phrases, the program can determine which is appropriate given the context of the sentence.
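As a rough sketch of this last step, the Python below picks between same-sounding words by checking which one appears more often next to its neighbors. The word-pair counts are invented for illustration and the names are hypothetical; a real program would learn such statistics from large amounts of text and use far more context than one word on each side.

```python
# Invented counts of how often two words appear side by side (illustration only).
WORD_PAIR_COUNTS = {
    ("i", "ate"): 50,
    ("ate", "lunch"): 30,
    ("have", "eight"): 40,
    ("eight", "apples"): 25,
    ("i", "eight"): 1,
}


def context_score(prev_word, word, next_word):
    """Score a candidate by how often it is seen beside its two neighbors."""
    return (
        WORD_PAIR_COUNTS.get((prev_word, word), 0)
        + WORD_PAIR_COUNTS.get((word, next_word), 0)
    )


def pick_word(prev_word, candidates, next_word):
    """Choose the same-sounding candidate that best fits the surrounding words."""
    return max(candidates, key=lambda w: context_score(prev_word, w, next_word))


print(pick_word("i", ["ate", "eight"], "lunch"))
print(pick_word("have", ["ate", "eight"], "apples"))
```

The sound alone cannot settle the choice between “ate” and “eight”; only the neighboring words can, which is the role of the program's library of known words and phrases.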
Voice commerce
Voice commerce allows consumers to buy a product or service by using voice commands. It is a feature of many speech-to-text compatible devices, including Amazon’s Alexa, Apple’s Siri, and Microsoft’s Cortana. By making it possible to conduct business without relying on peripheral hardware, such as a mouse and keyboard, these devices have opened up an opportunity for businesses to increase their visibility, accessibility, and convenience.
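Once a spoken command has been transcribed, turning it into a purchase is largely a text-processing problem. The Python sketch below is a simplified assumption of how that might look, not the approach of any particular assistant: the command pattern, the `Order` record, and `parse_voice_order` are all hypothetical names, and only a few number words are understood.

```python
import re
from dataclasses import dataclass

# A few spoken quantities the sketch understands (assumed, not exhaustive).
NUMBER_WORDS = {"a": 1, "an": 1, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5}

# Matches transcripts such as "order two bottles of laundry detergent".
COMMAND = re.compile(r"^(?:please\s+)?(?:order|buy)\s+(?P<qty>\S+)\s+(?P<item>.+)$")


@dataclass(frozen=True)
class Order:
    item: str
    quantity: int
    channel: str  # where the order came from, e.g. "voice" or "web"


def parse_voice_order(transcript):
    """Turn a transcribed command into an Order, or None if it is not a purchase."""
    text = transcript.strip().lower().rstrip(".!?")
    match = COMMAND.match(text)
    if match is None:
        return None
    qty_token = match.group("qty")
    if qty_token.isdigit():
        quantity = int(qty_token)
    elif qty_token in NUMBER_WORDS:
        quantity = NUMBER_WORDS[qty_token]
    else:
        return None  # the sketch requires an explicit quantity
    return Order(item=match.group("item"), quantity=quantity, channel="voice")


print(parse_voice_order("Order two bottles of laundry detergent."))
print(parse_voice_order("What's the weather?"))
```

Because the result is an ordinary order record, the rest of a store's software could treat a voice purchase the same way as one made with a mouse and keyboard.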
Voice commerce advantages
For businesses employing order management systems, introducing voice commerce is simple. Essentially, products bought on any device, in any manner, can be processed and cataloged uniformly. Such software opens up a range of advantages for businesses looking to invest in voice commerce. Below are a few of those advantages:
- Convenience – Since humans speak considerably faster than they type, voice commerce is quick, easy, and convenient. Voice commands can be given when cooking, cleaning, or even driving.
- Customer retention – Data collected from previous voice commands can be used to personalize and improve upon a user’s experience.
- Advertising – Businesses that have already adopted speech-to-text technologies are using them in ways that boost their advertising. Tide laundry detergent, for example, has created a skill on Amazon’s Alexa that features a comprehensive list of tips for how to get stains out of clothing. Many of those tips include using a Tide product.
- Assistive technology – More than a matter of convenience, voice-to-text assistive technology is a helpful tool for people with physical, cognitive, sensory, and learning impairments. By replacing the traditional mouse and keyboard, voice-to-text reaches an audience commonly left behind by new technology. A person with a visual impairment could use the technology to buy products online that they previously couldn’t search for. A person with dyslexia could use the technology to dictate emails to businesses, which might be more accessible to them than typing.
Voice commerce challenges
The advantages of voice commerce described above can appear limited given the technology’s uneven availability and its potential to invade privacy. Below are two major challenges to voice commerce:
- Language barriers – What linguists refer to as “broadcast English” is the only dialect that speech-to-text compatible devices recognize consistently. Variances in dialect, accent, or phrasing can pose a significant challenge. Some accents go entirely unrecognized. Moreover, languages other than English have historically had limited support on speech-to-text compatible devices. Amazon’s Alexa, for instance, in its early years supported only two languages other than English: German and Japanese.
- Privacy – Research into Apple’s voice assistant technology was largely reduced after a whistleblower told The Guardian that they regularly overheard private conversations while working for the company. Contractors hired to grade the accuracy of the voice assistant overheard much more than intended, leading to concerns over privacy.