Unleashing the Power of Voice: A Guide to JavaScript Speech Recognition and Synthesis APIs

Published by Umesh Chandra on December 6, 2023

In the ever-evolving landscape of web development, creating engaging and interactive user experiences is a constant challenge. One exciting avenue that developers are exploring is the integration of speech recognition and synthesis capabilities into websites using JavaScript. Imagine a website that not only responds to user clicks but also listens and speaks—opening up a new realm of possibilities for accessibility and user engagement.

Getting Started with Speech Recognition

What is Speech Recognition?

Speech recognition is the technology that enables a computer to identify and interpret spoken language. With the SpeechRecognition interface of the Web Speech API, developers can tap into this technology directly from the browser, although support differs from one browser to another.

The following snippet starts speech recognition and logs each recognized phrase to the console:

const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
const rec = new SpeechRecognition();
rec.lang = 'en-US';
rec.continuous = true;
rec.onresult = function (e) {
    // e.resultIndex is the first result that is new in this event
    for (let i = e.resultIndex; i < e.results.length; i++) {
        const script = e.results[i][0].transcript.toLowerCase().trim();
        console.log(script);
    }
};
rec.start();

Now, let’s break down each part of the code:

1. SpeechRecognition API Check:

const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;

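This line uses the standard SpeechRecognition constructor when the browser provides it and otherwise falls back to the prefixed webkitSpeechRecognition, which is how Chromium-based browsers such as Chrome and Edge, as well as Safari, expose the API. Whichever one exists is assigned to the SpeechRecognition variable. If neither does (Firefox, for example, does not enable the API by default), the variable is undefined and calling new on it throws a TypeError, so real projects should check for support first.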
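Because calling new on an undefined constructor fails, it can help to wrap the lookup in a small helper and fall back gracefully. The function below is a hypothetical sketch, not part of any API. It takes the global object as a parameter (window in a browser), which also lets it be exercised with plain objects.

```javascript
// Hypothetical helper (not part of the Web Speech API): returns the
// recognition constructor exposed by `globalObj`, preferring the standard
// name over the webkit-prefixed one, or null if neither exists.
function getSpeechRecognitionCtor(globalObj) {
  if (!globalObj) {
    return null;
  }
  return globalObj.SpeechRecognition || globalObj.webkitSpeechRecognition || null;
}

// Usage in a browser:
// const Ctor = getSpeechRecognitionCtor(window);
// if (Ctor) {
//   const rec = new Ctor();
// } else {
//   // Fall back to a regular text input
// }
```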

2. Creating an Instance of SpeechRecognition:

const rec = new SpeechRecognition();

This line creates a new instance of the SpeechRecognition object. This object represents the speech recognition service.

3. Setting Language for Speech Recognition:

rec.lang = 'en-US';

This line sets the language for speech recognition. In this case, it's set to English (United States). We can change it to any BCP 47 language tag the recognizer supports, such as 'fr-FR' or 'hi-IN'.

4. Enabling Continuous Listening:

rec.continuous = true;

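By setting continuous to true, the recognizer keeps listening after the first phrase and returns results for several speech segments instead of stopping after a single result. The browser can still end the session on its own, for example after a long silence or a network error, so listening is not truly indefinite. We can set it to false if we only need one phrase at a time.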
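Because the browser may end even a continuous session, one common pattern is to start it again from the end event until the user asks to stop. The sketch below uses a hypothetical helper name, keepListening, and a caller-supplied shouldKeepListening callback. Only start() and onend are real SpeechRecognition members.

```javascript
// Hypothetical helper: whenever the browser ends the recognition session,
// start it again as long as shouldKeepListening() returns true.
function keepListening(rec, shouldKeepListening) {
  rec.onend = () => {
    // onend fires after the session has fully stopped, so start() is allowed here
    if (shouldKeepListening()) {
      rec.start();
    }
  };
}

// Usage in a browser:
// let listening = true;
// keepListening(rec, () => listening);
// rec.start();
// ...later, when the user clicks "stop":
// listening = false;
// rec.stop();
```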

5. Speech Recognition Callback:

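rec.onresult = function (e) {
    // e.resultIndex is the first result that is new in this event
    for (let i = e.resultIndex; i < e.results.length; i++) {
        const script = e.results[i][0].transcript.toLowerCase().trim();
        console.log(script);
    }
};

This block sets up a callback function (onresult) that is triggered whenever the recognizer produces a result. The event's results property lists every SpeechRecognitionResult of the session so far. resultIndex is the position of the first result that changed in this event, so the loop starts there and skips phrases that were already handled. By default each result holds a single alternative (maxAlternatives is 1), so [0] selects it. Its transcript is converted to lowercase, trimmed of leading and trailing whitespace, and logged to the console.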
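To see how results and resultIndex fit together, the same logic can be pulled into a pure function. The newTranscripts function below is a hypothetical sketch. It only reads resultIndex, results, isFinal and transcript, which are real members of the recognition event objects, so plain arrays can stand in for them in tests. isFinal starts to matter once rec.interimResults is set to true, because partial guesses then arrive with isFinal set to false.

```javascript
// Hypothetical helper: returns the first alternative's transcript for every
// result that is new in this event, plus whether it is final.
function newTranscripts(event) {
  const out = [];
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i];
    out.push({
      text: result[0].transcript.toLowerCase().trim(),
      isFinal: result.isFinal,
    });
  }
  return out;
}

// Usage in a browser, logging only final phrases:
// rec.interimResults = true;
// rec.onresult = (e) => {
//   for (const r of newTranscripts(e)) {
//     if (r.isFinal) console.log(r.text);
//   }
// };
```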

6. Starting Speech Recognition:

rec.start();

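This starts the speech recognition process. If the page does not already have microphone permission, the browser asks the user for it at this point. While listening, the onresult callback is invoked each time the recognizer produces a result. Calling start() again while a session is already running throws an InvalidStateError.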
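Recognition can also fail, for instance when the user denies microphone access. Errors arrive through the onerror handler as a SpeechRecognitionErrorEvent. Its error property holds a code such as 'not-allowed' or 'no-speech'. The mapping below is a hypothetical sketch. The messages, and the showMessage function in the usage comment, are placeholders.

```javascript
// Hypothetical helper: turns a SpeechRecognitionErrorEvent `error` code
// into a short message that can be shown to the user.
function describeRecognitionError(code) {
  switch (code) {
    case 'not-allowed':
      return 'Microphone access was denied.';
    case 'no-speech':
      return 'No speech was detected. Please try again.';
    case 'audio-capture':
      return 'No microphone could be found.';
    case 'network':
      return 'A network problem interrupted speech recognition.';
    default:
      return `Speech recognition failed (${code}).`;
  }
}

// Usage in a browser (showMessage is a placeholder for your own UI code):
// rec.onerror = (e) => showMessage(describeRecognitionError(e.error));
```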

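7. Stopping Speech Recognition:

To stop listening, we can call stop(). It ends the session but still returns a result for any audio captured so far, whereas abort() ends the session and discards that result:

rec.stop();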
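Because start(), stop() and the result arrive at different times, a single question-and-answer exchange can be easier to handle as a Promise. The listenOnce function below is a hypothetical sketch built only on the real start(), stop(), onresult, onerror and onend members. It overwrites those handlers, so it suits a recognizer dedicated to one-shot use.

```javascript
// Hypothetical helper: starts `rec`, resolves with the first transcript,
// resolves with null if the session ends without a result, and rejects
// with the error code if recognition fails.
function listenOnce(rec) {
  return new Promise((resolve, reject) => {
    rec.onresult = (e) => {
      resolve(e.results[e.resultIndex][0].transcript);
      rec.stop(); // one phrase is enough
    };
    rec.onerror = (e) => reject(new Error(e.error));
    // A settled Promise ignores later calls, so this only takes effect
    // when no result or error arrived first.
    rec.onend = () => resolve(null);
    rec.start();
  });
}

// Usage in a browser (inside an async function):
// rec.continuous = false;
// const answer = await listenOnce(rec);
```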

Giving Your Website a Voice with Speech Synthesis

What is Speech Synthesis?

Speech synthesis, on the other hand, is the technology that converts text into spoken words. With the speechSynthesis interface of the Web Speech API, which has broader browser support than speech recognition, you can make your website speak to users, providing a more immersive and accessible experience.

We can make the page speak with the following code snippet:

const textToSpeak = new SpeechSynthesisUtterance("Hello, welcome to our website!");
speechSynthesis.speak(textToSpeak);

This code creates a SpeechSynthesisUtterance with the desired text and passes it to speechSynthesis.speak(). Utterances are queued, so several calls to speak() are read out one after another, and speechSynthesis.cancel() empties the queue. We can build on this by generating speech dynamically from user interactions or content on the website.
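SpeechSynthesisUtterance also exposes rate, pitch, volume, lang and voice properties, and speechSynthesis.getVoices() lists the available voices. The two helpers below are hypothetical sketches. configureUtterance clamps values to the ranges in the Web Speech API specification (rate 0.1 to 10, pitch 0 to 2, volume 0 to 1). pickVoice prefers an exact language match, then falls back to a voice with the same base language.

```javascript
// Hypothetical helper: clamps common SpeechSynthesisUtterance settings to the
// ranges in the Web Speech API spec and assigns them to `utterance`.
function configureUtterance(utterance, { rate = 1, pitch = 1, volume = 1, lang } = {}) {
  const clamp = (value, min, max) => Math.min(max, Math.max(min, value));
  utterance.rate = clamp(rate, 0.1, 10);
  utterance.pitch = clamp(pitch, 0, 2);
  utterance.volume = clamp(volume, 0, 1);
  if (lang) {
    utterance.lang = lang;
  }
  return utterance;
}

// Hypothetical helper: picks a voice whose lang matches exactly (e.g. "en-GB"),
// then one sharing the base language (e.g. "en"), otherwise null.
function pickVoice(voices, lang) {
  const base = lang.split('-')[0];
  return (
    voices.find((v) => v.lang === lang) ||
    voices.find((v) => v.lang.split('-')[0] === base) ||
    null
  );
}

// Usage in a browser:
// const u = configureUtterance(new SpeechSynthesisUtterance('Hello!'), { rate: 1.2, lang: 'en-GB' });
// const voice = pickVoice(speechSynthesis.getVoices(), 'en-GB');
// if (voice) u.voice = voice;
// speechSynthesis.speak(u);
```

In some browsers, getVoices() returns an empty list until the voiceschanged event has fired, so voice selection is often done inside that event's handler.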

Building Interactive Web Experiences

Now that you have a grasp of both Speech Recognition and Speech Synthesis APIs, let’s explore how you can combine them to create truly interactive web experiences.

Imagine a chat application where users can speak to input messages, and the website responds by reading out the replies. This not only adds a cool factor to your site but also enhances accessibility for users with visual impairments.

// Speech recognition setup (assuming rec is the recognizer created earlier)
rec.onresult = (event) => {
  // Read the result that triggered this event rather than always the first one
  const transcript = event.results[event.resultIndex][0].transcript;
  sendMessage(transcript);
};

// Speech synthesis: build a fresh utterance for each reply
function sendMessage(message) {
  // Process the message and perform actions as needed
  const reply = new SpeechSynthesisUtterance(`You said: ${message}`);
  speechSynthesis.speak(reply);
}

By integrating both APIs, we can create a seamless flow where users speak their messages, the website recognizes and processes them, and then responds with spoken feedback.
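One practical wrinkle when combining the two is that, if the reply plays through speakers while recognition is still running, the microphone can pick it up and transcribe the site's own voice. The sketch below shows one hypothetical way to avoid that. It stops recognition, waits for both the recognizer's end event and the end of the spoken reply, then starts listening again. The names speak, waitForEnd and replyWithoutEcho are illustrative. speechSynthesis and the utterance constructor are passed in as parameters so the logic can be tried with stand-ins.

```javascript
// Hypothetical helper: speaks `text` and resolves when playback finishes.
// `synth` is speechSynthesis in a browser and `makeUtterance` builds an
// utterance, e.g. (t) => new SpeechSynthesisUtterance(t).
function speak(synth, makeUtterance, text) {
  return new Promise((resolve, reject) => {
    const utterance = makeUtterance(text);
    utterance.onend = () => resolve();
    utterance.onerror = (e) => reject(new Error(e.error));
    synth.speak(utterance);
  });
}

// Resolves once the recognizer fires its 'end' event.
function waitForEnd(rec) {
  return new Promise((resolve) => {
    rec.addEventListener('end', () => resolve(), { once: true });
  });
}

// Hypothetical helper: pauses recognition while the reply is spoken so the
// microphone does not transcribe the page's own voice, then listens again.
async function replyWithoutEcho(rec, synth, makeUtterance, text) {
  const ended = waitForEnd(rec); // register before stop() so the event is not missed
  rec.stop();
  await Promise.all([ended, speak(synth, makeUtterance, text)]);
  rec.start();
}

// Usage in a browser, e.g. inside sendMessage:
// replyWithoutEcho(rec, speechSynthesis, (t) => new SpeechSynthesisUtterance(t), `You said: ${message}`);
```

This assumes recognition is running when replyWithoutEcho is called. Otherwise no end event arrives and the promise never settles. If the page also restarts recognition from onend, that restart should be paused while the reply plays.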

JavaScript’s Speech Recognition and Speech Synthesis APIs open up exciting possibilities for creating more interactive and accessible web applications. Whether you’re building a voice-controlled interface or enhancing the user experience with spoken feedback, these APIs provide a powerful toolset to experiment with.

Start experimenting with speech recognition and synthesis in your projects, and let your website find its voice in the vast landscape of the internet!