
Python Essentials for Voice Control: Part 1

  • Priyanshu Gupta
  • Feb 23, 2021

In our previous Python blogs, such as First Step towards Python, we covered the basics from scratch. In this blog, we move up a level and focus on projects and real-life applications.


Let’s dive into applied Python!


What is Voice Control?


Voice control means triggering actions with meaningful sounds. We control a machine with a sound it can understand, because a reference for that sound is already stored in the machine's storage.


For example, in your childhood you probably played with toys that react to clapping or other special sounds. Those toys use simple sound-control techniques built with electronic devices.


Humans also act on voice commands. When your mom asks you to buy vegetables from the market, you hear the command and then act on it. If you know where the market is and can recognize vegetables, you can complete the task as ordered.


In that sense, voice control is very close to how living beings already behave.


You can also take a look at our blog on Python to represent output.



Hands-On with Voice Control Libraries


Speech recognition means understanding spoken language, and language is the medium between two entities. Python's speech_recognition library wraps several speech recognition APIs for this purpose. Most of these APIs need an internet connection; one of them also works offline.


pip install SpeechRecognition


Install the library using CMD. It is published on PyPI as SpeechRecognition and imported in code as speech_recognition.



Challenge 1: How can machines understand our voice commands?


Recognizer( ) :


It is a class of the speech_recognition library. Its methods help us recognize speech from audio in the supported formats.


  •  APIs :


The Recognizer( ) class contains one method for each speech recognition API.

APIs available for speech recognition :


recognize_bing( ) , recognize_google( ) , recognize_google_cloud( ) , recognize_houndify( ) , recognize_ibm( ) , recognize_sphinx( ) , recognize_wit( ) 


  • recognize_sphinx( ) : 

It calls the CMU Sphinx engine to recognize speech in a particular language. 


recognize_sphinx(audio_data , language="en-US" , keyword_entries=None , grammar=None , show_all=False)


  1. audio_data must be an AudioData instance. You get one from record( ) on an audio file or from listen( ) on a microphone. An audio file is like a prepared script, while a microphone is a real-time user input source.

  2. The language of the audio is passed with the language parameter. Sphinx ships with en-US only; other languages need additional language data to be installed.


Example:  en-US (United States English)

       en-IN (Indian English)

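import speech_recognition as sr

hear = sr.Recognizer()
# recognize_sphinx( ) needs AudioData, here read from the file 'dhwani.wav'
with sr.AudioFile('dhwani.wav') as source:
    audio = hear.record(source)
text = hear.recognize_sphinx(audio)  # needs the pocketsphinx package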

Note: Of the seven APIs listed above, only Sphinx can run offline. All the others need internet connectivity.


Recommended Blog: Data Types in Python


  • record( ) : 


It is a method of the Recognizer class. It is used to capture data from an audio file.

record(source , duration = None , offset = None)


Here, record( ) reads data from source, starting at offset (in seconds) and continuing for duration (in seconds), and returns it as an AudioData instance. 


  1. Here, the source is an opened audio source, such as an AudioFile, that contains commands in audio format. 
  2. If duration is not specified, it records until the input audio data is finished.
  3. If offset is not specified, it records from zero seconds; otherwise it starts recording at the specified offset time.


dhwani = sr.AudioFile('dhwani.wav')
with dhwani as source:
    sound = hear.record(source)
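
To see what offset and duration select, here is a minimal sketch that uses only Python's standard wave module instead of speech_recognition. The helper names write_tone and read_segment are made up for this illustration; the sketch imitates the slicing described above and is not the library's own implementation.

```python
import math
import os
import struct
import tempfile
import wave


def write_tone(path, seconds=3, rate=16000, freq=440.0):
    """Write a mono 16-bit PCM WAV file containing a sine tone."""
    count = int(seconds * rate)
    samples = [int(12000 * math.sin(2 * math.pi * freq * n / rate))
               for n in range(count)]
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)       # 2 bytes per sample = 16-bit audio
        wav.setframerate(rate)
        wav.writeframes(struct.pack(f"<{count}h", *samples))


def read_segment(path, offset=None, duration=None):
    """Return (raw_bytes, sample_rate) for the selected part of the file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        total = wav.getnframes()
        start = min(int((offset or 0) * rate), total)   # no offset: start at zero
        # no duration: read until the audio data is finished
        count = total - start if duration is None else int(duration * rate)
        wav.setpos(start)
        return wav.readframes(count), rate


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "tone.wav")
    write_tone(path, seconds=3)
    segment, rate = read_segment(path, offset=1, duration=1)
    whole, _ = read_segment(path)

print(len(segment) // 2 / rate, "seconds selected")
```

With speech_recognition itself, the matching call would be hear.record(source, offset=1, duration=1) inside the with block.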


  • AudioFile( ) :

It is a class of the speech_recognition library which is used to select the source file.

AudioFile(filename_or_fileobject)


  1. If filename_or_fileobject is a string, it is the path of the source file from which the command will be read.


Note:  The speech_recognition library supports only 4 audio file formats :  


WAV (PCM/LPCM format) , AIFF , AIFF-C , FLAC (Native)


  • adjust_for_ambient_noise( ) : 

It is a method of the Recognizer class. It adjusts the energy threshold dynamically, using audio from the source, to account for ambient noise. The threshold then works as a filter between background noise and speech.


adjust_for_ambient_noise(source , duration =1)


Here, the source parameter is the audio source to calibrate against, and the duration parameter sets the time frame (in seconds) that is sampled for ambient noise. Its default value is 1.

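noise = sr.AudioFile('noise.wav')   
with noise as source:
    hear.adjust_for_ambient_noise(source)  # calibrate on the first second
    sound = hear.record(source)            # then record the rest of the file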
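
The idea of an energy threshold can be sketched in plain Python. This is a simplified model, not the library's internal code: the function names and the start, ratio and damping values are assumptions chosen for illustration. The Recognizer class does expose the resulting value as its energy_threshold attribute.

```python
import math


def rms(samples):
    """Root-mean-square level of a sequence of integer PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def calibrate_threshold(noise_chunks, start=300.0, ratio=1.5, damping=0.5):
    """Move the threshold toward ratio * noise level, one chunk at a time."""
    threshold = start
    for chunk in noise_chunks:
        target = rms(chunk) * ratio
        # Blend the old threshold with the new target so it changes smoothly.
        threshold = threshold * damping + target * (1 - damping)
    return threshold


def is_speech(chunk, threshold):
    """A chunk counts as speech when it is louder than the threshold."""
    return rms(chunk) > threshold


# Twenty chunks of quiet background noise with a level of 100.
quiet = [[100, -100] * 50 for _ in range(20)]
threshold = calibrate_threshold(quiet)

print(is_speech([1000, -1000] * 50, threshold))  # loud chunk
print(is_speech([100, -100] * 50, threshold))    # background noise
```

After calibration the threshold sits just above the noise level, so background noise is filtered out while louder speech passes through.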

Challenge 2: Who helps us to listen to voice commands?


The PyAudio library is used for interfacing with audio drivers. We need to install it because speech_recognition depends on it for microphone input. 

pip install pyaudio

Install the pyaudio library using CMD.


Note:  If you get an error and you are using Anaconda, install the pyaudio library from the conda prompt instead. It should solve the problem.


conda install pyaudio


Microphone( ) : 


It is a class of the speech_recognition library. It acts as a medium between the user and the machine. Now, instead of audio files as the source (pre-recorded input), we will use the machine's microphone as the source (live input) to pass commands.

Microphone(device_index=None , sample_rate=None , chunk_size=1024)


1. Here, if you do not specify a device_index, the default microphone is used; otherwise, pass the index of the device name in the list_microphone_names( ) output.

2. A higher sample_rate gives better audio quality, but it consumes more bandwidth and slows recognition down. If not specified, the value is taken from the default microphone settings.

3. Higher chunk_size values help avoid triggering on rapidly changing ambient noise, but they also make detection less sensitive. The default value of chunk_size is 1024.


Create an instance of the Microphone( ) class.

import speech_recognition as sr
mic = sr.Microphone( ) 


  • list_microphone_names( ) :  

It is a static method of the Microphone class that returns the names of the available audio devices.

Call sr.Microphone.list_microphone_names( ) to check whether a microphone is available. On one machine, the output looks like this:


['Microsoft Sound Mapper - Input', 'Microphone (Realtek(R) Audio)', 'Microsoft Sound Mapper - Output', 'Speakers (Realtek(R) Audio)', 'Speakers (Nahimic mirroring Wave Speaker)', 'Microphone (Realtek HD Audio Mic input)', 'Speakers (Realtek HD Audio output)', 'Headphones (Realtek HD Audio 2nd output)', 'Stereo Mix (Realtek HD Audio Stereo input)', 'Headset (@System32\\drivers\\bthhfenum.sys,#2;%1 Hands-Free AG Audio%0\r\n;(ZQHP-4155))', 'Headset (@System32\\drivers\\bthhfenum.sys,#2;%1 Hands-Free AG Audio%0\r\n;(ZQHP-4155))', 'Headphones ()']


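The output is a Python list, so we can use list indexing to pick a particular device. Indexing starts at 0 and the chosen entry should be an input device: in the list above, index 1 is 'Microphone (Realtek(R) Audio)', while index 3 is a speaker.

mic = sr.Microphone(device_index=1)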
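
Counting positions by hand is error-prone, so a small helper can look the index up by name. This is only a sketch: find_microphone_index is a hypothetical helper, and the names list is a shortened copy of the sample output above.

```python
def find_microphone_index(names, keyword="microphone"):
    """Return the index of the first device whose name contains keyword, or None."""
    keyword = keyword.lower()
    for index, name in enumerate(names):
        if keyword in name.lower():
            return index
    return None


names = [
    'Microsoft Sound Mapper - Input',
    'Microphone (Realtek(R) Audio)',
    'Microsoft Sound Mapper - Output',
    'Speakers (Realtek(R) Audio)',
]
index = find_microphone_index(names)
print(index, names[index])
```

In a real program, names would come from sr.Microphone.list_microphone_names( ) and the result would be passed as sr.Microphone(device_index=index). A result of None means no match was found, in which case the default microphone is a reasonable fallback.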


  • listen( ) :  

It is a method of the Recognizer class. It is used to capture voice input from users.

listen(source , timeout = None , phrase_time_limit = None , snowboy_configuration = None)


  1. Here, the source may be an audio file or a microphone that supplies the input commands.

  2. timeout (in seconds) is the maximum time the program waits for a phrase to start. If nothing is heard in that time, listen( ) raises a speech_recognition.WaitTimeoutError exception. The default value is None, which means it waits indefinitely.

  3. phrase_time_limit (in seconds) is the maximum time a phrase is allowed to continue. When the limit is reached, listening stops and the part recorded so far is returned. The default value is None, which means there is no limit.

  4. snowboy_configuration allows integration with the Snowboy engine, an offline, high-accuracy, power-efficient hotword recognition engine. When it is used, listen( ) pauses until Snowboy detects the hotword and then resumes listening.


mic = sr.Microphone()
with mic as source:
    sound = hear.listen(source)
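
The difference between timeout and phrase_time_limit is easier to see in a toy model. The sketch below does not use speech_recognition at all: capture_phrase and WaitTimeout are made-up names, each number in energies stands for the loudness of one chunk of audio, and a single quiet chunk ends the phrase, which is a deliberate simplification.

```python
class WaitTimeout(Exception):
    """Stand-in for speech_recognition.WaitTimeoutError in this toy model."""


def capture_phrase(energies, threshold, chunk_seconds=1.0,
                   timeout=None, phrase_time_limit=None):
    """Return the loud chunks that form the first phrase in energies."""
    # Phase 1: wait for a phrase to start; this is where timeout applies.
    start = None
    for i, energy in enumerate(energies):
        if energy > threshold:
            start = i
            break
        waited = (i + 1) * chunk_seconds
        if timeout is not None and waited >= timeout:
            raise WaitTimeout("no phrase started within the timeout")
    if start is None:
        return []  # the source ended before anyone spoke

    # Phase 2: record the phrase; this is where phrase_time_limit applies.
    phrase = []
    for energy in energies[start:]:
        if energy <= threshold:
            break  # simplification: one quiet chunk ends the phrase
        if (phrase_time_limit is not None
                and len(phrase) * chunk_seconds >= phrase_time_limit):
            break  # limit reached: return what was recorded so far
        phrase.append(energy)
    return phrase


energies = [10, 10, 500, 600, 700, 10]   # two quiet chunks, then a phrase
full = capture_phrase(energies, threshold=300)
cut = capture_phrase(energies, threshold=300, phrase_time_limit=2)
print(full, cut)
```

With timeout=1 the same call raises WaitTimeout, because the first second contains no speech. With phrase_time_limit=2 the phrase is cut after two seconds instead of raising an error.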


  • adjust_for_ambient_noise( ): 


Now apply it with a microphone, as we already did with audio files. 

with mic as source:
    hear.adjust_for_ambient_noise(source)  # sample 1 second of background noise
    audio = hear.listen(source)


Now we understand the flow of voice commands well. Try to use these concepts in your projects.
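
Putting the pieces together, a voice-controlled program could look roughly like the sketch below. It assumes that SpeechRecognition, PyAudio and PocketSphinx are installed and that a working microphone is attached. The names listen_once and pick_action, and the action names in the dictionary, are hypothetical and exist only for this example.

```python
def listen_once(timeout=5, phrase_time_limit=5):
    """Capture one phrase from the default microphone and return it as text.

    The third-party import sits inside the function so that the rest of
    this sketch can run on a machine without audio hardware.
    """
    import speech_recognition as sr

    hear = sr.Recognizer()
    with sr.Microphone() as source:
        hear.adjust_for_ambient_noise(source, duration=1)
        try:
            sound = hear.listen(source, timeout=timeout,
                                phrase_time_limit=phrase_time_limit)
        except sr.WaitTimeoutError:
            return None  # nobody spoke before the timeout
    try:
        return hear.recognize_sphinx(sound)  # offline recognition
    except sr.UnknownValueError:
        return None  # the speech could not be understood


def pick_action(text, actions):
    """Return the action whose keyword appears as a word in the text."""
    if not text:
        return None
    words = text.lower().split()
    for keyword, action in actions.items():
        if keyword in words:
            return action
    return None


actions = {"lights": "toggle_lights", "music": "play_music"}
action = pick_action("turn on the LIGHTS", actions)
print(action)
```

On a machine with a microphone, pick_action(listen_once( ), actions) would turn a spoken command into an action name. Keeping the matching step separate from the listening step makes it easy to try out without any audio hardware.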




Python programs power wonderful applications in almost every domain. Speech recognition is an attractive way to build programs that interact with the user. It is like active communication with machines.


Speech recognition can be a more accessible way for visually impaired people to operate Internet of Things (IoT) devices and AI-powered objects. They can understand what is happening around them by listening, and interact with their surroundings by speaking.


Developers should always try to solve community problems, because doing so makes people's lives easier and more comfortable. Speech recognition is widely used in the Internet of Vehicles (IoV).

Tesla's automated cars are a real-world example of voice control. JARVIS, the assistant that Iron Man operates with his voice, inspires the idea of communicating with and operating machines by voice. Your dreams can come true if you work on this kind of Python project.
