Audio to Text using Python
2 min read · Dec 27, 2024
import speech_recognition as sr
from pydub import AudioSegment

recognizer = sr.Recognizer()
mp3_file = "audiobook.mp3"  # source audio in MP3 format
wav_file = "audiobook.wav"  # WAV copy that sr.AudioFile can read

# Convert the MP3 to WAV without overwriting the source file
AudioSegment.from_mp3(mp3_file).export(wav_file, format="wav")

# Load the entire WAV file into memory as audio data
with sr.AudioFile(wav_file) as source:
    audio_data = recognizer.record(source)

try:
    text = recognizer.recognize_google(audio_data)
    print("Extracted Text:", text)
except sr.UnknownValueError:
    print("Could not understand the audio.")
except sr.RequestError as e:
    print("API Error:", e)
This code demonstrates how to convert an audio file (specifically an MP3) to WAV format using the pydub library, then transcribe its speech content into text using the speech_recognition library with the Google Speech Recognition API. Here's a step-by-step explanation:
1. Importing Libraries
import speech_recognition as sr
from pydub import AudioSegment
- speech_recognition: Used for speech-to-text conversion.
- pydub: A library for processing audio files, including format conversions.
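As a rough illustration of the format conversion pydub handles here, the sketch below writes the WAV output to a separate path so the source MP3 is not overwritten. The helper names wav_path_for and convert_mp3_to_wav are hypothetical and not part of either library, and the sketch assumes ffmpeg (or libav) is installed so that pydub can decode MP3.

```python
from pathlib import Path


def wav_path_for(audio_path):
    """Return the same path with a .wav extension (hypothetical helper)."""
    return Path(audio_path).with_suffix(".wav")


def convert_mp3_to_wav(mp3_path):
    """Convert an MP3 file to WAV next to the original and return the new path."""
    # Imported here so wav_path_for() still works without pydub installed.
    from pydub import AudioSegment

    wav_path = wav_path_for(mp3_path)
    # Assumption: ffmpeg or libav is available for pydub to decode the MP3.
    AudioSegment.from_mp3(str(mp3_path)).export(str(wav_path), format="wav")
    return wav_path


print(wav_path_for("audiobook.mp3"))  # audiobook.wav
```

Calling convert_mp3_to_wav("audiobook.mp3") would then produce audiobook.wav alongside the original, and the returned path can be passed straight to sr.AudioFile.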
2. Creating a Recognizer
recognizer = sr.Recognizer()
- Creates a Recognizer object, which is used for processing audio and converting it to text.