Next in line among the fundamentals of data is information in audio form: sound, music, voice, or speech.
Comprehending sound is much more complex than understanding text and images. In speech recognition, for instance, human languages are expressed in a wide variety of ways. People from different parts of the world speak the same language in their own individual styles and accents. In addition, thousands of different languages are spoken across the globe. Combine the range of accents and styles with each of these languages, and we end up with countless ways of communicating around the world. This is what makes recognizing speech more complex than recognizing text or images, which we covered in the earlier posts of this blog series.
For example, let’s consider the following excerpt from an audio file in the English language, spoken with an Indian accent and in the speaker’s own style. Let us go through the transcript below.
“If you have the you know Killer Instinct in you, if you have the passion, if you have the belief, you can learn lot of new things; and you have to keep on learning, changing yourself because the world is evolving with chat GPT, AI, Bitcoin, so many new technologies coming into the picture, where you hear that some sort of jobs will be lost. One thing cannot be lost is the human interaction, right? Definitely, the passion to do something, the killer instinct, which a human can only bring to the table be for any organization. So, if you’re building yourself, you know, if you’re always work in progress, you’re learning, improving, and growing; yeah I will say that Iconxt is something which can always… you know, icon next… you can always be the next icon that exactly reading light for this industry, and I wish you all,… all the best and I’ll make sure that whatever you do if I can help you in any sort of way, I’ll always be there to help you,… wish Iconxt, the best of luck.”
Now, let us listen to the podcast of the above transcript.
It is clear that audio recognition needs some orientation to, or training on, a particular language, style, and accent.
Listen to the following podcast. It is in a different Indian language, Tamil, in the speaker’s own accent and style, and its meaning would be understood only by people who understand the language and the accent. An English translation follows.
“I should thank you and the Iconxt Interactive Team. I feel privileged that you chose me and asked for my interview. Today, even if at least 10 people watch this video, it would be helpful to them. I follow and watch all of your inspirational stories / episodes. You give lots of insights and stories for startups from a business angle. Furthermore, I wish and bless that your YouTube channel, Iconxt Stories, gets more and more subscribers.”
I bet that without the above translation, most of the audience who do not know the language would not have understood this second podcast.
Likewise, a wide range of feelings, emotions, thoughts, ideas, and concepts, whatever you call them, arise whenever you hear or listen to different kinds of sounds: a breeze, dry leaves, water in the seas and oceans, rain, thunder, a baby’s cries, the chirping of birds, the roar of animals, human murmurs and speech, and more, apart from formal music from instruments and human vocals.
– Srilakshmi K, MSc
Co-Founder & CEO