Automatic Speech Recognition (ASR)
Image/Audio/Video
Conversion of spoken language into text.
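Automatic Speech Recognition (ASR) converts spoken audio into text using acoustic models, which relate the audio signal to speech sounds, and language models, which favor plausible word sequences. Modern systems use deep learning architectures such as Transformers and recurrent neural networks (RNNs).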
- Applications: Dictation, voice control, meeting transcription.
- Challenges: Dialects, background noise, multilingual speech.
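
To illustrate how acoustic and language models can work together, here is a minimal sketch in Python. It is a toy, not how any particular ASR system is implemented: the candidate transcripts, the probabilities, the `best_transcript` function, and the `lm_weight` parameter are all hypothetical and chosen only for illustration. It picks the candidate whose combined log score (acoustic plus weighted language score) is highest.

```python
import math

# Hypothetical acoustic-model probabilities: how well each candidate
# word sequence matches the audio. The numbers are made up.
acoustic_probs = {
    "recognize speech": 0.40,
    "wreck a nice beach": 0.45,
}

# Hypothetical language-model probabilities: how plausible each
# candidate is as ordinary text. The numbers are made up.
language_probs = {
    "recognize speech": 0.020,
    "wreck a nice beach": 0.001,
}


def best_transcript(acoustic, language, lm_weight=1.0):
    """Return the candidate with the highest combined log score."""

    def score(text):
        # Sum of log-probabilities; lm_weight scales the language model's influence.
        return math.log(acoustic[text]) + lm_weight * math.log(language[text])

    return max(acoustic, key=score)


print(best_transcript(acoustic_probs, language_probs))
print(best_transcript(acoustic_probs, language_probs, lm_weight=0.0))
```

With these made-up numbers, the acoustic scores alone slightly prefer "wreck a nice beach", while adding the language score selects "recognize speech". Setting `lm_weight=0.0` disables the language model and shows the difference.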