Which approach will maximize transcription accuracy during the development phase?

A company wants to use automatic speech recognition (ASR) to transcribe messages that are less than 60 seconds long from a voicemail-style application. The company requires the correct identification of 200 unique product names, some of which have unique spellings or pronunciations. The company has 4,000 words of Amazon SageMaker Ground Truth voicemail transcripts it can use to customize the chosen ASR model. The company needs to ensure that everyone can update their customizations multiple times each hour.

Which approach will maximize transcription accuracy during the development phase?

Use a voice-driven Amazon Lex bot to perform the ASR customization. Create custom slots within the bot that specifically identify each of the required product names. Use the Amazon Lex synonym mechanism to provide additional variations of each product name as mis-transcriptions are identified in development.

Use Amazon Transcribe to perform the ASR customization. Analyze the word confidence scores in the transcript, and automatically create or update a custom vocabulary file with any word that has a confidence score below an acceptable threshold value. Use this updated custom vocabulary file in all future transcription tasks.

Create a custom vocabulary file containing each product name with phonetic pronunciations, and use it with Amazon Transcribe to perform the ASR customization. Analyze the transcripts and manually update the custom vocabulary file to include updated or additional entries for those names that are not being correctly identified.

Use the audio transcripts to create a training dataset and build an Amazon Transcribe custom language model. Analyze the transcripts and update the training dataset with a manually corrected version of transcripts where product names are not being transcribed correctly. Create an updated custom language model.

Explanations:

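Amazon Lex is a conversational AI service built around intents and slots, not around free-form transcription of voicemail messages. Custom slots and synonyms help a bot resolve values within a dialog, but they do not customize an ASR model so that product names appear correctly in a transcript.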

Amazon Transcribe is a suitable service, but automatically adding every low-confidence word to the custom vocabulary is counterproductive. A low confidence score usually marks a word that was transcribed incorrectly, so the file would fill with mis-transcriptions rather than the correct product names, and it would give no guidance on unique spellings or pronunciations.
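To illustrate what analyzing word confidence scores can look like, here is a minimal sketch that lists low-confidence words for a person to review instead of writing them straight into a vocabulary file. It assumes the transcript follows the Amazon Transcribe JSON layout (`results.items[].alternatives[].confidence`); the function name, the 0.8 threshold, and the sample data are illustrative assumptions, not part of the question.

```python
import json


def low_confidence_words(transcript_json, threshold=0.8):
    """Return (word, confidence) pairs whose confidence is below threshold.

    Assumes a Transcribe-style document: results -> items -> alternatives.
    """
    doc = json.loads(transcript_json)
    flagged = []
    for item in doc["results"]["items"]:
        if item.get("type") != "pronunciation":
            continue  # skip punctuation items
        best = item["alternatives"][0]
        confidence = float(best["confidence"])  # confidence arrives as a string
        if confidence < threshold:
            flagged.append((best["content"], confidence))
    return flagged


# Made-up sample: a hypothetical product name was heard as "zip" + "lex".
SAMPLE = json.dumps({
    "results": {
        "items": [
            {"type": "pronunciation",
             "alternatives": [{"confidence": "0.99", "content": "order"}]},
            {"type": "pronunciation",
             "alternatives": [{"confidence": "0.41", "content": "zip"}]},
            {"type": "pronunciation",
             "alternatives": [{"confidence": "0.38", "content": "lex"}]},
            {"type": "punctuation",
             "alternatives": [{"confidence": "0.0", "content": "."}]},
        ]
    }
})

for word, confidence in low_confidence_words(SAMPLE):
    print(f"review: {word!r} (confidence {confidence:.2f})")
```

The flagged words here are the wrong tokens, not the product name itself, which is why feeding them back into the vocabulary automatically would not help.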

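Amazon Transcribe supports custom vocabularies to improve recognition of domain-specific terms. A vocabulary file that lists each product name with its phonetic pronunciation targets exactly the 200 names that matter, and because the file can be edited and reapplied quickly, manual corrections can be made multiple times each hour during development.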
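As a rough sketch of what such a vocabulary file might look like, the snippet below builds a tab-separated table from a list of product names. The column layout (`Phrase`, `SoundsLike`, `IPA`, `DisplayAs`) and the use of hyphens in place of spaces are assumptions based on the Transcribe table format; support for the phonetic columns has changed over time, so check the current documentation before relying on them. The product names and the helper name are hypothetical.

```python
def build_vocabulary_table(entries):
    """Build a tab-separated custom vocabulary table as a string.

    Each entry is a dict with "display" (the spelling wanted in transcripts)
    and an optional "sounds_like" hint. Column layout is an assumption.
    """
    rows = ["Phrase\tSoundsLike\tIPA\tDisplayAs"]
    for entry in entries:
        display = entry["display"]
        phrase = display.replace(" ", "-")  # multi-word phrases use hyphens
        sounds_like = entry.get("sounds_like", "")
        rows.append("\t".join([phrase, sounds_like, "", display]))  # IPA left empty
    return "\n".join(rows) + "\n"


# Hypothetical product names for illustration only.
PRODUCTS = [
    {"display": "Zyntra Pro", "sounds_like": "zin-truh-pro"},
    {"display": "Qwikk"},
]

print(build_vocabulary_table(PRODUCTS))
```

During development, a corrected file would typically be uploaded to Amazon S3 and applied with the Transcribe `UpdateVocabulary` operation (for example, boto3's `update_vocabulary` with `VocabularyFileUri`), then referenced by name in later transcription jobs.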

Building an Amazon Transcribe custom language model requires far more training text than the 4,000 words available, and each retraining run takes considerable time, which does not fit the requirement to update customizations multiple times per hour. This option suits large-scale domain adaptation rather than rapid iteration.
