You can set additional parameters in the tasks configuration file to improve the performance of speech-to-text.
If the audio data contains a lot of background noise or foreground music, you can enable speech detection to improve speech-to-text recognition rates:
In the [frontend] module used by the speech-to-text task, set the DetectSpeech parameter to True. This changes how the speech-to-text engine processes audio sections that are labeled as speech, which can improve recognition in these sections.
In the [normalizer] module used by the speech-to-text task, set ZeroSilFrames to True. The speech-to-text engine then skips over sections of audio that are identified as silence.
If many of the words in the audio do not appear in the transcript, the language model might be too strongly weighted. In the language pack section of the tasks configuration file, experiment with the following parameters:
LmScale parameter (the recommended range is between 0.2 and 2.0).
LmOffset parameter (the recommended range is between -0.5 and +0.5).
If the speech-to-text engine produces many more words in the transcript file than are spoken in the audio, the language model might be too weakly weighted. In the language pack section of the tasks configuration file, experiment with the following parameters:
LmScale parameter (the recommended range is between 0.2 and 2.0).
LmOffset parameter (the recommended range is between -0.5 and +0.5).
You can also tune the following speech-to-text parameters to improve general speech-to-text performance:
Mode
ModeValue
For more information about these parameters, see the IDOL Speech Server Reference.
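The settings described above can be applied with a short script. The following is a minimal sketch, not an official tool: it assumes the tasks configuration file can be parsed as plain INI text by Python's configparser, which may not hold for every real configuration file. The function name tune_tasks_config and the language pack section name passed to it are hypothetical. Only the [frontend] and [normalizer] sections, the DetectSpeech, ZeroSilFrames, LmScale, and LmOffset parameters, and the recommended ranges come from this topic.

```python
import configparser
import io

# Recommended ranges quoted in this topic.
LM_SCALE_RANGE = (0.2, 2.0)
LM_OFFSET_RANGE = (-0.5, 0.5)


def clamp(value, bounds):
    """Keep a value inside the inclusive (low, high) range."""
    low, high = bounds
    return max(low, min(high, value))


def tune_tasks_config(text, langpack_section, lm_scale=None, lm_offset=None):
    """Return INI text with speech detection enabled and LM weights set.

    Hypothetical helper: `langpack_section` is whatever name your language
    pack section uses; it is not defined by this sketch.
    """
    parser = configparser.ConfigParser()
    parser.optionxform = str  # preserve parameter name case (for example, LmScale)
    parser.read_string(text)

    # Create any section that the input text does not already contain.
    for section in ("frontend", "normalizer", langpack_section):
        if not parser.has_section(section):
            parser.add_section(section)

    # Speech detection settings for noisy audio or foreground music.
    parser["frontend"]["DetectSpeech"] = "True"
    parser["normalizer"]["ZeroSilFrames"] = "True"

    # Language model weighting, clamped to the recommended ranges.
    if lm_scale is not None:
        parser[langpack_section]["LmScale"] = str(clamp(lm_scale, LM_SCALE_RANGE))
    if lm_offset is not None:
        parser[langpack_section]["LmOffset"] = str(clamp(lm_offset, LM_OFFSET_RANGE))

    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


if __name__ == "__main__":
    # "mylangpack" is a placeholder section name used only for illustration.
    print(tune_tasks_config("", "mylangpack", lm_scale=1.0, lm_offset=0.0))
```

Clamping keeps experimental values inside the recommended ranges, so a typo such as an LmScale of 20 is not written to the file. Change one parameter at a time and compare transcripts, so that you can tell which change affected the result.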