To convert stereo audio input to text, process each channel separately. For example:
[StereoWavToText]
0 = l,r ← wav1(STEREO, input)
1 = f1 ← frontend(_, a:l)
2 = nf1 ← normalizer(_, f1)
3 = w1 ← stt(_, nf1)
4 = output ← wout1(_, w1)
5 = f2 ← frontend1(_, a:r)
6 = nf2 ← normalizer1(_, f2)
7 = w2 ← stt1(_, nf2)
8 = output ← wout2(_, w2)
| Step | Description |
|---|---|
| 0 | The wav1 module reads the input stereo audio file and splits it into left (l) and right (r) audio data. |
| 1 | The frontend module converts the left audio channel (l) into speech front-end frame data (f1). In this step, the variable form a:l renames the left-channel audio data (type l) to audio data (type a). |
| 2 | The normalizer module normalizes the frame data from step 1 (f1). |
| 3 | The stt module converts the normalized frame data from step 2 (nf1) into text. |
| 4 | The wout1 module writes the recognized words from step 3 (w1) to the output file. |
| 5 | The frontend1 module converts the right audio channel (r) into speech front-end frame data (f2). In this step, the variable form a:r renames the right-channel audio data (type r) to audio data (type a). |
| 6 | The normalizer1 module normalizes the frame data from step 5 (f2). |
| 7 | The stt1 module converts the normalized frame data from step 6 (nf2) into text. |
| 8 | The wout2 module writes the recognized words from step 7 (w2) to the output file. |
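The data flow in this example (split the stereo input once, then run each channel through its own chain of modules) can also be illustrated outside the configuration language. The following Python sketch is not part of the toolkit described here and makes several assumptions: the input is 16-bit PCM, and the names `split_stereo`, `run_chain`, `frame`, `normalize`, and `make_demo_wav` are hypothetical helpers invented for illustration. The `frame` and `normalize` functions are simple stand-ins and do not reproduce what the real frontend and normalizer modules compute.

```python
import io
import struct
import wave


def split_stereo(wav_bytes):
    """De-interleave a 16-bit stereo WAV into (left, right) sample lists."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        if wf.getnchannels() != 2 or wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo PCM")
        raw = wf.readframes(wf.getnframes())
    samples = struct.unpack("<%dh" % (len(raw) // 2), raw)
    # Stereo PCM is interleaved as L, R, L, R, ...
    return list(samples[0::2]), list(samples[1::2])


def run_chain(data, stages):
    """Pass one channel's data through its own stages, in order."""
    for stage in stages:
        data = stage(data)
    return data


def frame(samples, size=4):
    """Stand-in for a front end: cut the samples into fixed-size frames."""
    return [samples[i:i + size] for i in range(0, len(samples), size)]


def normalize(frames):
    """Stand-in for a normalizer: scale every sample by the channel peak."""
    peak = max((abs(s) for f in frames for s in f), default=0) or 1
    return [[s / peak for s in f] for f in frames]


def make_demo_wav(left, right, rate=8000):
    """Build an in-memory 16-bit stereo WAV from two equal-length lists."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(2)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        interleaved = [s for pair in zip(left, right) for s in pair]
        wf.writeframes(struct.pack("<%dh" % len(interleaved), *interleaved))
    return buf.getvalue()


if __name__ == "__main__":
    demo = make_demo_wav([100, -200, 300, -400], [1, 2, 3, 4])
    left, right = split_stereo(demo)          # corresponds to step 0
    # Each channel gets its own chain, as in steps 1-4 and 5-8.
    # A recognizer and a writer stage would be appended to each list.
    print(run_chain(left, [frame, normalize]))
    print(run_chain(right, [frame, normalize]))
```

Keeping a separate stage list per channel mirrors the use of distinct module instances (frontend and frontend1, stt and stt1) in the listing, so that one channel's processing cannot affect the other's.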