The AudioAnalysis task runs all the audio preprocessing tasks that are supported by the audiopreproc module in a single task.
For more information, see [audiopreproc] Module Configuration.
| Parameter | Description | Required |
|---|---|---|
| CtmFile | The speech-to-text transcript produced for the audio file. | Yes |
| File | The audio file to process. | Yes |
| Out | The XML file to write the audio analysis results to. | Yes |
| Sfreq | The sample frequency of the audio file to process. | |
| SugdInputChannels | The channel layout of the input media file. | |
| SugdInputFrequency | The sampling rate of the input media file. | |
http://localhost:13000/action=AddTask&Type=AudioAnalysis&File=C:\data\Sample.wav&Out=SampleAnalysis.xml
This action uses port 13000 to instruct Speech Server, which is located on the local machine, to perform audio analysis on the Sample.wav file and to write the results to the SampleAnalysis.xml file.
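The request above can also be assembled programmatically. The following is a minimal Python sketch, not an official client: the helper name `build_add_task_url` is hypothetical, and it assumes that Speech Server accepts percent-encoded parameter values (for example, for the backslashes in a Windows path).

```python
from urllib.parse import urlencode


def build_add_task_url(host, port, audio_file, out_file):
    """Build an AddTask request URL for the AudioAnalysis task.

    Hypothetical helper for illustration; only the Type, File, and Out
    parameters shown in the example action are set.
    """
    params = {
        "Type": "AudioAnalysis",
        "File": audio_file,
        "Out": out_file,
    }
    # urlencode percent-encodes characters such as ':' and '\' in the path.
    return f"http://{host}:{port}/action=AddTask&{urlencode(params)}"


url = build_add_task_url("localhost", 13000, r"C:\data\Sample.wav", "SampleAnalysis.xml")
print(url)
```

To send the request, you could pass the resulting URL to any HTTP client, such as `urllib.request.urlopen`.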
The AudioAnalysis log file provides information on several audio quality assessments. For example:
<autnresponse>
<audiopreproc>
<snr>
<mean>20</mean>
<audio_level>66</audio_level>
</snr>
<gain>
<size>35</size>
<energy>69</energy>
</gain>
<max_gain_difference>0</max_gain_difference>
<clipping>
<assessment>no</assessment>
<percent_frames>0</percent_frames>
</clipping>
<categories>
<speech_percent>77.3667</speech_percent>
<silence_percent>7.45</silence_percent>
<noise_music_percent>15.9</noise_music_percent>
</categories>
</audiopreproc>
<resultDeleted>False</resultDeleted>
</autnresponse>
The log file includes information on the following:
The gain level, and the actual energy level. The log file also includes a summary of the maximum difference in decibels between speaker levels across the whole file (<max_gain_difference>). For a good quality waveform where the two speakers speak at a similar gain level, this number can be zero (or at least very low).
An assessment of the amount of clipping in the file, and the number of frames affected. The <assessment> field can hold one of the following values:
| Value | Meaning |
|---|---|
| no | no clipping |
| insignificant | <= 0.1% of frames |
| minor | <= 1% of frames |
| moderate | <= 4% of frames |
| heavy | > 4% of frames |
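As an illustration of how the results might be consumed, the following Python sketch parses a response shaped like the sample above and maps a clipping percentage to the assessment values listed. It is a sketch under assumptions: the function names are hypothetical, the element names are taken from the sample output only, and treating "no clipping" as exactly 0% of frames is an assumption rather than documented behavior.

```python
import xml.etree.ElementTree as ET

# Sample response, copied from the example output in this section.
SAMPLE = """<autnresponse>
  <audiopreproc>
    <snr><mean>20</mean><audio_level>66</audio_level></snr>
    <gain><size>35</size><energy>69</energy></gain>
    <max_gain_difference>0</max_gain_difference>
    <clipping><assessment>no</assessment><percent_frames>0</percent_frames></clipping>
    <categories>
      <speech_percent>77.3667</speech_percent>
      <silence_percent>7.45</silence_percent>
      <noise_music_percent>15.9</noise_music_percent>
    </categories>
  </audiopreproc>
  <resultDeleted>False</resultDeleted>
</autnresponse>"""


def summarize(xml_text):
    """Extract a few audio quality fields from an AudioAnalysis result."""
    pre = ET.fromstring(xml_text).find("audiopreproc")
    return {
        "snr_mean": float(pre.findtext("snr/mean")),
        "max_gain_difference": float(pre.findtext("max_gain_difference")),
        "clipping_assessment": pre.findtext("clipping/assessment"),
        "clipping_percent_frames": float(pre.findtext("clipping/percent_frames")),
        "speech_percent": float(pre.findtext("categories/speech_percent")),
    }


def classify_clipping(percent_frames):
    """Map a percentage of clipped frames to an assessment value.

    Assumption: "no" corresponds to exactly 0% of frames.
    """
    if percent_frames == 0:
        return "no"
    if percent_frames <= 0.1:
        return "insignificant"
    if percent_frames <= 1:
        return "minor"
    if percent_frames <= 4:
        return "moderate"
    return "heavy"


summary = summarize(SAMPLE)
print(summary["clipping_assessment"], classify_clipping(summary["clipping_percent_frames"]))
```

For the sample response, the reported assessment and the value derived from the thresholds agree.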
You can use the GetResults action to retrieve this information; you do not need to specify a result label.
The AudioAnalysis task also produces an additional audio classification .ctm file. By default, this has the same name as the task token. You can use the GetResults action with the label parameter set to class to retrieve this file.
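The two retrieval requests described above could be built as in the following Python sketch. The helper name `build_get_results_url` and the token value are hypothetical. The `Token` parameter name is an assumption, because this section does not show how the task is identified; only the label parameter and its `class` value come from the text above.

```python
from urllib.parse import urlencode


def build_get_results_url(host, port, token, label=None):
    """Build a GetResults request URL.

    Assumption: the task is identified by a Token parameter.
    Omit label to request the audio analysis results; pass
    label="class" to request the audio classification .ctm file.
    """
    params = {"Token": token}
    if label is not None:
        params["Label"] = label
    return f"http://{host}:{port}/action=GetResults&{urlencode(params)}"


# "EXAMPLETOKEN" is a placeholder, not a real task token.
analysis_url = build_get_results_url("localhost", 13000, "EXAMPLETOKEN")
class_url = build_get_results_url("localhost", 13000, "EXAMPLETOKEN", label="class")
print(analysis_url)
print(class_url)
```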