The ClusterSpeech task clusters wide-band speech into speaker segments. For example, if the task identifies two speaker clusters, it labels them Cluster_0 and Cluster_1.

| Parameter | Description | Required |
|---|---|---|
| Type | The task name. Set to ClusterSpeech. | Yes |
| File | The input audio file. | |
| FixTime | A fixed size for speaker clusters. | |
| Lang | The name of a language pack. | Yes |
| MaxNumSpeakers | The final maximum number of speakers to produce. | |
| MergeThresh | The threshold below which to merge clusters. | |
| MinNumSpeakers | The final minimum number of speakers to produce. | |
| Out | The file that IDOL Speech Server writes task output to. | |
| SilThresh | The threshold between what the task identifies as silence and non-silence. | |
| SpeechThresh | The threshold between speech and non-speech (music or noise). | |
| SugdInputChannels | The channel layout of the input media file. | |
| SugdInputFrequency | The sampling rate of the input media file. | |

http://localhost:15000/a=AddTask&Type=ClusterSpeech&File=wide.wav&lang=ENUK&out=outWide

This action uses port 15000 to instruct IDOL Speech Server, which is located on the local machine, to cluster the data in the wide.wav wide-band audio file into speaker segments, and to write the results to the outWide output file.
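
The request above can also be assembled programmatically. The following is a minimal Python sketch, assuming the same host, port, and parameter values as the example. The helper name `build_cluster_speech_url` is hypothetical and is not part of IDOL Speech Server. The sketch only builds the URL string and does not send it. It emits parameters in a different order from the hand-written example and uses the capitalization from the parameter table.

```python
from urllib.parse import urlencode


def build_cluster_speech_url(lang, host="localhost", port=15000, **params):
    """Build an AddTask URL for the ClusterSpeech task.

    Hypothetical helper for illustration only. `lang` is the language pack
    name (marked as required in the parameter table). Any other parameters
    from the table (File, Out, MinNumSpeakers, ...) are passed as keyword
    arguments; values set to None are skipped.
    """
    query = {"Type": "ClusterSpeech", "Lang": lang}
    query.update({name: value for name, value in params.items() if value is not None})
    # urlencode percent-escapes values, so file names with spaces are safe.
    return f"http://{host}:{port}/a=AddTask&{urlencode(query)}"


url = build_cluster_speech_url("ENUK", File="wide.wav", Out="outWide")
print(url)
```

Optional parameters from the table can be added in the same way, for example `build_cluster_speech_url("ENUK", File="wide.wav", MinNumSpeakers=2, MaxNumSpeakers=4)`. The speaker counts here are illustrative values, not recommended settings.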