Japanese, Korean, Mandarin, and Taiwanese Mandarin do not separate words with whitespace. You must segment text in these languages into words before IDOL Speech Server can process it.
To segment text
Send an AddTask action to IDOL Speech Server, and set the following parameters:
Type | The task name. Set to SegmentText.
Lang | The language pack to use.
TxtFileIn | The text file to segment.
TxtFileOut | The text file to write the segmented text to.
Pgf | The pronunciation information file to use.
To exempt a section of text from segmentation, move the section to a new line and add hash symbols (#) at the beginning and end of the section. You must also set the IgnoreHashLines parameter:
IgnoreHashLines | Set to True to exempt sections bounded by hash symbols from segmentation.
For example:
http://localhost:13000/action=AddTask&Type=SegmentText&Lang=JAJP&TxtFileIn=C:/Data/Japanese.txt&TxtFileOut=JA_seg.txt&PgfFile=T:\LP\ENUK\ver-ENUK-5.0.pgf
This action uses port 13000 to instruct IDOL Speech Server, which is located on the local machine, to segment text in the Japanese.txt file and write the results to the JA_seg.txt file in the Temp directory.
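If you submit many files, you might prefer to assemble the action from a script. The following Python sketch rebuilds the example action above from its parts. The function name build_addtask_url is hypothetical. Parameter names are passed through exactly as you supply them, so the sketch takes no position on whether the pronunciation file parameter is spelled Pgf (as in the table) or PgfFile (as in the example). It also assumes that file paths can be sent without percent-encoding, as the example does.

```python
from urllib.parse import urlencode


def build_addtask_url(host: str, port: int, task_type: str, **params: str) -> str:
    """Build an AddTask action URL for the given task type and parameters.

    Hypothetical helper: parameter names and values are used as given.
    """
    query = urlencode({"Type": task_type, **params}, safe=":/\\")
    # The action follows the host and port directly, as in the example above.
    return f"http://{host}:{port}/action=AddTask&{query}"


if __name__ == "__main__":
    url = build_addtask_url(
        "localhost",
        13000,
        "SegmentText",
        Lang="JAJP",
        TxtFileIn="C:/Data/Japanese.txt",
        TxtFileOut="JA_seg.txt",
        PgfFile=r"T:\LP\ENUK\ver-ENUK-5.0.pgf",
    )
    print(url)
```

To exempt hash-bounded sections, pass IgnoreHashLines="True" as an additional keyword argument. The resulting URL can then be sent with any HTTP client, for example urllib.request.urlopen.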