粵拼生成工具 Jyutping Convertor
Input Chinese text, then press the magnifying glass to generate the corresponding Jyutping annotations. Words in the results link to 粵典 Words.HK, a Cantonese-(zh/en) dictionary with pronunciation and sample sentences.
Credits & Caveats
Credits
This page is a simple wrapper around Jackson L. Lee's pycantonese Python package, and should thus be credited as follows:
Jackson L Lee and Contributors (2014-2021). PyCantonese 3.3.1. https://github.com/jacksonllee/pycantonese
Caveats
Many Chinese characters are polyphonic in Cantonese, i.e., one character can correspond to multiple sounds depending on the context. PyCantonese evaluates the context by splitting the text into words ("segmentation"), ranking candidate phrases by how frequently they appear in a corpus. This works most of the time, as seen in the correct evaluation of 行 and 差 in the sample phrase.
However, there are cases in which the most frequently appearing phrase isn't the right way to parse a sentence. The phrase 再見到水精靈 is incorrectly parsed as 再見-到-水-精靈 (goodbye-arrives-water-spirit) when it should be 再-見到-水精靈 (again-observed-Water Pokemon). This illustrates two problems:
1. "goodbye" 再見 is high frequency and is prioritized in the segmentation
2. "water pokemon" 水精靈 was not present in the corpus
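To make the failure concrete, here is a minimal sketch of a frequency-based segmenter. It is not PyCantonese's actual implementation: the word counts in `TOY_COUNTS`, the `segment` function, and the `unknown_count` fallback are all hypothetical, chosen only so that 再見 is frequent and 水精靈 is absent, as in the two problems above. The sketch picks the split whose words have the highest combined log-frequency.

```python
import math

# Hypothetical word counts standing in for a real corpus (assumption).
# 再見 is frequent, and 水精靈 is deliberately missing.
TOY_COUNTS = {"再見": 500, "到": 400, "再": 300, "見到": 200, "水": 150, "精靈": 20}


def segment(text, counts, max_len=4, unknown_count=0.5):
    """Return the split of `text` whose words have the highest total log-frequency."""
    total = sum(counts.values())
    # best[i] holds (score, words) for the best segmentation of text[:i].
    best = [(0.0, [])] + [(-math.inf, [])] * len(text)
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if word in counts:
                count = counts[word]
            elif len(word) == 1:
                # Unseen single characters stay possible, at a heavy penalty.
                count = unknown_count
            else:
                continue
            score = best[start][0] + math.log(count / total)
            if score > best[end][0]:
                best[end] = (score, best[start][1] + [word])
    return best[-1][1]


result = segment("再見到水精靈", TOY_COUNTS)
print(result)  # ['再見', '到', '水', '精靈']
```

With these toy counts, 再見 followed by 到 outscores 再 followed by 見到, and 水精靈 can only be assembled from 水 and 精靈, which reproduces the incorrect parse.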
If you encounter issue (1), consider using markers to break up the phrase in a semantically meaningful way. For example, 再.見到.水精靈 will give (again-observed-...).
Issue (2) requires a way for you to provide custom vocabulary, and that is a feature for the future.
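Both workarounds can be sketched in a few lines. This is an illustration under stated assumptions, not how this page or PyCantonese implements them: a deliberately simple greedy longest-match segmenter stands in for the real one, and `TOY_VOCAB`, `segment`, `segment_with_markers`, and `custom_vocab` are hypothetical names.

```python
# Hypothetical toy vocabulary (assumption); 水精靈 is deliberately missing.
TOY_VOCAB = {"再見", "見到", "再", "到", "水", "精靈"}


def segment(text, vocab, max_len=4):
    """Greedy longest-match segmentation; unknown characters become single words."""
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first and fall back to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in vocab:
                words.append(candidate)
                i += size
                break
    return words


def segment_with_markers(text, vocab, marker="."):
    """Split on the marker first, so no word can span a marked boundary."""
    words = []
    for chunk in text.split(marker):
        if chunk:  # skip empty chunks from leading, trailing, or doubled markers
            words.extend(segment(chunk, vocab))
    return words


# Issue (1): markers keep 再 and 見到 apart, but 水精靈 is still unknown.
marked = segment_with_markers("再.見到.水精靈", TOY_VOCAB)
print(marked)  # ['再', '見到', '水', '精靈']

# Issue (2): a custom entry fixes 水精靈, but 再見 is still preferred.
custom_vocab = TOY_VOCAB | {"水精靈"}
custom = segment("再見到水精靈", custom_vocab)
print(custom)  # ['再見', '到', '水精靈']

# Both together give the intended parse.
both = segment_with_markers("再.見到.水精靈", custom_vocab)
print(both)  # ['再', '見到', '水精靈']
```

In this sketch, each workaround addresses only its own issue, which is why the intended parse needs both the markers and the custom entry.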