
粵拼生成工具 Jyutping Convertor

Enter Chinese text, then click the magnifying glass, and the program will "translate" it into the corresponding Jyutping annotations. Each word in the results links to 粵典 Words.HK, a Cantonese-to-Chinese/English dictionary with pronunciations and example sentences.
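Turning the converter's output into linked annotations takes very little code. Below is a minimal sketch that assumes the input is a list of (word, Jyutping) pairs shaped like the output of pycantonese.characters_to_jyutping. The helper names (split_syllables, annotate) are hypothetical, and the Words.HK URL pattern is an assumption made for illustration, not something this page documents.

```python
import re
from urllib.parse import quote

# Assumption: Words.HK entry pages follow this URL pattern (illustrative only).
WORDS_HK_ENTRY = "https://words.hk/zidin/{}"


def split_syllables(jyutping):
    """Split run-together Jyutping such as "hoeng1gong2jan4" into syllables."""
    # Each Jyutping syllable is lowercase letters followed by a tone digit 1-6.
    return re.findall(r"[a-z]+[1-6]", jyutping)


def annotate(pairs):
    """Turn (word, jyutping) pairs into (word, spaced jyutping, link) rows.

    A word with no reading (assumed to arrive as jyutping=None) gets no link.
    """
    rows = []
    for word, jyutping in pairs:
        if jyutping is None:
            rows.append((word, None, None))
        else:
            reading = " ".join(split_syllables(jyutping))
            rows.append((word, reading, WORDS_HK_ENTRY.format(quote(word))))
    return rows


for row in annotate([("香港人", "hoeng1gong2jan4"), ("講", "gong2"), ("！", None)]):
    print(row)
```

Percent-encoding the word with quote keeps the link valid even though the word itself is made of Chinese characters.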



Credits & Caveats

Credits

This page is a simple wrapper around Jackson L. Lee's PyCantonese Python package, and should therefore be credited as follows:

Citations

Caveats

Many Chinese characters are polyphonic in Cantonese: a single character can have several readings, and which one applies depends on the context. PyCantonese resolves this by splitting the text into words ("segmentation"), favoring phrases that appear frequently in its corpus, and then looking up the pronunciation of each word as a whole. This works most of the time, as shown by the correct readings of 行 and 差 in the sample phrase.
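To see why segmenting first settles the reading, here is a toy version of "favor frequent phrases". It is a unigram model: it picks the split whose words have the highest product of relative frequencies, computed by dynamic programming. This is a swapped-in technique for illustration, not PyCantonese's own implementation, and the mini-lexicon and its counts are invented.

```python
import math

# Hypothetical mini-lexicon: word -> (corpus count, Jyutping). Counts are invented.
LEXICON = {
    "我": (900, "ngo5"),
    "去": (700, "heoi3"),
    "銀": (40, "ngan4"),
    "行": (150, "haang4"),        # on its own: "walk"
    "銀行": (120, "ngan4hong4"),  # in "bank", 行 reads hong4
    "路": (200, "lou6"),
    "行路": (60, "haang4lou6"),
    "出": (300, "ceot1"),
    "差": (80, "caa1"),
    "出差": (30, "ceot1caai1"),   # in "business trip", 差 reads caai1
}
TOTAL = sum(count for count, _ in LEXICON.values())


def segment(text, max_len=4):
    """Split `text` into lexicon words, maximizing the summed log relative frequency."""
    # best[i] holds (log-probability, words) for the best split of text[:i].
    best = [(0.0, [])] + [(-math.inf, None)] * len(text)
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if word in LEXICON and best[start][1] is not None:
                score = best[start][0] + math.log(LEXICON[word][0] / TOTAL)
                if score > best[end][0]:
                    best[end] = (score, best[start][1] + [word])
    if best[-1][1] is None:
        raise ValueError("text contains characters missing from the lexicon")
    return best[-1][1]


def to_jyutping(text):
    """Read each segmented word as a whole, so the context picks the reading."""
    return [(word, LEXICON[word][1]) for word in segment(text)]


print(to_jyutping("我去銀行"))
print(to_jyutping("我行路"))
print(to_jyutping("出差"))
```

Because 銀行, 行路 and 出差 each outscore their single-character splits, 行 comes out as hong4 in one context and haang4 in another, and 差 as caai1 inside 出差.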

However, the most frequent phrase is not always the right way to parse a sentence. The phrase 再見到水精靈 is incorrectly parsed as 再見-到-水-精靈 (goodbye-arrives-water-spirit), when it should be 再-見到-水精靈 (again-observed-Water Pokémon). This illustrates two problems:

  1. 再見 ("goodbye") is a high-frequency word, so the segmentation prioritizes it
  2. 水精靈 ("Water Pokémon") does not appear in the corpus
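Both problems can be reproduced with a frequency score. The sketch below scores candidate segmentations under a unigram model, using invented counts in which 再見 is common and 水精靈 is absent. The counts and the scoring rule are assumptions for illustration, not PyCantonese's data.

```python
import math

# Invented counts; 水精靈 is deliberately absent, mirroring problem (2).
COUNTS = {"再見": 500, "再": 300, "見到": 200, "見": 150, "到": 800,
          "水": 400, "精靈": 50}
TOTAL = sum(COUNTS.values())


def score(words):
    """Unigram log-probability of a segmentation; -inf if any word is unknown."""
    if any(word not in COUNTS for word in words):
        return -math.inf
    return sum(math.log(COUNTS[word] / TOTAL) for word in words)


wrong = ["再見", "到", "水", "精靈"]     # goodbye-arrives-water-spirit
better = ["再", "見到", "水", "精靈"]    # again-observed-water-spirit
right = ["再", "見到", "水精靈"]         # again-observed-Water Pokémon

print(score(wrong) > score(better))  # problem (1): 再見 * 到 outweighs 再 * 見到
print(score(right))                  # problem (2): -inf, since 水精靈 is unknown
```

With these counts, 500 × 800 beats 300 × 200, so the "goodbye" reading wins, and the intended parse can never be chosen because one of its words is missing.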

If you run into issue (1), you can insert markers (periods) to break up the phrase at semantically meaningful boundaries. For example, 再.見到.水精靈 gives (again-observed-...).
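One way to picture the markers: the text is split at each period, and every chunk is segmented on its own, so no word can cross a marker. The sketch below enumerates all candidate segmentations over a hypothetical vocabulary to show that 再見 disappears from the candidates once a marker separates 再 from 見. The splitting rule and vocabulary are assumptions, not a description of the page's internals.

```python
from itertools import chain, product

# Hypothetical vocabulary for this sketch (水精靈 is still missing).
VOCAB = {"再見", "再", "見到", "見", "到", "水", "精靈"}


def segmentations(chunk):
    """Every way to split `chunk` into words from VOCAB (empty list if impossible)."""
    if not chunk:
        return [[]]
    results = []
    for end in range(1, len(chunk) + 1):
        head = chunk[:end]
        if head in VOCAB:
            for rest in segmentations(chunk[end:]):
                results.append([head] + rest)
    return results


def candidates(text, marker="."):
    """Segment each marker-delimited chunk separately, then join the pieces."""
    per_chunk = [segmentations(chunk) for chunk in text.split(marker) if chunk]
    return [list(chain.from_iterable(combo)) for combo in product(*per_chunk)]


print(["再見", "到"] in candidates("再見到"))            # True: 再見 is available
print(any("再見" in c for c in candidates("再.見到")))  # False: the marker rules it out
```

Markers only remove candidates; they cannot add a word the vocabulary lacks, which is why 水精靈 still falls apart into 水 and 精靈.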

Issue (2) requires supplying a custom vocabulary, which this page does not support yet; it is planned as a future feature.
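Conceptually, a custom vocabulary just adds entries to the word list the segmenter matches against. The sketch below uses greedy longest-match segmentation, a simpler technique swapped in for illustration (not the frequency ranking described above), with a hypothetical base vocabulary and a hypothetical user-supplied entry.

```python
# Hypothetical base vocabulary; 水精靈 is missing, as in issue (2).
BASE_VOCAB = {"再見", "見到", "精靈"}


def longest_match(text, vocab, max_len=4):
    """Greedy left-to-right segmentation: at each position take the longest
    word found in `vocab`, falling back to a single character."""
    words, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in vocab:
                words.append(candidate)
                i += length
                break
    return words


custom_vocab = {"水精靈"}  # hypothetical user-supplied entries
print(longest_match("再見到水精靈", BASE_VOCAB))
print(longest_match("再見到水精靈", BASE_VOCAB | custom_vocab))
```

In this toy, the extra entry keeps 水精靈 together, but 再見 is still matched first, so issue (1) would still need markers.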