unicode - Which Japanese sorting / collation orders are supported by ICU / CLDR / UCA?


The Japanese language, I believe, has more than one sort order equivalent to alphabetical order in English.

I believe there is at least one based on pronunciation (I think the kana have historically used two orders), and one based on radical plus stroke count. Chinese also has multiple orders, one of which is based on radical/stroke, but due to Unicode Han unification the same character can have a different stroke count in Chinese and in Japanese.

As I understand it, the standard for sort order in Unicode is the CLDR data together with the UCA algorithm, and the reference implementation is ICU.

Implementations can lag behind standards, and this information is proving hard to track down to canonical sources.

If I set a collator to use the language specifier ja, which sort order should I expect to be used?

If several are available for Japanese, or are planned to be available at some point, which specifiers should be used for those? An example of such a specifier is es-u-co-trad, for the traditional alphabetical order of Spanish.

The basic Japanese sort order provided by CLDR (and therefore ICU) is based on the sort order specified in JIS X 4061-1996:

  • Kana are sorted in gojuuon (五十音) order (with hiragana preceding katakana).
  • Kanji are sorted in the order of JIS X 0208, by "representative reading" (and following the kana).
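
As a rough illustration of the two rules above, here is a minimal TypeScript sketch using the built-in `Intl.Collator`. It assumes a runtime whose `Intl` implementation is backed by ICU with Japanese collation data (recent Node.js releases and Chromium-based browsers generally are); the sample strings are made up for the example.

```typescript
// Sketch: sort a mix of kana and kanji with the default Japanese collation.
// Assumption: Intl.Collator is backed by ICU and has Japanese data available.
const jaCollator = new Intl.Collator("ja");

// Illustrative samples: one kanji, two katakana, two hiragana.
const sampleWords: string[] = ["山", "カ", "か", "ア", "あ"];

// Copy before sorting so the input array is left untouched.
const sortedWords: string[] = [...sampleWords].sort(jaCollator.compare);

// Kana come out in gojuuon order, hiragana before the matching katakana,
// and the kanji follows the kana: あ ア か カ 山
console.log(sortedWords.join(" "));
```

Only characters whose relative order follows directly from the rules above are used here; the order among arbitrary kanji depends on the JIS X 0208 data and is not something this sketch tries to predict.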

A ja-u-co-unihan collation is also available, which includes rules for sorting by radical and then by stroke count (followed by the standard rules above). This is useful if you are sorting by radicals.
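
A hedged sketch of requesting that collation from TypeScript follows. It again assumes an ICU-backed `Intl.Collator`; whether the unihan tailoring is actually present depends on the ICU data shipped with the runtime, so the code prints the resolved locale rather than taking it for granted. The four sample kanji were chosen because their Kangxi radical numbers (1, 9, 46, 47) are unambiguous.

```typescript
// Sketch: ask for the radical-stroke ("unihan") collation for Japanese.
// Assumption: the runtime's ICU data includes the ja unihan tailoring.
const unihanCollator = new Intl.Collator("ja-u-co-unihan");

// Show what the runtime resolved. If the requested collation type is not
// supported, the "-u-co-unihan" extension is left out of the resolved locale.
console.log(unihanCollator.resolvedOptions().locale);

// Kangxi radicals: 一 = 1, 人 = 9, 山 = 46, 川 = 47.
const kanjiSamples: string[] = ["川", "山", "人", "一"];
const byRadical: string[] = [...kanjiSamples].sort(unihanCollator.compare);

// Sorted by radical number: 一 人 山 川
console.log(byRadical.join(" "));
```

The same request can be written as `new Intl.Collator("ja", { collation: "unihan" })` in runtimes that support the `collation` option.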

If you need more accurate sorting of kanji (for instance, by the reading of the words they are used in), you will need to perform some kind of morphological analysis with a dictionary to figure out which readings to use, and then apply the Unicode Collation Algorithm to those.
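
The shape of that approach can be sketched as follows. This is only an outline under stated assumptions: `readingTable` and `readingOf` are hypothetical stand-ins for a real morphological analyzer and dictionary, the table covers just four place names, and `Intl.Collator("ja")` is assumed to be ICU-backed.

```typescript
// Sketch: sort kanji words by reading rather than by the kanji themselves.
// Hypothetical stand-in for a morphological analyzer plus dictionary;
// a real system would derive readings from context, not a fixed table.
const readingTable: Record<string, string> = {
  "東京": "とうきょう",
  "大阪": "おおさか",
  "京都": "きょうと",
  "名古屋": "なごや",
};

// Fall back to the written form when no reading is known.
function readingOf(word: string): string {
  return readingTable[word] ?? word;
}

const readingCollator = new Intl.Collator("ja");

// Compare the kana readings, but return the original words.
function sortByReading(words: string[]): string[] {
  return [...words].sort((a, b) =>
    readingCollator.compare(readingOf(a), readingOf(b)),
  );
}

// Readings sort as おおさか, きょうと, とうきょう, なごや,
// so this prints: 大阪 京都 東京 名古屋
console.log(sortByReading(["名古屋", "東京", "京都", "大阪"]).join(" "));
```

Keeping the reading lookup separate from the comparison means the hard-coded table can later be swapped for an actual analyzer without touching the collation step.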

