unicode - Which Japanese sorting / collation orders are supported by ICU / CLDR / UCA?

August 15, 2015

The Japanese language, I believe, has more than one sort order equivalent to alphabetical order in English.

I believe there's at least one based on pronunciation (I think the kana have used two orders historically), and one based on radical + stroke count. Chinese also has multiple orders, including one based on radical/stroke, but due to Unicode Han unification the same character can have a different stroke count in Chinese and Japanese.

I believe the standard for sort order in Unicode is the CLDR data together with the UCA algorithm, and that the reference implementation is ICU.

However, implementations lag behind standards, and this information is proving hard to track down from canonical sources.

If I set my collator to the language specifier ja, which sort order should I expect to be used?

If several are available for Japanese, or are planned to be available at some point, which specifiers should be used for those? For example, the specifier for the traditional alphabetical order of Spanish is es-u-co-trad.

The basic Japanese sort order provided by CLDR (and therefore ICU) is based on the sort order specified in JIS X 4061-1996:

Kana are sorted in gojuuon (五十音) order (with hiragana preceding katakana).
Kanji are sorted in the order they appear in JIS X 0208, which is by their "representative reading" (and following kana).

A ja-u-co-unihan collation is also available, which includes rules for sorting by radical and then stroke count (followed by the standard rules above). This is useful if you are sorting by radicals.
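As a rough illustration of selecting these two collations, here is a minimal Python sketch. It assumes the third-party PyICU package is installed; the helper names `icu_locale_id` and `sort_with_icu` are made up for this example. It uses ICU's older keyword syntax (`ja@collation=unihan`), which corresponds to the BCP 47 form `ja-u-co-unihan`.

```python
def icu_locale_id(language, collation=None):
    """Return an ICU-style locale ID, e.g. 'ja' or 'ja@collation=unihan'."""
    if collation is None:
        return language
    return f"{language}@collation={collation}"


def sort_with_icu(words, language, collation=None):
    """Sort words with an ICU collator (hypothetical helper for this sketch)."""
    import icu  # assumption: the third-party PyICU package is installed

    locale = icu.Locale(icu_locale_id(language, collation))
    collator = icu.Collator.createInstance(locale)
    # getSortKey returns bytes whose binary order follows the collation rules
    return sorted(words, key=collator.getSortKey)


if __name__ == "__main__":
    words = ["カタカナ", "ひらがな", "漢字", "あい"]
    try:
        print(sort_with_icu(words, "ja"))            # default Japanese order
        print(sort_with_icu(words, "ja", "unihan"))  # radical-stroke order for kanji
    except ImportError:
        print("PyICU is not installed; skipping the demo")
```

The exact output depends on the ICU and CLDR versions bundled with your installation, so treat this as a way to experiment rather than a statement of the expected order.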

If you need more accurate sorting of kanji (for instance, by the reading of the words they are used in), you will need to perform some kind of morphological analysis with a dictionary to figure out which readings to use, and then apply the Unicode Collation Algorithm to those.
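The reading-based approach can be sketched roughly as follows. The `READINGS` table is a hypothetical stand-in for the output of a real morphological analyser, and plain code point comparison of the hiragana readings is used here only as a simple approximation of gojuuon order, not as an implementation of the Unicode Collation Algorithm. The `collation_key` parameter is where a real collator's sort-key function could be plugged in.

```python
# Hypothetical mini-dictionary mapping words to hiragana readings.
# A real system would obtain these from a morphological analyser.
READINGS = {
    "東京": "とうきょう",
    "大阪": "おおさか",
    "京都": "きょうと",
    "奈良": "なら",
}


def sort_by_reading(words, collation_key=lambda reading: reading):
    """Sort words by their reading instead of by their kanji.

    collation_key turns a reading into a sortable key. The default compares
    code points, which only approximates gojuuon order for plain hiragana.
    """
    def key(word):
        # Fall back to the word itself when no reading is known
        reading = READINGS.get(word, word)
        return collation_key(reading)

    return sorted(words, key=key)


if __name__ == "__main__":
    print(sort_by_reading(["東京", "大阪", "京都", "奈良"]))
    # → ['大阪', '京都', '東京', '奈良']  (o, kyo, to, na)
```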
