亼? 二? Ordered lists & CJK ideographs

Sinoglot is getting another facelift. More on that later.

One of the things that we’re going to great pains to ensure is cross-everything compatibility. Unless you use Opera. More on that later too.

Part of this cross-browser, cross-system, cross-whatever-else compatibility is making sure everything is HTML5- and CSS3-compliant. This in turn has had me poring over standards references for the goodies that would make it all work regardless of the device you, the reader, happen to be using.

W3C, in a reference dated November 2002 and redone in 2009, provides a few nice ways to number ordered lists. These include “cjk-ideographic” (一 二 三 四 五…), “japanese-formal”, “japanese-informal” and a few other names which end up being “壹 貳 參 肆 伍 陸 柒 捌 玖…”. There’s also cjk-earthly-branch (子 丑 寅 卯 辰 巳 午 未 申 酉 戌 亥) and cjk-heavenly-stem (甲 乙 丙 丁 戊 己 庚 辛 壬 癸), which are nice to have.
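The stem and branch styles are closed sets (ten stems, twelve branches), so a list longer than the set needs some rule for what comes after 癸 or 亥. Below is a minimal Python sketch, assuming the behavior of a CSS `fixed` counter system that falls back to plain decimal numbers once the symbols run out. Whether the 2002 or 2009 drafts define these styles that way is an assumption, and the function name is made up for illustration.

```python
# Symbol sets as listed above; each style uses every symbol exactly once.
HEAVENLY_STEM = "甲乙丙丁戊己庚辛壬癸"      # 10 symbols
EARTHLY_BRANCH = "子丑寅卯辰巳午未申酉戌亥"  # 12 symbols


def fixed_marker(n: int, symbols: str, first: int = 1) -> str:
    """Marker for list item n under an assumed 'fixed' counter system.

    Items first .. first + len(symbols) - 1 get one symbol each; any item
    outside that range falls back to an ordinary decimal number.
    """
    index = n - first
    if 0 <= index < len(symbols):
        return symbols[index]
    return str(n)


# Items 1-12 of a heavenly-stem list: ten symbols, then the decimal fallback.
print(" ".join(fixed_marker(i, HEAVENLY_STEM) for i in range(1, 13)))
# 甲 乙 丙 丁 戊 己 庚 辛 壬 癸 11 12
```

A `cyclic` system that wraps back to 甲 after 癸 would be the other obvious design; the fallback version keeps item numbers unambiguous at the cost of mixing scripts.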

The one I don’t quite get is called “trad-chinese-formal”. It follows “cjk-ideographic” more or less, but then 二 is replaced by 亼. What the hell is 亼? Here’s the code from W3C for the simplified variant of the set:

Digit 1 一 U+4E00
Digit 2 亼 U+4EBC
Digit 3 三 U+4E09
Digit 4 四 U+56DB
Digit 5 五 U+4E94
Digit 6 六 U+516D
Digit 7 七 U+4E03
Digit 8 八 U+516B
Digit 9 九 U+4E5D
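Each row of the table carries its own code point, so the table can be checked mechanically against the ordinary digits 一 to 九. A small Python sketch (the names are hypothetical) that decodes each printed code point and reports any row that doesn't decode to the expected digit:

```python
# Digits and code points exactly as printed in the W3C table above.
W3C_TABLE = [
    (1, "U+4E00"), (2, "U+4EBC"), (3, "U+4E09"),
    (4, "U+56DB"), (5, "U+4E94"), (6, "U+516D"),
    (7, "U+4E03"), (8, "U+516B"), (9, "U+4E5D"),
]
IDEOGRAPHIC_DIGITS = "一二三四五六七八九"  # the plain cjk-ideographic digits 1-9


def code_point(ch: str) -> str:
    """Format a character's code point in U+XXXX notation."""
    return f"U+{ord(ch):04X}"


def mismatches(table):
    """Yield (digit, decoded char, printed code point, expected char, its code point)."""
    for digit, printed in table:
        decoded = chr(int(printed[2:], 16))  # strip the "U+" prefix and decode
        expected = IDEOGRAPHIC_DIGITS[digit - 1]
        if decoded != expected:
            yield digit, decoded, printed, expected, code_point(expected)


for row in mismatches(W3C_TABLE):
    print(row)
# (2, '亼', 'U+4EBC', '二', 'U+4E8C')
```

The only row flagged is digit 2, and U+4EBC and U+4E8C differ in a single hex digit (B versus 8).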

The character 亼, pronounced jí and given by Unihan as “to assemble, to gather together,” has no apparent business being there in place of 二. The entry in the Kangxi Dictionary (康熙字典) is below:

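【子集中】【人字部】 亼; 康熙笔画:3; 页码:页91第03
(Collection 子, middle volume; radical 人. 亼; Kangxi stroke count: 3; page 91, entry 3.)
【集韻】秦入切,音集。【說文】亼,三合也。从人一,象三合之形。讀若集。【徐鉉曰】此疑只象形,非从人一也。【正譌】亼,古集字。凡會合等字
(Jiyun (集韻): fanqie 秦入切, read like 集. Shuowen (說文): 亼 means “three joined together”; it is formed from 人 and 一 and depicts three things coming together; read like 集. Xu Xuan (徐鉉) says: this is probably just a pictograph, not formed from 人 and 一. Zheng’e (正譌): 亼 is an old form of 集. All characters such as 會 and 合…)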

12 responses to “亼? 二? Ordered lists & CJK ideographs”

  1. Bruce Rusk says:

    Typo. 二 is U+4E8C; 亼 is U+4EBC.

  2. Kellen Parker says:

    Thanks, but then I wonder how it went unnoticed.

  3. Syz says:

    Impressive, Bruce. Offline I’d mentioned to Kellen that someone would figure it out pretty quickly, but I’ll admit I wasn’t thinking minutes. Nice.

  4. Jean says:

    What list did you use? Do you have a link? I found http://dev.w3.org/csswg/css3-lists/ but it is only a draft.

    On this page, it seems they messed up the meaning of simp/trad and formal/informal pretty badly, in addition to the above-mentioned typo. Do you have an updated version of this? Is it still a draft? (That would explain why no one cares.)

  5. Kellen Parker says:

    Yeah, odd. I just tested it, and the jí comes out as èr. It’s clearly just a typo in the reference. The one I linked to is from 2002, but I know there is one from 2009 which I believe copied the typo. Not finding it just yet, though; I bookmarked it on another system, so I’ll see if I can find it later.

  6. Kellen Parker says:

    I have to admit a part of me was really hoping for some archaic variant of 二 that I’d just never seen…

  7. “I have to admit a part of me was really hoping for some archaic variant of 二 that I’d just never seen…”

    That’s pretty much the only reason I read this post.

    oh, and Bruce Rusk is my hero.

  8. Kellen says:

    Hell, it’s the only reason I wrote this post.

    And ditto on the hero thing.

  9. this page: http://old.nabble.com/-css3-lists–cjk-numbering-p21897075.html seems to imply that they got several things really, really wrong, including using 小寫 (‘informal’) where they should have used 大寫 (‘formal’), and using 亼 where they should have used 二 (scroll down to first mention of ‘trad-chinese-informal’).

    this page: http://forum.moztw.org/viewtopic.php?p=11737&highlight= mentions some more problems but is somewhat difficult to understand.

    the posts also mention that the algorithms used to produce bigger chinese and japanese numerals are not fully correct.
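For comparison with the larger-numeral algorithms mentioned in the comment above, here is a minimal Python sketch of one common way to write informal Chinese numerals below 10^12: four-digit (myriad) groups with 萬 = 10^4 and 億 = 10^8 (assumed values), a single 零 for each gap, and 十 rather than 一十 at the front. It is not the CSS spec’s algorithm, and 零 placement follows the schoolbook rule that zeros at the end of a four-digit group are not read; other conventions exist. Function names are hypothetical.

```python
DIGITS = "零一二三四五六七八九"
SMALL_UNITS = ["", "十", "百", "千"]  # place values inside a four-digit group
GROUP_UNITS = ["", "萬", "億"]        # myriad groups: 10^0, 10^4, 10^8


def group_to_chinese(g: int) -> str:
    """Render 0 < g < 10000, collapsing each run of inner zeros to one 零."""
    out = []
    zero_pending = False
    for pos in range(3, -1, -1):
        d = (g // 10 ** pos) % 10
        if d == 0:
            if out:  # zeros before the first nonzero digit are not written
                zero_pending = True
            continue
        if zero_pending:
            out.append("零")
            zero_pending = False
        out.append(DIGITS[d] + SMALL_UNITS[pos])
    return "".join(out)  # trailing zeros are simply dropped


def to_chinese(n: int) -> str:
    """Informal Chinese numeral for 1 <= n < 10**12 (a sketch, not the CSS algorithm)."""
    if not 0 < n < 10 ** 12:
        raise ValueError("this sketch only handles 1 <= n < 10**12")
    groups = []  # four-digit groups, lowest first
    while n:
        groups.append(n % 10000)
        n //= 10000
    out = []
    for i in range(len(groups) - 1, -1, -1):
        g = groups[i]
        if g == 0:
            continue
        # A gap after a higher group (a wholly zero group in between, or this
        # group starting with zeros) is read as a single 零.
        if out and (g < 1000 or groups[i + 1] == 0):
            out.append("零")
        out.append(group_to_chinese(g) + GROUP_UNITS[i])
    text = "".join(out)
    if text.startswith("一十"):  # 10-19 (and 十萬, 十億) drop the leading 一
        text = text[1:]
    return text


for n in (10, 15, 105, 1005, 10001, 12345, 100000001):
    print(n, to_chinese(n))
```

Splitting into four-digit groups first keeps the zero rules local: each group is rendered on its own, and only the joins between groups need the extra 零 check.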

  10. one more: this page lists some big numbers for all you sinophiles; they sometimes appear in buddhist texts:


    List 3: Chinese digits beyond 10^12
    5th Group Marker (10^16) 京 U+4EAC
    6th Group Marker (10^20) 垓 U+5793
    7th Group Marker (10^24) 枾 U+67BE [1]
    8th Group Marker (10^28) 穰 U+7A70
    9th Group Marker (10^32) 溝 U+6E9D
    10th Group Marker (10^36) 澗 U+6F97
    11th Group Marker (10^40) 正 U+6B63
    12th Group Marker (10^44) 載 U+8F09
    13th Group Marker (10^48) 極 U+6975
    14th Group Marker (10^52) 恒河沙 U+6052 U+6CB3 U+6C99
    15th Group Marker (10^56) 阿僧祗 U+963F U+50E7 U+7957
    16th Group Marker (10^60) 那由他 U+90A3 U+7531 U+4ED6
    17th Group Marker (10^64) 不可思議 U+4E0D U+53EF U+601D U+8B70 [2]
    18th Group Marker (10^68) 無量大數 U+7121 U+91CF U+5927 U+6578

    not being aware of the likes of 不可思議 and 恒河沙 easily contributes to deficient translations; but then, of course, one might surmise that they appear in the translations less for their numerical than for their poetic value (so the intention is not so much to stress that ‘something can be found 10^52 times’, but rather that it is ‘as countless as the grains of sand in the river Ganges’).
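List 3 steps up by a factor of 10^4 per marker, so a marker can be found from the exponent alone. A minimal Python sketch (names hypothetical) that decodes the list’s own code points and looks a marker up by power of ten. Because the markers come from the code points as printed, the sketch inherits any slip in the list, which given the rest of this thread is worth keeping in mind.

```python
# Code points exactly as printed in List 3, from the 5th group marker (10^16)
# to the 18th (10^68); multi-character markers give one code point per character.
LIST3_CODE_POINTS = [
    "4EAC", "5793", "67BE", "7A70", "6E9D", "6F97", "6B63", "8F09", "6975",
    "6052 6CB3 6C99", "963F 50E7 7957", "90A3 7531 4ED6",
    "4E0D 53EF 601D 8B70", "7121 91CF 5927 6578",
]
FIRST_EXPONENT = 16  # the 5th group marker names 10^16
STEP = 4             # each marker is 10^4 times the previous one

# Decode each entry into the marker string it encodes.
MARKERS = ["".join(chr(int(cp, 16)) for cp in entry.split())
           for entry in LIST3_CODE_POINTS]


def marker_for_power(exponent: int) -> str:
    """Return the List 3 marker naming 10**exponent (16 <= exponent <= 68)."""
    offset, rem = divmod(exponent - FIRST_EXPONENT, STEP)
    if rem or not 0 <= offset < len(MARKERS):
        raise ValueError(f"10^{exponent} has no marker in List 3")
    return MARKERS[offset]


print(marker_for_power(52))  # 恒河沙
```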

  11. Tab Atkins says:

    Heya, Tab Atkins here, current editor of the CSS3 Lists spec.

    I know that the CJK list definitions were pretty much completely borked; the algorithm was some strange mix of Chinese and Japanese rules, and so far every listing of characters that I’ve had someone look at has had at least one mistake.

    So, I’ve completely rewritten the entire CJK section, splitting out Chinese and Japanese and rewriting the character lists based on conversations with native speakers. You can find the current draft spec at http://dev.w3.org/csswg/css3-lists.

    The native Chinese speaker I could get my hands on immediately wasn’t familiar with Traditional Chinese characters, though, and so couldn’t help me with vetting those character lists. Can you confirm that the existing trad-chinese lists are correct (other than the typo noted in this post) and if not, could you provide the actual characters + codepoints that I should be using?

    Note that what the spec previously called “formal” is actually a “financial” style, optimized for the characters to be difficult to alter into each other.
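The point about the “financial” style is easy to see in the digits themselves: a stroke or two turns 一 into 二 or 三, and 二 into 三, while 壹, 貳 and 參 cannot be turned into one another that way. A minimal Python sketch (the function name is hypothetical) that swaps ordinary numerals for the financial forms listed in the post. The forms 拾, 佰 and 仟 for 十, 百 and 千 are not in the post’s list and are added here as an assumption about the usual financial set.

```python
# Informal digits 1-9 and the financial forms quoted in the post.
INFORMAL = "一二三四五六七八九"
FINANCIAL = "壹貳參肆伍陸柒捌玖"
# Added here, not in the post's list: financial forms of 十, 百, 千.
INFORMAL_UNITS = "十百千"
FINANCIAL_UNITS = "拾佰仟"

TO_FINANCIAL = str.maketrans(INFORMAL + INFORMAL_UNITS, FINANCIAL + FINANCIAL_UNITS)


def to_financial(numeral: str) -> str:
    """Swap informal digits and 十/百/千 for financial forms; 零, 萬 etc. pass through."""
    return numeral.translate(TO_FINANCIAL)


print(to_financial("一萬二千三百四十五"))  # 壹萬貳仟參佰肆拾伍
```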

  12. David Lloyd-Jones says:

    I’m reminded of the computer-science maxim “It’s good to have so many standards; that way you can choose the one you feel like using.”

    -dlj.
