Modern Character Creation

This is part two of a series. Part one is available here.

In my last post I went over a number of contractions for which characters have been encoded into Unicode as well as discussing some of the inherent limitations in encoding methods. In this post, I’ll be explaining why this matters.

The technical limitations imposed by having a finite number of glyphs aren’t something that affects only the writing of topolects. It’s also an issue for the development of modern Mandarin, as the creation of new terminology for new concepts is likely to be severely limited in the future by the lack of flexibility in the written language. Of course, this is a limitation that we may never be aware of in future years, much like one cannot truly be aware of other paths one’s life may have taken. And granted, there are thousands upon thousands of characters that we’ll never see outside of character tables or the 康熙字典, but it’s still a major limitation given that Mandarin is a living language.

It wasn’t really until computers became more widely used that the scope of possible characters became as limited as it is now. See the much publicised case of Mǎ Chéng 馬

  1. Gary Feng says:

    If one were to do a headcount of “official” characters created in recent decades, I’d bet that a lot of them are names of the basic elements and particularly organic chemistry terms. For much of the first half of the 20th century there seems to have been a tug of war between phonetic translations (阿摩尼亚) versus character creation (氨 or 胺, I am not a chemist). It seemed that the new-character approach won, at one point in time. And somehow we stopped making new characters and switched back to phonetic translations.

    That was a period of time when China had profound language/culture contacts. Do you know if this was an isolated battle in chemistry or part of a broader back-and-forth as the culture and the language were trying to figure out how to deal with the changes? If you know of any work on language change during this period, I would greatly appreciate it!

  2. I don’t have any specific background in either chemistry or this particular period of Mandarin changes, but I think you bring up a really good point. I once spent a week going through the Unicode list of characters available in Pleco, many of which lacked a definition. But a good number were chemical names or something similar. The rest were likely long-unused classical terms referring to some variety of flute that no one can remember.

    It makes some sense that there would be a push to adopt the terminology to that particular science in a way that would be compatible with the way the rest of the world was speaking about it. And at that point there wasn’t such a limitation as Unicode and other methods now impose.

    I can’t off the top of my head think of another area where there was a similar mass character creation, but that’s not to say it doesn’t exist.

  3. Zev Handel says:

    Kellen, thanks for this interesting post.

    In it you say, “Instead, I predict the written language having a much more significant impact on the spoken language than it did just 50 years ago.” I don’t see how this follows from the situation you describe. Can you elaborate a bit?

  4. Of course.

    Looking at Arabic as an example, we have the word for “computer”, for which in most situations you’ll hear كومبيوتر kūmbīūtur in everyday speech. However, in some places this has been seen as too much of an invasion by English, and so instead you’ll hear حاسوب ḥāsūb, which actually has more to do with what a computer does and not just what the name sounds like in English.

    The difference between tomorrow and 50 years ago is that 50 years ago it would have been no problem to come up with the equivalent of “ḥāsūb”, whereas now, with the characters limited as they are, we’re much more likely to see things like 沙发 and 三明治, phonetic transcriptions of words rather than translations of concepts. It could be argued that 电脑 diànnǎo is a conceptual translation and so why not more of that, but being able to call something “telescope” (which is ultimately such a combination as diànnǎo) is only half the battle. In some cases (such as brand names or new technology), “diànnǎo” isn’t adequate. Or at least, if adequate, wouldn’t have been one of only two options. The third option of truly coining a new phrase/word/character like 圕 tuān is now lost.

    The majority of people would certainly say this loss isn’t one of any import. I disagree. I should probably get to posting part three of this series, which may lay some other questions to rest as well.

  5. 慈逢流 says:

    “I wouldn’t be bringing this all up if I didn’t have an idea of how to fix it” sounds promising. can you hint at what you have in mind here?

    “The third option of truly coining a new phrase/word/character like 圕 tuān is now lost”. or so they say. it does definitely look like it at present, but it definitely looked different during the 19th c and the republican time between the wars. things may change again. as you say, “on-again, off-again campaigns”, which i immediately understood as an equivalent to the german derisive “rinn in die Kartoffeln, raus aus die Kartoffeln” (“into the potatoes, out of the potatoes”).

  6. Kellen says:

    I do have an idea, and while it’s well formed at least in my head, I never fully described it in another post as I’d intended simply because I think to really explain it clearly would necessarily involve some redundancy and would not make for a very good read.

    That said, I think tomorrow I may make the effort to explain it all the same. Look for it in a new post. At the very least I’ll outline the idea.

  7. michaelyus says:

    Just in response to Gary Feng: for ammonia, it’s 氨 ān; 胺 àn has the 肉字旁 so it’s nitrogen-containing organic chemistry – an amine. There’s also 铵 ǎn with the 金字旁 for cationic or metallic solids in inorganic chemistry – ammonium salts.

  8. Alan says:

    To fix this, wouldn’t there have to be something like a Unicode extension for composing characters from radicals and other characters, much like the way some accents can be added to any character, without every possible accented character needing to be defined?

    That way, so long as any new character re-used existing radicals and characters, it could always be defined. There would have to be strict rules for how the position within the character is defined, so that there is only one unique way to define any given character.

    Of course this is probably a practical impossibility…

  9. That’d be one way to do it. I had a different idea of how it would be fixed, but I never ended up writing that post, mostly due to a perceived lack of interest. Maybe I’ll try to throw that together one of these days.

    Ultimately, though, you’re right; going against what’s out there, no matter how good the new idea is, is probably a practical impossibility.
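
For anyone curious what Alan's composition idea in comment 8 might look like in practice, here is a minimal Python sketch. It first shows the accent precedent he mentions (a base letter plus a combining accent), then builds character descriptions from Unicode's existing Ideographic Description Characters (U+2FF0 to U+2FFB). The `describe` helper and the `IDC_ARITY` table are hypothetical names made up for this illustration, and only four of the twelve operators are listed. As far as I understand it, these sequences only describe a character's layout; software generally displays them as a string of separate symbols rather than drawing the composed glyph, and nothing guarantees a single unique description per character. So this illustrates the idea and is not the fix Alan is asking for.

```python
import unicodedata

# The accent precedent: "e" followed by U+0301 (combining acute accent)
# normalizes to the single precomposed character "é" under NFC.
composed = unicodedata.normalize("NFC", "e\u0301")

# A few Ideographic Description Characters and the number of components
# each one takes (all four listed here are binary operators).
IDC_ARITY = {
    "\u2ff0": 2,  # ⿰ left to right
    "\u2ff1": 2,  # ⿱ above to below
    "\u2ff4": 2,  # ⿴ full surround
    "\u2ff9": 2,  # ⿹ surround from upper right
}

def describe(operator: str, *parts: str) -> str:
    """Build a description sequence (hypothetical helper, not a real API)."""
    if IDC_ARITY.get(operator) != len(parts):
        raise ValueError("wrong number of components for this operator")
    return operator + "".join(parts)

amine = describe("\u2ff0", "月", "安")    # describes 胺
ammonia = describe("\u2ff9", "气", "安")  # describes 氨
library = describe("\u2ff4", "囗", "書")  # describes 圕

print(composed, amine, ammonia, library)
```

Each description is three code points long (one operator plus two components), which is the trade-off in Alan's proposal: any character built from existing parts can be written down, but only if everyone agrees on strict rules for which operator and which components to use.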

