The character that beat the shit out of me

Consider the use of profanity quotative here. To take it directly from the definition in the ABC Dictionary:

cèi v. <coll.> (1) smash to pieces (2) attack; beat the shit out of

Emphasis mine. Maybe it’s only fair that the ABC doesn’t mince words — the description seems rather appropriate for this morning’s character venture.

All I wanted to do was to write the everyday word cèi, meaning to smash/shatter. The process of finding the appropriate hanzi is usually simple enough:

  1. Look up the pinyin in the open-sourcish CE Dict (which, while not always reliable, has lots of stuff and is conveniently available through MDBG’s interface)
  2. If that fails, get up, find my cell phone and type it into the incomparable Pleco, where I’ve got at least four dictionaries at my disposal.
  3. I almost never have to consider a step 3 at this stage in my hanzi (il)literacy, since I find almost everything I want. But see below for details.

Not today. Step 1 gave me this:

mdbg_cei

Neither daunting nor even surprising. The CE Dict often fails to find words you’d think it would have. So on to step 2, where at first everything appeared to follow protocol. As you can see in the cèi definition at the top, ABC did indeed provide a character for the right definition of cèi. Great, home free. The final step is just to type it into my computer, as I’d originally intended, and be off with it.

But hang on, not so fast — consider snafu 1: no such character under “cei” in Google’s IME…

cei_in_goog_ime

In fact, you can see that it doesn’t even like the Pinyin initial+final combination of “c” + “ei”. It’s trying to split off “ce”, even though there’s no legal Pinyin that starts with the following “i”. To pile insult on, even my beloved Pinyin IME (that is, the IME I use for writing Pīnyīn with tone marks), Pinyinput, does not allow “cei”; instead it forces you to write cè and add “i” on the end after you’ve finalized the input.
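
One plausible explanation for the “ce” + “i” split is that the IME matches typed letters against a fixed syllable table, longest match first. The sketch below only illustrates that idea; it is an assumption, not a description of how Google’s IME or Pinyinput actually works, and the tiny `SYLLABLES` table and the `segment` function are hypothetical names made up for the example.

```python
# Hypothetical, deliberately tiny syllable table. "cei" is left out on purpose,
# as it apparently is in the IMEs described above.
SYLLABLES = {"ce", "ci", "cai", "can", "bei", "fei", "gei", "zei"}


def segment(letters, syllables=SYLLABLES):
    """Greedy longest-match split; letters that match nothing are kept as single leftovers."""
    pieces = []
    i = 0
    while i < len(letters):
        # Try the longest candidate first (Pinyin syllables run to at most 6 letters).
        for size in range(min(6, len(letters) - i), 0, -1):
            chunk = letters[i:i + size]
            if chunk in syllables:
                pieces.append(chunk)
                i += size
                break
        else:
            # No known syllable starts here: emit the stray letter and move on.
            pieces.append(letters[i])
            i += 1
    return pieces


print(segment("cei"))                       # table without "cei"
print(segment("cei", SYLLABLES | {"cei"}))  # table extended with "cei"
```

With “cei” missing from the table, the longest match at the start is “ce” and the “i” is left stranded, which is the behavior in the screenshot. Adding the syllable to the table makes the split go away.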

But no worries. There’s always the ungainly-and-slow-with-a-mouse but effective character-drawing input system at NCIKU, right? It’s the secret step 3 in the sequence above — the last-resort way to get a mystery character onto the computer.

Et tu, NCIKU?

nciku_cei

You can see on the left the character that ABC had rendered for cèi. But it’s not in NCIKU as far as I can find, not even on screen two or three. Feel free to argue (not really much of an argument, actually) that the penmanship / mousemanship is lacking, but I tried it twice, to no avail. Maybe there’s a fántǐzì (繁体字 = traditional character) twist on this?

My last-last resort was to try copying the character out of the ABC on my cell phone, pasting it into Windows Mobile’s absurdly bad email client and emailing it to myself. I thought it sounded smart, but it’s apparently beyond my technology/unicode knowledge. I end up with

… which does nothing for me. What am I missing here?
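
One way to diagnose a pasted character that refuses to display is to look at its code point instead of its glyph. The following is a minimal sketch using Python’s standard `unicodedata` module; the `describe` helper is a hypothetical name, and U+EECA is used only because it is the value the first commenter below reports seeing.

```python
import unicodedata


def describe(ch):
    """Return the code point, Unicode plane, and general category of one character."""
    cp = ord(ch)
    category = unicodedata.category(ch)
    return {
        "code_point": f"U+{cp:04X}",
        "plane": cp >> 16,                # 0 is the Basic Multilingual Plane
        "category": category,             # "Lo" = letter, "Co" = private use
        "private_use": category == "Co",
    }


print(describe("\uEECA"))  # the code point reported in the comments
print(describe("破"))       # an ordinary hanzi, for comparison
```

A character that comes back as private use has no standard meaning, so whether it shows up at all depends on the font that happened to define it.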

———————————–

UPDATE: Randy reminded me I was utterly remiss in not checking with the 老北京 Beijing dialect book he gave me a year or so ago. Sure enough, it supports using the same version of cèi that ABC used:

beijinghuaentry


22 responses to “The character that beat the shit out of me”

  1. A quick pinyin search on nciku got “梡 [cèi]” with no English definition. However the corresponding Mandarin was “打碎”. (Alternatively zdic.net gives 梡 as kuǎn.)

    The  you posted is, at least on my end of things, at Unicode U+EECA which is reserved for private use, i.e. people who want funky stuff in their font like a company logo. In other words, it’s out of the CJK ranges.

    A little digging and I did find [

  2. Duncan says:

    Interesting that you should post this, as just yesterday someone at HarperCollins was asking me how to display/input the rarer characters (there’s a BIG dictionary project in the works).

    Essentially, and I’m running with the profanity here, I find Nciku to be a piece of crap when it comes to this kind of stuff.

    Looking at the shuowen/kangxi entries for 梡 above, it seems the word has evolved from meaning ‘firewood’, to ‘tray for sacrificial meat’, to a more general purpose ‘chopping board’. Could it now mean ‘chop to pieces’ in a colloquial sense? Possibly, but I wouldn’t trust Nciku on this.

    For the cei character, 汉典 gives an alternate pronunciation as sui4. With a shuowen gloss of ‘破也’.

    Anyway, if you use 逍遥笔 input (www.xiaoyaobi.com) you can draw the character, but to actually display I reckon you’re going to need to download font extensions or whatnot like Kellen mentions above, available at 汉典 – http://zdic.net/appendix/f18.htm

  3. Syz says:

    @Kellen: you will make yourself a name as the unicode hanzi hound yet. Nice work, and all in less time than it took me to mouse the character into NCIKU.

    @Duncan, given the extra data from 汉典, I’m inclined to think that ABC isn’t entirely making it up. Someone, at some point in history, must have used this character to represent cei4 meaning exactly what it means in Beijing dialect — to break.

    That said, it also seems pretty fair to say there’s no accepted standard. I tried that character out on a few people yesterday. No one had ever seen it. But everyone knew the word cei4 and was quite confident in proclaiming it to be exclusively Beijing dialect and therefore (the tone of voice indicated) unworthy of being written. I would like to claim that this kind of relationship between spoken and written language is substantially different in English. I’m pretty sure it is. But maybe I’m just not thinking of the right examples. Is there any English word out there that everyone’s familiar with but no one would consider writing down?

  4. While I admit 梡 is a stretch, it all comes back to the lack of written dialectal representation. I have friends that swear by 操 as meaning 肏 and 肏 alone. I trust nciku on this only as far as random speakers need a character to write this stuff out in their BBS discussions.

    The fonts I use for rarer stuff are SimSun and SimSun Founder Extended. Unfortunately they’re Song fonts and so not great for onscreen reading.

  5. Syz says:

    Check out the update above from the 老北京 dictionary! I’ll take this as good authority that ABC’s character was right, no matter how obscure it is. On the other hand, one could still argue that there is “no way to write cei” if no one actually recognizes this character!

  6. John says:

    Is there any English word out there that everyone’s familiar with but no one would consider writing down?

    The two that come to mind are the super-informal ways of saying “yes” and “no” which are sometimes written as “uh-huh” and “unh-unh” (the second one is the one that’s a lot harder to write out, and also has a very obvious tonal quality to it).

    Also, the “nother” as in, “but that’s a whole ‘nother story” feels pretty weird when you write it out, even though it’s pretty common in American speech.

  7. Syz says:

    Good examples, John, esp. I think “unh-unh” is pretty tricky. Funny you bring up ‘nother because that’s exactly the one I was thinking of myself. Lots of people would have a problem with classifying that as a word, let alone writing it out.

    If I was going to make a claim, though, it would be something like this: there are WAY more common words in Mandarin that have no standard written form than there are such in English.

  8. John says:

    Yeah, I’m not going to try to argue with that!

  9. Zev Handel says:

    On a Mac, one can usually find any character that is part of the Unicode standard without too much difficulty. You need to have the “Character Viewer” enabled (through the International system pref panel). Then you choose “Show Character Viewer” from the flag menu. Select “View: All Characters” and “By Radical”. All the CJK Unicode characters are findable this way by traditional radical+stroke lookup. Under the “tile” radical, 8 strokes, the character in question can be found: #24B62. The problem, as noted by an earlier poster, is that characters like this one, in the upper plane of Unicode, are not included in many fonts, and quite a few applications (like MS Word on a Mac) can’t display them even if you have a font.

    All these characters do exist in the freely available HAN NOM fonts. (http://vietunicode.sourceforge.net/fonts/fonts_hannom.html)
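
To make the “upper plane” point concrete, here is a small sketch that takes the code point quoted above (24B62) and shows which plane it falls in and how it is encoded. It uses only built-in Python calls; the variable names are just for illustration.

```python
ch = chr(0x24B62)  # code point quoted in the comment above

plane = ord(ch) >> 16                  # planes are 0x10000 code points wide
utf8 = ch.encode("utf-8").hex()        # four bytes in UTF-8
utf16 = ch.encode("utf-16-be").hex()   # a surrogate pair (two 16-bit units) in UTF-16

print(f"U+{ord(ch):X} plane={plane} utf8={utf8} utf16={utf16}")
```

Plane 2 is the Supplementary Ideographic Plane. Software or fonts that only cover the 16-bit range cannot represent such a character in a single unit, which may be part of why some applications show a blank square.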

  10. Syz says:

    Zev,
    So you got me excited about the new fonts. I downloaded but still haven’t been able to get cei4 to display. Ideally I’d like to add a font tag to this post so that if someone has the hannom fonts installed, cei4 displays properly. But clearly I’m beyond my very minor league in terms of unicode & displaying issues. Any ideas? (PS: I’m on the evil windows — Vista no less! — so no cool mac tricks for me).

  11. Brendan says:

    For stuff like this, I use Wenlin — search for characters by components 卒 and 瓦, and bang, there’s

  12. Syz says:

    Brendan, I was about to blow the money and buy wenlin — anything that can help me find Cei is worth the dough, right? But then I get to their site and find that it’s a freakin’ CD-ROM. Did these guys never recover from Y2K issues or something? sheesh

  13. Zev Handel says:

    Syz,

    Not sure what to do on Windows. I guess to make sure your fonts are installed properly, you could try copying the cei4 character (or the blank square where it should be) out of Kellen’s first comment to this post, and try pasting it into Notepad or something similar on your computer (but not Microsoft Word), then see if you can get it to display properly.

    Someone out there with Windows must know a good way to get at these upper plane Unicode characters.

  14. Syz says:

    Thanks Zev. I got out of MS Word and did things in Notepad and got it to display. Weird thing is, now it displays just fine in Word too.

  15. Duncan says:

    Syz, or indeed anyone else…

    You could also use the following unicode super-cjk font extensions available at http://okuc.net/Software/Unifonts.exe

  16. BMG says:

    Give this website a try. It will help you find every Unicode character and many that aren’t in Unicode (I have encountered extremely rare characters that aren’t in it, but no one needs them anyway):

    http://mousai.kanji.zinbun.kyoto-u.ac.jp/ids-find?components=%E7%93%A6%E5%8D%92
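
The `components=` part of that link is just the two component characters, percent-encoded as UTF-8. A short sketch with Python’s standard `urllib.parse` shows how to read it back; building links for other components the same way is an assumption about the site, not something confirmed here.

```python
from urllib.parse import parse_qs, quote, urlsplit

url = "http://mousai.kanji.zinbun.kyoto-u.ac.jp/ids-find?components=%E7%93%A6%E5%8D%92"

# parse_qs percent-decodes query values as UTF-8 by default.
components = parse_qs(urlsplit(url).query)["components"][0]

for ch in components:
    print(ch, f"U+{ord(ch):04X}")

# Going the other way: percent-encode the same characters again.
print(quote(components))
```

The decoded value is the pair 瓦 and 卒, the same two components mentioned elsewhere in this thread for finding the character.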

  17. BMG: Interesting.

    Duncan: Got a non-exe version for us Mac users?

  18. Duncan says:

    Kellen: Sorry, no idea.

  19. justin says:

    I couldn’t get Wenlin to accept cei as pinyin (slash character converts, but with cei a slash just gets typed out) but when I copy&pasted in “” from above I got a link to the real place they store the character

    xxxx  (private-use clone of

  20. Chris says:

    Is there any English word out there that everyone’s familiar with but no one would consider writing down?

    The sound of someone blowing a raspberry is easily pronounceable, but the closest approximation I’ve seen to it being written is “thpbpb”.

    I suppose one might argue about whether or not it’s technically English, but then perhaps any “word” fitting your description could easily be argued about in that regard.

  21. Kaiwen says:

    One English word I feel uncomfortable writing down is the -y short form variant of breakfast. It might be rendered as ‘brekky,’ but I can’t come up with a satisfactory spelling.

  22. Syz says:

    @Chris: my gut reaction is that the raspberry doesn’t even rise to the level of interjection — it’s just a sound, quite unlike the very run-of-the-mill (in Beijing) verb cei4. Feel free to tell me if I’m out of touch with contemporary use, though, it would hardly be the first time.

    @Kaiwen: interesting example, and I have to admit my cloistered English had not even heard of that form (I’m from the Western US). It’s easy to imagine the struggle with spelling, though, and it makes me think I’ve come across a similar example but it’s not coming to mind right now.
