Xiao'erjin is not quite Pinyin

Xiao’erjin (alternatively xiao’erjing¹ 小儿经) is the name of a form of transcription for Mandarin and related languages that uses the Arabic script rather than Cyrillic or Roman letters. China has had a large Muslim population for about as long as there have been Muslims, and the system was used mainly by those among them who were less likely to have had a traditional classical education.

The structure is fairly simple. Syllable-initial consonants are written with a single Arabic letter. The final is then written primarily with harakat, or vowel diacritics. Before Annals of Wu, I was blogging on xiao’erjin and Chinese Islam in general on another site, appropriately enough called xiao er jing.

I recently received an email on the topic, which happens from time to time. This time, though, I decided a more public response might have some value.

Here’s the email:

I live in Beijing and want to give a non-Chinese speaking arabic friend a surprise by translating a few words for him in your widget.

Am I right in thinking it is entirely phonetic, so that similar sounding words with the same pinyin will have the same arabic script? Also, which way around should it be read as it comes out of your widget? I ask because when I cut and paste it into Illustrator, it seems to be the mirror image.

The widget in question can be found on the sister site. It’s a quick and dirty conversion tool for converting pinyin to xiao’erjin based on the most widely known forms with some changes of my own to improve legibility² and fidelity to the pinyin. Xiao’erjin also preserves 入声 entering tone syllables, now long gone from Mandarin in any meaningful way. I digress.

Xiao’erjin is in fact phonetic, but it’s not quite an Arabic pinyin. For example, the Arabic script lacks a letter for /ŋ/, a sound that is ubiquitous in Mandarin. Xiao’erjin uses the same letter for /ŋ/ as it does for /n/. Arabic, on the other hand, often writes /ŋ/ with two letters: ـغ is usually /ɣ/ by itself but, when combined with ن /n/ (i.e. ـنغ), it is typically pronounced /ŋ/. Mind you, this is only in loan words.
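To put the email's first question in concrete terms: the same pinyin will give the same script, but the reverse doesn't hold. Here is a minimal Python sketch of that one-way street. The table is a toy that encodes only the /n/ and /ŋ/ merger described above, with ن assumed as the shared letter; it is not the widget's actual mapping, and the names `CODA_LETTER`, `split_coda` and `collisions` are made up for illustration.

```python
from collections import defaultdict

# Toy table, NOT the widget's real mapping: it encodes only the merger
# described above, with ن assumed as the letter shared by /n/ and /ŋ/.
CODA_LETTER = {"n": "ن", "ng": "ن"}


def split_coda(syllable: str) -> tuple[str, str]:
    """Split a toneless pinyin syllable into (body, nasal coda)."""
    for coda in ("ng", "n"):  # check the longer coda first
        if syllable.endswith(coda):
            return syllable[: -len(coda)], coda
    return syllable, ""


def collisions(syllables: list[str]) -> dict[tuple[str, str], list[str]]:
    """Group syllables that differ in pinyin but share a body and coda letter."""
    groups = defaultdict(list)
    for s in syllables:
        body, coda = split_coda(s)
        groups[(body, CODA_LETTER.get(coda, ""))].append(s)
    # Keep only the groups where two or more pinyin spellings collapse together.
    return {key: members for key, members in groups.items() if len(members) > 1}


print(collisions(["lin", "ling", "ma"]))
```

Under this toy table, lin and ling land in the same group while ma stands alone, so a reader going from the script back to pinyin would have to guess between the two.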

The equivalents to pinyin x and q are also missing from the Arabic script and language. Xiao’erjin is a historical script intended for a limited audience. It unfortunately doesn’t have much correlation to pinyin, and would probably elicit little reaction from an Arabic speaker beyond confusion.
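As for which way around to read it: the Arabic script runs right to left, so the output does too. The characters are stored in the order they are read, and it is up to the displaying program to lay them out from the right. A likely explanation for the mirror image, though I haven't checked it against Illustrator itself, is that the text was laid out left to right after pasting. The Python sketch below uses the standard `unicodedata` module to show how the characters themselves are flagged as right-to-left; the sample string and the helper `is_rtl` are made up for illustration, and the sample is not a real xiao’erjin word.

```python
import unicodedata

# Sample built from letters mentioned in this post, stored in reading order:
# SEEN, then FATHA (one of the harakat), then NOON. Not a real word.
sample = "\u0633\u064E\u0646"

# Print each code point with its Unicode name and bidirectional class.
for ch in sample:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.bidirectional(ch))


def is_rtl(text: str) -> bool:
    """Return True if any character has a right-to-left bidi class."""
    return any(unicodedata.bidirectional(c) in ("R", "AL") for c in text)


print(is_rtl(sample))    # Arabic letters carry class AL
print(is_rtl("pinyin"))  # Latin letters carry class L
```

The letters come back with class `AL` (Arabic letter) and the haraka with `NSM` (a non-spacing mark that attaches to the letter before it). A program that ignores those classes will draw the first letter on the left instead of the right.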

In the past, xiao’erjin was used by members of the Islamic community who were unable or unwilling to read Sinitic characters. With the spread of computers and the official status of pinyin, I think it’s safe to say that xiao’erjin has been pretty completely wiped out. Long-time readers may recall me mentioning a desire to re-work the system using letters from the Uyghur adaptation of the Arabic script, but this would only be for the sake of intellectual curiosity, and not for any practical purposes.

I would be curious to know how a Mandarin textbook from an Arabic-speaking country would look. I imagine any good one would simply use pinyin, but that doesn’t mean they all would. I had a room-mate from Turkey who brought with him a Turkish-Mandarin phrase book in which every Mandarin syllable was transcribed phonetically with the Turkish alphabet. It ran into the same problem of missing phonetic representations that comes up whenever one language’s writing system is adapted to another.

– – –
1. In this case I’m using “xiao’erjin” despite never writing that in Chinese for no other reason than to differentiate from the website of the same name.
2. e.g. X and S may both be written س so to clear things up a bit I moved one over to ص which was mostly unused. What’s more, a single-dotted س exists in some sources, but isn’t really Unicode-compliant. Meanwhile ښ (that’s س with one dot above and one below) and ݭ (two vertical dots above) seemed like overkill and, more importantly, unlikely to display on most computers.

22 responses to “Xiao’erjin is not quite Pinyin”

  1. It’s interesting to me that you went ahead and added little improvements.

  2. Kellen Parker says:

    Because it shows how little social life I actually have?

  3. Jack says:

    Makes you appreciate just how ingenious the pinyin system is. Two totally different languages (more I guess, since it applies at least to all Latin script languages) brought to a common meeting point that’s fairly easy to grasp for both. It’s easy to take it for granted now it’s all drawn up and established.

  4. Kellen says:

    I actually detest pinyin. To me it’s a necessary evil. It’s still better than a lot of the alternatives, but it still kinda sucks.

    Ah well.

  5. No, because you didn’t think of it as “officially fixed”.

  6. Jack says:

    I certainly wouldn’t consider myself as having anything near an advanced knowledge of Chinese, but I haven’t had any serious problems using pinyin. What do you think are the major issues with it?

  7. Kellen Parker says:

    Randy: Ah. Well, it’s not standard, and it has problems, so I fix away.

    Jack: It’s internally contradicting. -ui is -wei. It’s one more layer of shit for the learner to learn. IPA /y/ is sometimes ü and sometimes u. Yan and yang have totally different vowel sounds while lin and ling don’t*. The only thing particularly r-like about pinyin r also exists in zh/ch and sh. I got more. I’m also a fan of Italian style vowels where o sounds like /o/ and u sounds like /u/. Meanwhile o is /ɔ/ and ou is /o/. And what’s up with only one letter having a diacritic? Was y** hungover from too much baijiu on the day they assigned roles?

    There’s more, but I think this is enough.

    Granted there’s no way to make it perfect for everyone. I just think any transliteration ought to be intuitive enough that a new learner of the language can have a fighting chance at guessing the proper pronunciation instead of spending the first year of learning the language saying things wrong because 1) it’s not obvious from the writing and 2) it’s taught hella poorly anyway.

    * or do but it’s so small as to be irrelevant when compared to that of yan/yang
    ** I’m from the Albanian school of diacriticisation. I.e., at least two letters should have one or you’re wasting everyone’s time.

  8. I think pinyin surpasses hangul as a spelling system. All of the things you mentioned as (perhaps) flaws in pinyin are rule-based, and while they do add a little more complexity, it’s not that difficult for a student to learn. It took me a few hours with a tape player on a plane to get a serviceable command of it. It doesn’t have any irregularities.

    However, I still would be interested in the “there is more”. It would be good to have a complete list somewhere.

    And the diacritic issue has a popular solution that is so widely accepted that now I see v instead of ü very commonly even on big green highway signs, not to mention products and shop signs.

    A small point: I don’t think ou can be /o/, but rather it is /ou/. It has a closing glide.

    Another: Isn’t it Euro-centric to expect Italianate vowels? It happens to (kind of) work with Japanese, but much less so with Korean where you have a sixth major vowel (으), and the “vowel” inventory of Chinese is so weird that it might be better to avoid suggesting Italianate vowels at all.

  9. pott says:

    One reason why xiaojing doesn’t correspond well to pinyin is that they are used to write different varieties of Mandarin. In Lan-Yin Mandarin and several western varieties of Zhongyuan Mandarin, /-n/ has merged with /-ŋ/ except after /a/. This also explains the variation jin~jing in the name of the writing system. The distinction between /an/ ـًا and /aŋ/ ـَانْ on the other hand appears to be well kept in xiaojing. This being said, it seems unlikely that any of these dialects has preserved the entering tone. Could you give an example of xiaojing representing the entering tone?

    I’m also intrigued by the xiaojing system of your widget, which differs a lot from the analyses I’ve read. For example, according to these analyses, pinyin q is usually represented by ک. For varieties of Mandarin in which ti- has merged with qi-, an alternate form of q is t ت‎. However, in your widget, q gives ch چ, sh ش, or ث, depending on the final. What sources is your system based on? Is it a combination of different schemes?

  10. Kellen Parker says:

    Well, v for ü is common, and fine, and it’s how I type Chinese. But it’s still the same problem. It’s a letter that is being used to represent something it rarely represents otherwise. V isn’t a vowel, so using it to represent a vowel adds to the complexity. Yes, the problems are rule based, and yes, a student can learn pinyin quickly. None of that changes my complaints. Just because it can be learned doesn’t mean it’s a good system.

    Re hangul, do you mean pinyin is better than 한글 or better than the Romanisations of hangeul? I’ve got no love for hangeul, but I’m curious to know, since you brought it up, what your issues with it are.

    I’ll have to get back to you on an example of entering tone. Wikipedia references it but I’d have to track down the Japanese source for a specific example. Otherwise, yes, it’s a combination of different schemes. As you mentioned, xiao’erjin represents dialects, so I’ve made an attempt to bring it in line with Modern Standard Mandarin.

  11. Kellen Parker says:

    Forgot one thing. If it’s Eurocentric to expect Italianate vowels, then isn’t it just as Eurocentric to use Roman letters in the first place?

  12. V isn’t a vowel

    It didn’t clearly start out as specifically a vowel or a consonant (it was used as either, or both).


    For the Italianate vowels, I meant the sounds, not the letters. Since many Asian languages have vowel systems that are too different from Italian, why try to map the five Italian vowel symbols onto them?

    Also, we use “vowel letters” to represent consonant sounds and “consonant letters” to represent vowel sounds in everyday English words, so why be prohibited to do so in romanization of other languages?

    You know more about Arabic romanization than I do, but I remember seeing “7” used, and that’s neither consonant nor vowel!

    I believe romanization should be all about what’s easiest to use; and nowadays, that usually means easiest to type.

  13. Kellen says:

    Touché. However 7 (for ح, or 5 for ص) isn’t a standard being taught to new students. It’s an Internet convention akin to 3 for e in writing 3nglish w0rds, though slightly more useful. Still, not anything you’ll find in a beginning Arabic textbook.

    Fine on v. No problem there. Except pinyin is modern and y seems a better fit.

    Look, I admit pinyin is fine to learn. I admit my grievances are largely aesthetic. I’m not saying we should do away with it. Only that I hate it and wish it had been done differently.

    GR, anyone?

  14. Tim says:

    I would much rather see tonemark pinyin than GR. One important thing in our globalized world is communicating with non-fluent speakers of the various languages, and GR is, in my opinion, simply opaque. Yes, the Roman alphabet is eurocentric, but only insofar as it was developed there. I feel one can hardly make the argument the letters themselves have anything to do with Europe besides origin. In any case, a non-fluent Mandarin speaker is likely to need to know placenames and personal names in Chinese at the very least while doing it, and I feel the diacritic tone system and relative stability of orthography is more easily understood by the average foreigner (I daresay of any national origin) than the direct-in-spelling system of GR.

  15. Tim,
    I’ve got mad love for YR Chao and GR. But I agree with you that GR is actually of little practical value at this point in time. It is opaque, and not something I’d wish on every Mandarin learner let alone Mandarin speaker out there.

  16. Claw says:

    @Kellen: While we’re on the topic of Xiao’erjing, would you mind if I asked you a tangential question? I brought this issue up on the Wikipedia discussion page on Xiao’erjing a while back but with no response. Would you happen to know the answer?

    In the Initials and consonants section, the notes for number 17 (س̇, [s]-) say, “only used for entering tone or formerly entering tone syllables.” However, it gives the example 思 sī, which is not an historically entering tone syllable. Meanwhile, number 19 (ص, also [s]-) gives the example 色 sè, which is an historically entering tone syllable. Is it possible that the note went on the wrong row?

    The corresponding Chinese Wikipedia page, from which I assume the English version was copied, has the exact same issue.

  17. pott says:

    @Claw: The Wikipedia article contains too many mistakes to be taken seriously. Neither number 17 (س̇ sīn with dot above) nor number 19 (ص ṣād, emphatic s) is restricted to entering tone syllables. In the second sample on this page, sīn with dot above is used for 四 on the first line and ṣād is used for 雖 on the last line. Yibulaheimai (1992) does not mention sīn with dot above, but states that ṣād represents /su/, and gives the examples 虽 and 随. None of the above has entering tone.

  18. Kellen says:

    We’ve all been going about this the wrong way. س with dot above is actually سُ.

  19. pott says:

    @Kellen: Did you mean سْ? 四 could well be represented by سِْ. The sukūn can be used together with the kasrah to represent /ei/ or /ɿ/, according to Yibulaheimai.

  20. Kellen says:

    -ُ or -ْ but for /su/ I’d imagine -ُ. I’d love to see a scan of the page.

  21. pott says:

    The only example of س with dot above I’ve seen stands for (a part of) 四 sì. There is a link to the scan in my post number xvii. ص (with no dots) for /su/ is a completely different story.

  22. davide says:

    hi guys
    urgent matter!i need to translate 我也居住在你的天空下 in xiao er jing.
    Anybody can help me out? is for an artistic project

