Hello!
My name is Minkyu and I’m from Korea.
I am enjoying your blog a lot. I am currently learning Chinese as my fourth language and I am very interested in linguistic views on the Chinese language.

I was randomly looking up a Kana (Japanese phonogram) transcription chart of Mandarin Chinese by pinyin (Romanization of Mandarin) and I came up with this question.

Here you can see (I suppose you can read Japanese, at least its Kanji (Hanzi) parts) that “you, miu, diu, niu, liu” are transcribed as “イウ/iu/, ミウ/miu/, ティウ/tiu/, ニウ/niu/, リウ/riu/” when they have either the first tone (high) or the second tone (rising), and as “ヨウ/you/, ミョウ/myou/, テョウ/tyou/, ニョウ/nyou/, リョウ/ryou/” when they have either the third tone (dipping) or the fourth tone (falling).

I knew that the vowel part “iou” can be pronounced either way, [jow] or [jiw], or even somewhere in between, but I had never heard that this variation depends on tone.

Could you tell me how “iou” varies in modern Mandarin phonology, please? Or is this difference just based on the auditory perception of Japanese speakers?

And could you also tell me why “jiu, qiu, xiu” show no difference in their transcription and are transcribed only as “チウ/chiu/, チウ/chiu/, シウ/shiu/”?

Thank you.

Best Regards,
Minkyu Kim

The Japanese Wikipedia page to which Minkyu directs us has two tables – the upper one for first and second tones and the lower one for third and fourth tones. Move over to the right to see the entries for pinyin iou.
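
The rule the chart appears to apply can be written down as a small lookup. The sketch below uses only the kana pairs quoted in Minkyu's letter; the names `TONE_SENSITIVE`, `TONE_INSENSITIVE`, and `chart_kana` are made up for illustration, and nothing here claims to describe how the Wikipedia table was actually compiled.

```python
# Kana pairs exactly as quoted in the letter: (tones 1 and 2, tones 3 and 4).
TONE_SENSITIVE = {
    "you": ("イウ", "ヨウ"),
    "miu": ("ミウ", "ミョウ"),
    "diu": ("ティウ", "テョウ"),
    "niu": ("ニウ", "ニョウ"),
    "liu": ("リウ", "リョウ"),
}

# Syllables the letter says get a single transcription whatever the tone.
TONE_INSENSITIVE = {"jiu": "チウ", "qiu": "チウ", "xiu": "シウ"}


def chart_kana(syllable: str, tone: int) -> str:
    """Return the kana the chart gives for a pinyin syllable and tone (1-4)."""
    if tone not in (1, 2, 3, 4):
        raise ValueError(f"tone must be 1-4, got {tone}")
    if syllable in TONE_INSENSITIVE:
        return TONE_INSENSITIVE[syllable]
    if syllable not in TONE_SENSITIVE:
        raise KeyError(f"{syllable!r} is not among the syllables quoted in the letter")
    first_second, third_fourth = TONE_SENSITIVE[syllable]
    return first_second if tone in (1, 2) else third_fourth


if __name__ == "__main__":
    # Compare a tone-sensitive syllable with a tone-insensitive one.
    for tone in (1, 2, 3, 4):
        print(tone, chart_kana("liu", tone), chart_kana("qiu", tone))
```

Written this way, the puzzle is easy to state: why does the second table need no tone argument at all?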

Would any of our readers like to offer an explanation?

  1. Carl says:

    I blame Gwoyeu Romatzyh.

  2. Aaron says:

    I wouldn’t pay too much attention to that table. Basically there is no actual standard for transcribing Mandarin into katakana (which is about as painful as transcribing English into hanzi), and the only discussion on the talk page is three people all agreeing that the article is baseless and should be deleted.

  3. Kellen says:

    My thought was that it was something to do with the fact that Japanese is pitch-accent based, and that this sort of change in katakana was to sort of help the learner along to more properly manage the tones. But that’s only a guess, since I don’t actually know anything about Japanese.

    Be nice to GR.

  4. Aaron says:

    As far as I can tell, being fluent in Japanese and currently studying Mandarin at a language school in Japan with Japanese classmates, untrained native Japanese speakers do not have any innate tone comprehension that would lead them to transcribe different tones in different ways, except for distinguishing by duration. For instance I can imagine someone writing yòu = ヨ and yǒu = ヨー/ヨウ, but I can’t for the life of me think of an explanation for the table mentioned in the original question.

  5. Zrv says:

    In early 20th-century Peking (Beijing) dialect, the pronunciation of the final /iou/ (pinyin you, -iu) had two distinct forms correlated with tone: one pronounced closer to [iu] found in the first and second tones, and one pronounced closer to [iou] found in the third and fourth tones. It was a well-known feature found in many descriptions of Peking dialect up through the latter half of the 20th century, including in Y.R. Chao’s grammar. Until recently it was still possible to find some older speakers of Peking dialect who made the distinction. The kana transcription of the final is no doubt reflecting this earlier feature of Peking phonology, even though it is not a feature of modern Putonghua.

    As for Minkyu’s second question, it is clear from the chart that the unaspirated/aspirated distinction found in Mandarin initials (e.g. pinyin b-/p-, z-/c-, j-/q-, d-/t-, etc.) is not reflected in the kana transcriptions. This is probably because of a mismatch between the two-way distinction in Japanese (based on voicing) and the two-way distinction in Mandarin (based on aspiration).

    The mismatch is hard to see when we use a romanization of Japanese to compare to pinyin transcription of Mandarin. Consider the part of the chart where the pinyin syllables da and ta are given with their Japanese kana transcriptions:

    da ター
    ta ター

    If we replace the Japanese kana with romanization, it looks like this (the diacritic indicates a long Japanese vowel):

    da tâ
    ta tâ

    Minkyu’s question is, I believe, this: Why don’t we see

    da dâ
    ta tâ

    If we use IPA instead of romanized transcription the reason becomes clear. The two Mandarin syllables both have voiceless initials, one aspirated and one unaspirated, i.e.

    da [ta]
    ta [tʰa]

    Two Japanese syllables are available that might be used to render these:

    ダー dâ [daː]
    ター tâ [taː]

    To a Japanese ear, the Mandarin initial da [ta] lacks voicing and simply does not sound like Japanese ダー dâ [daː]. The voiceless Japanese form ター tâ [taː] is the closest sound to both of the Mandarin syllables.

  6. Sima says:

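    Just to expand on Zrv’s comment a little: according to 王力’s 汉语语音史 (published in 2007, though written in the 1980s, before his death in 1986), modern (post-1911) Beijing pronunciation, as representative of Northern Chinese/Mandarin (p. 480), does/did make this distinction:

    由求,一般标作 [ou, iou]。从音位观点看,应该标作 [əu, iəu]。在听觉上,[əu] 和 [ou] 是很接近的。由求的齐齿呼,实际上也并不读 [iou]。据赵元任研究,尤幽的平声字读 [iou], 仄声字读 [iou]。依我观察,平声字读 [iu],仄声字读 [iou]。

    In English: “The 由求 rhyme group is generally transcribed [ou, iou]. From a phonemic point of view, it should be transcribed [əu, iəu]. To the ear, [əu] and [ou] are very close. The 齐齿呼 form of 由求 is in fact not pronounced [iou] either. According to Zhao Yuanren’s research, 平声 syllables of the 尤幽 group are read [iou] and 仄声 syllables [iou]. By my own observation, 平声 syllables are read [iu] and 仄声 syllables [iou].”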

    (For anyone not familiar with the notions of 平声 and 仄声, for our purposes, 平声 is the first and second tones, 仄声 the third and fourth tones.)

    So Zhao Yuanren (Yuen Ren Chao, Wang Li’s teacher) had noted that the syllable we write as ‘you’ in hanyu pinyin (and ‘niu’, ‘qiu’, etc) was pronounced with a different emphasis according to whether it was 1st/2nd tone or 3rd/4th tone. Wang goes further, saying that according to his observations, ‘you’ is pronounced [iu] on the first two tones and [iou] on the third and fourth tones.

    As Minkyu notes, both pronunciations seem to be in general use. Zrv says this distinction is not a feature of modern Putonghua, but I’m not entirely clear that it’s no longer important for any speakers of northern dialects. I’ve certainly run into problems on more than one occasion when I’ve not been understood using a [-iou] pronunciation and, knowing that there’s something about these syllables, but not being quite sure what, I’ve tried a [-iu] pronunciation and been understood. This has then been followed by the person saying [-iu] repeatedly to make sure I wasn’t going to make such a stupid mistake again. I just can’t be sure that this was connected to tone.

    Minkyu also asks why the syllables written in hanyu pinyin as ‘jiu’, ‘qiu’, ‘xiu’ are not presented as being pronounced differently according to tone. As far as I can see, Wang Li doesn’t make a distinction, though I believe the initials ‘j’, ‘q’, and ‘x’ are a relatively late development and, I suppose, might just have caused [iou] rhymes to be produced slightly differently.

    Zrv’s comment about d/t interests me. I have met several native Japanese speakers who speak pretty respectable Mandarin but, to my native-English-speaker ear, still fail to distinguish these two sounds. I should say, I’ve also met highly proficient Japanese speakers of Mandarin who *do* make this distinction clearly enough.

    I have occasionally heard native English speakers, in the early stages of learning Chinese, try to use the English [p, t, k] when producing syllables like 包,单,够. I’m not sure whether this was the product of some knowledge of phonetics or the product of a phrasebook-type pronunciation guide, but it just didn’t seem to work – they were misunderstood by native Chinese speakers.

    As far as I can tell, it makes sense for English speakers to use the English d/t when first learning Chinese. Presumably, voicing just drops away without too much conscious effort as the learner progresses. Though, perhaps, it lingers and marks the ‘foreign accent’.

    I understand the Japanese voiceless stops p, t, k have some aspiration. Would it not make more sense for Japanese speakers to use their voiced/unvoiced pairs when first learning Chinese?

    Could you tell us what goes on in your Japanese Chinese Classroom?

    On a related note, having spent several years in China, I seem now to have difficulty distinguishing the pairs b/p, d/t, g/k in French (and occasionally with French speakers speaking English), though I don’t recall this ever being a problem for me in the past. Does this mean that my ears tuning in to one foreign language has created confusion for me in another language, where none previously existed?

  7. Zrv says:


    Thanks for the additional information. I see that I misunderstood Minkyu’s final question, which I misinterpreted to be about the representation of initial consonants. I also don’t know why the chart makes no distinction according to tone for the pronunciation of -iou after j,q,x.

    One reason that the whole voicing issue is tricky is that voicing and aspiration are not really binary, they are part of a single continuum. Phoneticians talk about Voice Onset Time (VOT), the gap between when vocal cords start vibrating and when a consonant is released, measured in milliseconds. For a truly voiced [b], the vocal cords are vibrating before the release, i.e. VOT is negative. For a truly plain (i.e. voiceless and unaspirated) [p], the vocal cords start vibrating at the time of release, i.e. VOT is about zero. For [pʰ], voicing starts some time after the release, while the vowel is already being articulated. “Aspiration” is in fact just a voiceless vowel — a rush of air through open vocal cords before they are tightened for vibration.

    We rather arbitrarily divide this continuum up into three discrete types as illustrated above. But there are differences. The more negative VOT is, the more heavily or fully voiced a sound is. And the more positive VOT is, the more heavily or strongly aspirated a sound is.

    In word-initial position, English “voiced” stops are barely voiced. VOT is only slightly negative, close to zero. So these voiced stops are actually pretty close to the Mandarin plain stops. But in French, voicing is “heavy”. Vocal cord vibration begins long before the stop is released. Those French voiced consonants have a real rumble to them, and don’t sound at all like Mandarin plain stops.

    If a language like French or Japanese has a voicing distinction in which the voiced sounds are heavily voiced, then their voiceless counterparts can vary in how much aspiration there is — i.e. in VOT delay — without causing confusion. But in English, where the “voiced” member has little voicing, the voiceless member has to consistently stay quite aspirated to remain distinct.

    To my ear, Japanese voiceless stops are indeed slightly aspirated, but far less so than the English voiceless stops.

    Incidentally, the Korean “plain” stops (e.g. ㄱ /k/) are voiceless and slightly aspirated in word-initial position, while the “aspirated” stops (e.g. ㅋ /kʰ/) are heavily aspirated in initial position. English aspiration is about halfway between the two. English speakers will perceive the “plain” Korean sound sometimes as voiced, sometimes as voiceless aspirated, sometimes as something in between. Ask a native Korean speaker to say kimchi (sometimes romanized gimchi) and you can observe this — do you hear “k” or “g” at the beginning?
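
Zrv's three-way division of the VOT continuum in the comment above can be sketched as a simple threshold classifier. The function name `classify_stop`, the 15 ms window around zero, and the sample VOT values are all assumptions chosen for illustration, not measurements; real category boundaries vary by language and speaker.

```python
def classify_stop(vot_ms: float, plain_window_ms: float = 15.0) -> str:
    """Place a stop on the voiced / plain / aspirated scale by its VOT.

    vot_ms is voice onset time in milliseconds relative to the release:
    negative means voicing starts before the release, positive means after.
    plain_window_ms is an assumed tolerance around zero, not a standard value.
    """
    if vot_ms < -plain_window_ms:
        return "voiced"      # voicing clearly leads the release
    if vot_ms <= plain_window_ms:
        return "plain"       # voicing starts at about the time of release
    return "aspirated"       # voicing clearly lags the release


if __name__ == "__main__":
    # Hypothetical VOT values, picked only to mirror the description in the comment.
    samples = {
        "French initial b (heavily voiced)": -100.0,
        "English initial b (barely voiced)": -5.0,
        "Mandarin pinyin b (plain)": 5.0,
        "English initial p (aspirated)": 60.0,
    }
    for label, vot in samples.items():
        print(f"{label}: {vot:+.0f} ms -> {classify_stop(vot)}")
```

Under these assumed numbers the English "voiced" stop lands in the same bin as the Mandarin plain stop, while the French one does not, which is the point the comment makes.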

  8. Sima says:

    Many thanks for that. It makes things a lot clearer.
    My first reaction to the kimchi question is to say that I’m almost certain I would hear a Korean friend here say ‘gimchi’. Is that what you would expect? I hope to catch up with him soon to find out for sure.

  9. Syz says:

    Just have to chime in on the kimchi/gimchi question. When I was stumbling through my year in Pusan-tinged Korean, I was always puzzled by the ㄱ. I came to the conclusion it sounded to me more like a “k” in initial position but almost always like a “g” intervocalically. It leads to all sorts of weird (to a native English speaker) romanization problems, such as hearing “bulgogi” as “pulgogi”.

  10. Zrv says:


    Korean consonants are a wonderful illustration of allophony (i.e., the way that phonemes change pronunciation depending on their relationship to the sounds around them). Native speakers aren’t usually conscious of allophonic variation. English has some similar phenomena. For example, the /p/ sound in “peak” and the /p/ sound in “speak” are pronounced quite distinctly (which you can prove to yourself by recording yourself saying “speak” and then using sound-manipulating software to strip off the initial s-; the resulting pronunciation may surprise you).

    The Korean consonants /b d g/ that are, in the current official romanization, romanized as “b d g” are pronounced as voiced [b d g] in intervocalic position, as voiceless slightly aspirated [p t k] in initial position, and as voiceless unreleased [p t k] in final position. Native Korean speakers are generally unaware of these differences. Older Romanization schemes for Korean, which were designed for speakers of European languages, represented these allophones with different letters, which is why Busan/Pusan is traditionally spelled Pusan and now officially spelled Busan.

    You can think of “tone sandhi” as a kind of allophonic variation as well. Native speakers are not consciously aware of changing the pronunciation of the first of two third tones in sequence.
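
The positional rule Zrv gives for the Korean consonants can be sketched as a function from phoneme and position to surface form. The names `Position`, `VOICELESS`, and `realize` are invented for this example, and the code covers only the three consonants and three positions mentioned in the comment; it is an illustration of allophony, not a model of Korean phonology.

```python
from enum import Enum
from typing import Tuple


class Position(Enum):
    INITIAL = "initial"
    INTERVOCALIC = "intervocalic"
    FINAL = "final"


# Voiceless counterpart of each consonant, in the romanized symbols used in the comment.
VOICELESS = {"b": "p", "d": "t", "g": "k"}


def realize(phoneme: str, position: Position) -> Tuple[str, str]:
    """Return (surface symbol, description) for /b d g/ in a given position."""
    if phoneme not in VOICELESS:
        raise ValueError(f"only /b d g/ are covered, got {phoneme!r}")
    if position is Position.INTERVOCALIC:
        return phoneme, "voiced"
    if position is Position.INITIAL:
        return VOICELESS[phoneme], "voiceless, slightly aspirated"
    return VOICELESS[phoneme], "voiceless, unreleased"


if __name__ == "__main__":
    for pos in Position:
        symbol, description = realize("b", pos)
        print(f"/b/ in {pos.value} position -> [{symbol}] ({description})")
```

One phoneme goes in and three different sounds come out. The initial /b/ surfacing as [p] is what the older spelling Pusan records, while the official spelling Busan writes the phoneme.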

  11. Zrv says:

    The beginning of the second paragraph should say:

    “The Korean consonants /b d g/ that are, in the current official romanization, romanized as are pronounced as [b d g] …”

  12. Zrv says:

    I keep using an angle bracket plus b which is, sadly, also an HTML tag for bold. Here’s one more try:

    “The Korean consonants /b d g/ that are, in the current official romanization, romanized as “b d g” are pronounced as [b d g] …”

  13. Peter Nelson says:

    Slightly off topic. I have difficulty explaining the idea of allophonic variation to pretty much anyone who doesn’t know what it is. This is compounded by the fact that many people don’t have experience with other languages and find it difficult to think about their own language critically. Do any of you have any particularly good examples in either English or Chinese?

    /p/ as [pʰ] vs [p] is not exactly a home run unless you have sound recording/editing software on your person at all times.

    I think maybe /t/ as [tʰ] vs [ɾ] is more obvious to the ear, but I also suspect that people might evaluate their own sound production as “lazy” or “not correct” rather than getting the whole idea of allophonic variation.

    If this seems too far off topic, feel free to ignore it.

  16. Zrv says:

    Peter, you might try the pronunciation of the second /d/ in “did you”, which is invariably palatal in all but the most hyper-formal speech.

    Another interesting one to try is /n/ in “tenth” vs. “ten”. If students speak naturally and with some introspection (say “ten” and hold the “n” sound, note where your tongue is; say “tenth”, then say it again, but this time hold the “n” sound and note where your tongue is), they will observe that the /n/ of “tenth” is in fact interdental, tongue sticking right out between the teeth.

  17. Peter Nelson says:

    Thanks Zrv. Kind of hard to deny the tongue sticking out between your teeth (and difficult to pronounce it any other way).

  18. Aaron says:


    Sorry, I missed your question until just now.

    I attend a fairly standard “second language for working adults”-style school, once a week, sometimes with Japanese classmates and sometimes one-on-one (when the classmates don’t show up). The teachers are mostly mainlanders, but there is at least one Taiwanese. The textbooks are all simplified. A standard session is about an hour, in which we go over a given unit in the textbook, read aloud, answer questions, go over new grammar points, and generally discuss the content. I’m not sure what else to say about it.

    One thing I can say is that I find most of my classmates to be severely handicapped by the Japanese language’s relatively limited repertoire of sounds; for instance pinyin syllables xi and shi will both become しー (Japanese shii with a hard “sh” like the English word “she”), chi will become ちー (chii), and zi will become ずー (zuu). The differences aren’t apparent when comparing pinyin to Hepburn-romanized Japanese, but it would be clear if you heard it.

  19. Sima says:


    Many thanks for that. I just wonder whether your teachers and course books use only Hanyu Pinyin and Hanzi, or whether they also use some other notation.

    It would seem strange that distinctions of the xi – shi kind would cause so much more trouble for Japanese speakers than for English speakers, unless they were not being distinguished properly by the teacher or course book.

    English speakers begin with only [ʃ] and have to wrestle with the tongue to make the distinction between [ɕ] and [ʂ]. The associated vowels don’t seem to offer much of an advantage for English speakers either. Whilst English has a pair of ‘i’ vowels, these hardly help.

    On the subject of which, would anyone entertain the idea that the syllables, represented by Hanyu Pinyin, shi, zhi, chi, ri, si, zi, ci are without vowels?

  20. Peter Nelson says:


    It’s kind of up to your interpretation.

    Here is one take from Wikipedia:

    [z̩], as the bare syllabic nucleus [z̩] [despite the transcription, not actually a syllabic fricative] after the alveolar sibilants /t͡s t͡sʰ s/
    [ʐ̩], as the bare syllabic nucleus [ʐ̩] after the retroflex sibilants /t͡ʂ t͡ʂʰ ʂ ʐ/

    And another (Thanks to Kellen for bringing this transcriptional issue to light)

    Sinologists have used ‹ɿ› to represent [z̩], a vowel which represents the i in hanzi (see Pinyin).

    I guess it depends on what you mean by vowel. Is a “vowel” a particular class of phonetic segment (produced without constriction of the vocal tract)? Or is it the nucleus of a syllable? Phonetically, it would seem the syllables “zhi chi shi ri zi ci si” lack vowels. Phonologically, they have vowels because they’re syllables, so we’ll just stretch our transcriptional tools to identify them.


  21. Minkyu says:

    Thank you a lot to everyone, especially Zrv and Sima who gave me a clear answer!

    To add something about initial /ㄱ,ㄷ,ㅂ/ (k,t,p): I’d say that speakers of almost every European language hear them as /k, t, p/. So when English speakers learn Korean, they have a hard time getting rid of the aspiration of those initial plosives (and affricates too). Hence, Koreans make fun of English speakers’ Korean by depicting their pronunciation like */킴치/ [kʰim.tɕʰi].

    When Koreans stress the initial, or when they are singing, distinguishing aspirated from unaspirated consonants gets much harder, as unaspirated consonants get much more breath too. Still, there is a slight difference (I don’t think I can say what the phonological difference is), and native speakers can hear it.

    Thank you Sinoglot!

  22. Claw says:

    Peter wrote:

    Phonologically, they have vowels because they’re syllables, so we’ll just stretch our transcriptional tools to identify them.

    Syllables don’t necessarily need to have vowels in the nucleus. The distinction between consonants and vowels is not as sharp as many people think, and there’s actually a continuum of sonority between them (see: sonority scale). Different languages may draw the line for what counts as a valid syllable nucleus at different points on the scale.

  23. Peter Nelson says:


    My point was that you could either identify vowels (phonetically) as segments produced with an unconstricted vocal tract, or you could identify them (phonologically) as valid syllable nuclei. Thanks for adding the information about the sonority scale.

  24. Aaron says:


    Re: notation, all Mandarin textbooks I’ve seen in Japan have used proper Hanyu Pinyin. Picture book dictionaries for tourists often supply very awkward katakana renderings for people with no knowledge of pinyin.

  25. Sima says:

    @Minkyu
    Many thanks for the question. I’m glad we were able to provide at least a partial answer. I’d still be interested to know whether there was a good reason for jiu, qiu, xiu being treated differently.

    @everyone else
    Many thanks for the informative comments.

