UPDATE: Thanks to ahbin in the comments, I’ve added “Ping Speech” (平话) as a dialect of Cantonese. Its omission was an oversight for which I owe maybe 2 million people an apology. That brings us to 50 “sub-fangyan” (次方言) under the original 7 fangyan. This makes the title of my post obsolete, but what the heck. The sub-fangyan vs fangyan decision follows the basic scheme of the Chinese Language Atlas (中国语言地图集).


How should Chinese be categorized, linguistically?

Fundamentals of Chinese Dialect Studies (《汉语方言学基础教程》:李小凡,项梦冰) describes how in the first half of the 20th century, the proposed divisions of Chinese increased from four in early scholarship, up to eleven in one scheme. Now most scholars are back down to seven or eight. But between language change and debates about definition, it’s a question that guarantees academic employment for years to come.

7/49 is the plan I’ve just posted on the Phonemica blog. I’m pasting the chart below.

For Phonemica, in a certain way, it doesn’t matter. Since the goal there is to collect recordings from speakers of every variety of Chinese, you can take a strictly empirical approach to judging whether recording A is “the same dialect” as recording B. Heck, if “mutual intelligibility” is your criterion, we’ve already got one recording that is nominally Mandarin (Lower Yangtze Mandarin) but seems to stump most fluent speakers of putonghua.

Still, we need an organizational scheme as a starting point, and for now, this is it. There’s plenty to debate, from high-level categories to mere names. I’m looking forward to your thoughts.


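| 方言 | 次方言 | Language | Dialect |
| --- | --- | --- | --- |
| 粤语 | 钦廉 | Cantonese | Qīnlián |
| | 吴化 | | Wúhuà |
| | 勾漏 | | Gōulòu |
| | 高阳 | | Gāoyáng |
| | 邕浔 | | Yōngxún |
| | 广府 | | Guǎngfǔ |
| | 四邑 | | Sìyì |
| | 平话 | | Pínghuà |
| 湘语 | 吉首 | Xiang | Jíshǒu |
| | 娄底 | | Lóudǐ |
| | 长沙 | | Chángshā |
| 赣语 | 洞绥片 | Gan | Dòngsuí |
| | 怀岳片 | | Huáiyuè |
| | 宜浏片 | | Yíliú |
| | 吉茶片 | | Jíchá |
| | 抚广片 | | Fǔguǎng |
| | 鹰弋片 | | Yīngyì |
| | 大通片 | | Dàtōng |
| | 昌靖片 | | Chāngjìng |
| 客家话 | 宁龙片 | Hakka | Nínglóng |
| | 粤台片 | | Yuètái |
| | 铜鼓片 | | Tónggǔ |
| | 粤中片 | | Yuèzhōng |
| | 惠州片 | | Huìzhōu |
| | 粤北片 | | Yuèběi |
| | 汀州片 | | Tīngzhōu |
| | 于桂片 | | Yúguì |
| 官话 | 冀鲁官话 | Mandarin | Jìlǔ |
| | 东北官话 | | Northeastern |
| | 胶辽官话 | | Jiāoliáo |
| | 中原官话 | | Central Plains |
| | 兰银官话 | | Lányín |
| | 西南官话 | | Southwestern |
| | 江淮官话 | | Lower Yangtze |
| | 晋陕官话 | | Jìnshǎn |
| | 北京 | | Beijing |
| | 普通话 | | Standard |
| 吴语 | 处衢片 | Wu | Chùqú |
| | 婺州片 | | Wùzhōu |
| | 宣州片 | | Xuānzhōu |
| | 瓯江片 | | Ōujiāng |
| | 台州片 | | Táizhōu |
| | 太湖片 | | Tàihú |
| 闽语 | 闽南 | Min | Mǐnnán |
| | 莆仙 | | Púxiān |
| | 闽东 | | Mǐndōng |
| | 闽北 | | Mǐnběi |
| | 闽中 | | Mǐnzhōng |
| | 琼文 | | Qióngwén |
| | 邵将 | | Shàojiāng |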
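
For anyone who wants to work with the chart above programmatically, it could be held as a simple mapping from fangyan to sub-fangyan. This is only a rough sketch in Python: `SCHEME` and `find_fangyan` are names made up for illustration, not anything Phonemica actually uses, and the entries are copied straight from the chart.

```python
# Illustrative only: the chart as a mapping from fangyan to its sub-fangyan.
# SCHEME and find_fangyan are hypothetical names, not Phonemica code.
SCHEME = {
    "Cantonese": ["钦廉", "吴化", "勾漏", "高阳", "邕浔", "广府", "四邑", "平话"],
    "Xiang": ["吉首", "娄底", "长沙"],
    "Gan": ["洞绥片", "怀岳片", "宜浏片", "吉茶片",
            "抚广片", "鹰弋片", "大通片", "昌靖片"],
    "Hakka": ["宁龙片", "粤台片", "铜鼓片", "粤中片",
              "惠州片", "粤北片", "汀州片", "于桂片"],
    "Mandarin": ["冀鲁官话", "东北官话", "胶辽官话", "中原官话", "兰银官话",
                 "西南官话", "江淮官话", "晋陕官话", "北京", "普通话"],
    "Wu": ["处衢片", "婺州片", "宣州片", "瓯江片", "台州片", "太湖片"],
    "Min": ["闽南", "莆仙", "闽东", "闽北", "闽中", "琼文", "邵将"],
}


def find_fangyan(sub_fangyan):
    """Return the fangyan a sub-fangyan is filed under, or None if unlisted."""
    for fangyan, subs in SCHEME.items():
        if sub_fangyan in subs:
            return fangyan
    return None


# Count the top-level groups and the sub-fangyan listed under them.
total = sum(len(subs) for subs in SCHEME.values())
print(len(SCHEME), total)  # prints: 7 50
```

A lookup like `find_fangyan("太湖片")` returns `"Wu"`, while a local name that is not in the chart returns `None`, so finer-grained labels would still need to be placed by hand.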





  1. Carl says:


    “普通話, Standard” should be in simplified characters to match the rest of the table.

  2. Peter says:

    I think the more pressing question is “What’s the standard procedure for deciding how to classify a person’s speech?”. For instance, if I asked my friend what 方言 she spoke, she’d say 昆山话. Now, I happen to know and be quite confident that that belongs to 吴语-太湖片, but if you were to give me some random 话, I’m not sure I could do better than looking at a dialect map and guessing.

    • Steve (Syz) says:

      Very pressing question! The approach you describe, plotting on a dialect map, is actually the one I’ve taken so far, but even with the limited number of recordings up on Phonemica at this point, I’ve already run into trouble. With this recording, for example, I was debating whether it should be classified as 晋语 or not. The resolution of the dialect maps I have available leaves something to be desired, and anyway this one is in a border area. I ended up asking a friend of mine who’s a professor of fangyan studies at Beida. He said he heard entering tones (i.e. syllable-final p, t or k stops) and on that basis would classify it as 晋陕官话 rather than another sub-fangyan.

      That’s probably the general approach to the question: use geography for the clearcut cases, then compare more detailed linguistic features to sort out the borderline cases. What exactly those distinguishing features are is a question I hope we get to explore a lot on Phonemica.

      • Kellen says:

        That’s just it though. Guessing on a map is what you do if you don’t know the dialects like the guys who’ve been studying them their whole lives. But really, if you want to know what you’re listening to, you have to have the experience of these guys. I can do marginally well for Northern Wu and hear things that put it in different areas, but I can’t get down to the village level on my own, as much as I wish I could. You have to listen for and be able to identify a whole lot of small details to really know.

  3. Alex says:

    Min is my particular area, not of expertise but of interest and some experience, and based on that I have to take serious issue with all the various Min topolects being classified as a “language”. Looking at the “Mandarin” (Northern) dialects, a Beijing dialect speaker can understand a Chengdu SW speaker with some difficulty, but after a short period of adjustment there won’t be any issues. Ditto within the Cantonese family – people from Guangzhou and Siyi can talk to one another. This is not even remotely the case within Min. I speak reasonably good Taiwanese Minnan, and I can understand absolutely nothing out of the mouth of someone from Fuzhou or the Puxian area. I can understand some of what people from Hainan or Chaozhou are saying, but that’s because those are dialects of Minnan. If you want to test this, find someone from Fuzhou and someone from Xiamen and ask them each to count to ten.

    So I would argue, strongly, that each of the Min “dialects” you’ve identified needs to be pushed up a level. They are each dialect families, or languages in your terminology, in their own right, and contain dialectal differences akin to those within the Northern dialects or Wu or whatever.

    • Kellen says:

      Hi Alex. I’d be interested to hear what Zrv has to say about this, and ultimately I’m going to agree with whatever that may be. However, in the meantime, I’d like to point out that the Northern Wu dialects aren’t all mutually intelligible and that someone from one end of the spectrum would have a fair amount of difficulty even bothering to try to understand someone from the other. I grew up in the US and while I can probably understand a slightly wider range of English dialects than other people I know, I’m sure there are a handful of native speakers I’d be hard pressed to understand. I think there’s probably more to the classification than whether one speaker can understand another. If not, then why wouldn’t you call Spanish and Italian dialects of the same language? I’m sure after some time in shared space they could get by.

      • Alex says:

        Splitting up dialects and languages is notoriously subjective. The way you have decided to cut it up for the other ones is fine, but what I’m saying is that the difference between Fuzhou speech and Xiamen speech is approximately as great as that between Cantonese and Mandarin, which in the Romance language context is indeed parallel to Italian and Spanish. Mandarin speakers stuck in HK can eventually figure out how to be understood, more or less, in Cantonese, and I’m sure you’ve heard how people from HK speak Mandarin when they’ve had no formal education in it but a couple of years on the Mainland. There’s a real difference between that kind of relation and the level of similarity that most classification systems need to call two speeches a dialect. Even though Jamaican English is super hard for Americans to understand, no one would argue that it would take four years of living in Jamaica for you to understand it. Most, possibly all, Chinese dialectology splits up the world accordingly. Take a look at The Languages of China by Ramsey (I think) and the Chinese book that it’s based on, which is called something like 汉语方言学理论. It puts the various Min languages at the same level as Cantonese and Mandarin.

        I recognize that I’m being annoying and that for your purposes there’s probably no point to this – you are unlikely to get recordings of Puxian speakers for this site. Sorry about that. Hope you can get Taiwanese / Xiamen / Quanzhou / Zhangzhou on there though. There are interesting differences between Quanzhou Minnan and Zhangzhou Minnan that persist in Taiwan, depending on where the original settlers in a given area were from.

        • Ty Lim says:

          I am a speaker of Teochew (Chaozhounese) and I have to agree with Alex that the Minnan “dialects” are indeed very different from the other Min languages. Minnan, Minbei, Mindong, etc. are more like “regionalects”? Then again I have only heard spoken Fuzhounese and perhaps the real test would be to hear examples from all the Min groups and do a comparison. In any case, cheers for getting Phonemica going – I hope to see more non-Mandarin languages/dialects represented.

        • Steve (Syz) says:

          Thanks for all this. It’s exactly the sort of discussion I hope to keep having through Phonemica.

          It’s not “annoying” at all, and actually we do eventually hope to have representative recordings from all over. Don’t rule out Puxian speakers yet!

          Keep in mind, though, that neither Kellen nor I are experts in the division of fangyan. Therefore we have to rely on others who are to provide us with this starting point. This is not our own categorization scheme! In this case we’re relying specifically on the book mentioned above and the advice of one of the editors of that book. I realize that there are plenty of good scholars who disagree with it, but there’s hardly consensus along the lines that you suggest, “Most, possibly all, Chinese dialectology splits up the world accordingly.”

          So this scheme is a starting point.

          I’ll be happy to change it — or offer alternative schemes — down the road if we can gather enough empirical data. Once we get to the point of having thousands of users and hundreds of recordings, the possibilities open up. For example, we could organize a rating system for intelligibility, asking users whose native language is X how difficult they find it to understand recordings Y and Z. There’s a lot of devil in the details, but it’s the kind of survey that would come much closer to the intuitive idea of “intelligibility” than many more technical analyses.

          On the technical side, with enough recordings in place, we could theoretically compare lexicons and sound changes — but that starts to get closer to academic work that’s already being done for many fangyan, and probably done at a greater level of detail than we will be able to attempt on Phonemica.

          One other thing, just to expand on what Kellen said: if it makes you feel any better, these categories don’t seem very consistent with the “intelligibility” criterion in non-Min areas either. Sure, Northeast Mandarin is pretty close to Beijing’s Mandarin. But Lower Yangtze strikes most “Mandarin” speakers as a completely different language. This may well be true of sub-fangyan under other fangyan as well.

          • Zrv says:

            The Min “dialect group” definitely shows more internal variation than any other of the traditional seven “dialect groups”, as Alex notes, and one could quite justifiably argue that it is made up of four to six “dialects” at roughly the same level of internal cohesiveness as Wu or Yue, based on mutual intelligibility and typological criteria. That being said, there are good reasons historically for arguing that they form a taxonomic whole — i.e. come from a common “Proto-Min” source — so that from a historical-phylogenetic perspective all of Min should be at the same level as Wu, Yue, etc.

            This just points up how devilishly difficult it is to come up with a consistent scheme for classification. First you have to agree on what your criteria for classification are: historical origin? geographic cohesiveness? mutual intelligibility? typological similarity? cultural connectedness? All of these criteria have played a role in expert and lay views of Chinese dialect groupings, and they can yield quite contradictory results. The language of Hangzhou, for example, is genetically Mandarin but (broadly speaking) typologically and geographically Wu.

            And even once you agree on the criteria to use, we still don’t have enough data and analysis to apply the criteria to reliably draw the boundary lines and determine the number of dialect groups, dialects, and sub-dialects.

            For the purpose at hand — putting a useful and meaningful label on dialects collected for the Phonemica project — it seems to me that a somewhat tweaked “traditional” classification scheme is fine. It’s a first approximation, and can always be revised later if/when the linguistic community gets closer to consensus. So I think Kellen and Syz can stand by this scheme.

          • Steve (Syz) says:

            Zrv, thanks for this. Yes, it’s easy to forget (and I do it myself) that “mutual intelligibility” is only one of many possible criteria to use in language classification.

            It just occurred to me that there’s no theoretical reason we couldn’t use multiple classification schemes. Phonemica users could select their own scheme! (Ok, that’s mostly a joke — Kellen would kill me over the implementation complexities)

  4. Karan says:

    Which Cantonese does 台山话 fall under? Just curious.

  5. ahbin says:

    What about the Ping 平 languages of Northern Guangxi (pinghua 平話)? They seem to be missing altogether from your list. They have similarities in vocabulary and phonology to some southern Xiang Dialects, but I thought they had been made a separate class a long time ago. None of them seem to be included in the Cantonese category you mention either. There is one where 東風 is pronounced “ne me” and bamboo is “lia”, it doesn’t sound very Sinitic at all!

    Some of the varieties of Hakka are the same as what Alex described for Min, I think. There are out-of-the-way varieties that have no entering tones and are incomprehensible to Hakka speakers in Taiwan.

    • Kellen says:

      Ping is often classified with Yue but not as belonging to it. Its absence is an oversight, I’m sure.

    • Steve (Syz) says:

      As Kellen said, 平话 was a big oversight. Much thanks, ahbin, for pointing it out.

      I’ve added a note at the top of the article. Following our original model from the Chinese Language Atlas 《中国语言地图集》, I’ve added Ping to the list as a sub-fangyan of Yue/Cantonese. I realize plenty of people are going to be unhappy with that, but it’s really just a starting point. There’s been an academic debate for a number of years about Ping’s status and classification and there’s research going on right now. If I get around to it, I’ll post the pertinent pages from the textbook I’ve been using: 《汉语方言学基础教程》.
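
The intelligibility survey floated in the comments above (asking users whose native variety is X how hard they find recordings Y and Z) could be tallied roughly like this. It is only a sketch under loud assumptions: the 1-to-5 difficulty scale, the sample ratings, and every name in the code are invented for illustration and are not Phonemica data or code.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical ratings, invented for illustration:
# (listener's native variety, variety of the recording, difficulty score),
# where 1 = effortless and 5 = incomprehensible.
ratings = [
    ("Beijing", "Northeastern", 1),
    ("Beijing", "Northeastern", 2),
    ("Beijing", "Lower Yangtze", 5),
    ("Beijing", "Lower Yangtze", 4),
    ("Minnan", "Mindong", 5),
]


def mean_difficulty(rows):
    """Average the difficulty scores for each (listener, recording) pair."""
    grouped = defaultdict(list)
    for listener, recording, score in rows:
        grouped[(listener, recording)].append(score)
    return {pair: mean(scores) for pair, scores in grouped.items()}


scores = mean_difficulty(ratings)
for (listener, recording), value in sorted(scores.items()):
    print(f"{listener} -> {recording}: {value}")
```

Keeping the listener and the recording as separate keys means the result need not be symmetric: speakers of X may follow Y more easily than the reverse, which is one of the details such a survey would have to deal with.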

