Pleco looking for etymology dictionary

You could hear the sobs of relief from long-suffering Android users around the world when Mike Love announced yesterday that the Pleco Chinese-English / English-Chinese dictionary software is out in beta.


I’m planning to install my version soon. But before I get all gushy* about Pleco’s virtues, I also want to point out a request that came in the announcement:

One area we’re still aggressively shopping for a dictionary is character etymology; if anybody has any suggestions for a good etymology dictionary (bilingual preferred, but even Chinese-only is helpful) we’d love to hear them. None of the online ones have been interested in working with us, sadly, so this would probably have to be something from a print publisher. We also very much appreciate suggestions for any other particularly-good dictionary that you think we should take a look at; the success of our OCR add-on has left us with quite a bit of extra room in our budget, and we’d love to put some of that money towards acquiring more new dictionaries.

I’ve always been interested myself in getting a dictionary that had good, scholarly etymology work. If anyone has a good suggestion (or wants to suggest dictionaries to avoid) let us know in the comments. I can forward the conversation to Mike Love if we dig up anything.


*Gushy starts here. I almost didn’t get my Android phone last year because I like Pleco so much. I like the product and I like the way Mike Love runs the company. Although I think Hanping has done very well with the CE-DICT based app for Android, and I even purchased the full-featured version, when push comes to shove it’s still CE-DICT, which has lots of limitations.

I’m so excited about helping Pleco (for free, let the conflict-of-interest snoopers note) that I’ve broken my almost sacrosanct vow to avoid the computer except on Tuesdays. The avoidance has been a boon for my offline reading and writing endeavors and makes a good excuse for not having posted much of anything to Sinoglot or Beijing Sounds. But all abstinence, I say, must sooner or later be abstained from — lest the abstinence be impure — and what better occasion for indulgence than a Pleco Android beta!

25 thoughts on “Pleco looking for etymology dictionary”

  1. I’d love to see some good resources for classical Chinese on Pleco!

    As for character etymology dictionaries, I guess it kind of depends what Mike means by “etymology” — the Shuowen is out there, but it’s not a reliable source. There’s the ABC Etymological Dictionary of Old Chinese, but it’s not going to be useful for people trying to learn characters. There’s Grammata Serica Recensa, but (a) I have no idea how to read Karlgren’s reconstructions, and I suspect that nobody does, not even him, and (b) as far as I know there’s no digital version of GSR. My print copy features photo reprints of handwritten characters. Then there are resources like but they have more to do with mnemonics than with actual scholarship.

  2. Not sure what to suggest for etymology, but this feels like a good place to express my biggest wish concerning Pleco, which is for them to license a good thesaurus or synonyms dictionary. In particular I would love to have an idiom thesaurus (分類成語辭典 or 同義成語辭典) to help me find those apt expressions that I know exist but can never recall when I need them.

  3. By “character etymology” Mike Love surely means the origin of characters, even though the all righteous Gods at Language Log have declared that we shouldn’t call it that.

    A great idea, but I doubt it will work commercially because there is already an amazing free resource by language hero Richard Sears, which gives you all that most curious users of Pleco need, including the quotes from 说文解字. Pure gold. With apps for iOS and Android.

  4. Brendan, from the description, the ABC etymological dict is the kind of angle I’d be hoping for. To quote the publisher, “primary emphasis on the sounds and meanings of Sinitic roots.” As for mnemonics: all well and good until they take you down the slippery slope into the sludge of false etymologies, which they seem to do all too often in Chinese.

    HSY: Great idea, especially if they picked a quality source. I was just trying to remember some chengyu today at the appropriate place in a conversation. Naturally I couldn’t remember it and lost the opportunity…

    Julen: Man, I hope he doesn’t mean character origin. I agree with you though that the Sears stuff is way cool.

  5. Sears stuff is great, but would be better if it was within Pleco. Julen, I think Mr. Love mentioned the Chinese-Chinese dictionary they are licensing would have synonyms.

  6. The all righteous Gods at Language Log have a bloody good point, though. There’s a big difference between script and language. The few times I’ve gone looking for the etymology of Chinese words I’ve been sorely disappointed to find only breakdowns of the origin of the character, when it’s the origin of the word I’ve been after.

  7. Does anyone know any (digital) resources for Middle Chinese reconstructed pronunciations? I often want to know the MC tone of a character, and I find myself jumping through some bizarre hoops in order to deduce it (and 多音字 usually just cause me to give up).

  8. @Chris – yes, script is not the same as a language, but I don’t see how that is a “bloody good point”. It is just obvious for anyone with a serious interest in linguistics. The way they obsessively insist on this makes me think they have axes to grind somewhere.

    Regarding etymology, sure they are right in the strict meaning of the word. But words have strict scientific meanings and also popular meanings, as it happens with many other scientific terms. And by all means there is nothing wrong with calling that character etymology, especially when important people in the field (with no academic tenure, oh dear!!) are using it this way. The “descriptivists” at LL should know better.

    One more thing: I have been a reader of LL for years and I really admire the work they do. But frankly speaking their content about Chinese is well below the standards of the blog. And the reason why I am criticizing them here (you might ask) is because LL directly erases without notice any critical comment I write on their site. As a very active commentator in many different blogs, this is the first time I have had such an experience.

  9. @Julen: Like some at LL, I have gone looking for the origins of a word to find only information on the origin and development of the character, and that labelled ‘etymology’. It would be nice to see a clearer distinction between ‘pure’ etymology, as in the origin of words, and ‘character etymology’, as in the origin of the characters. And just to be clear, I’ve got no problem with the term ‘character etymology’, and I’m certainly all for the study of the origins and development of characters and scripts. I just think the distinction needs to be kept clear so that we can all go out and find the information we want. And in that sense, regardless of the quality of LL’s writing on matters sinitic (and I, too, often find it disappointing) or any disputes you may have with them, the LL folks do have a bloody good point.

  10. Can we make a distinction between character etymology and word etymology?

    Much of the historical phonological work for the earliest periods of the Chinese language is based on patterns in xiesheng series and to a lesser extent tongjia patterns. Middle Chinese is based on a character rhyming dictionary. Old Chinese reconstructions are by and large a pushing back of Middle Chinese distinctions. My point, then, is that the characters are the starting point for Chinese phonological reconstruction. The only work I know of that is not character-based is that of Norman, Coblin and Simmons, which still has a long way to go until anything comprehensive can be done, although the Old Chinese antecedents of Norman’s Proto-Min are starting to show up. I know that analogies with Tibeto-Burman are used sometimes, but often the methods employed are unscientific.

    As for semantics, that kind of work is done through Buddhist and Classical texts. (They are not the same linguistically.) Both of these sources are character-based.

    For a true word-based etymological dictionary to exist, a lot of comparative dialectology needs to be done, and unfortunately, that’s not happening, as the Qieyun system — a character-based system — defines Chinese dialectology these days (excluding the scholars mentioned above).

    Of course, if by character etymology we mean the development of a character from jiaguwen to its present form, then I understand the distinction.

  11. We can now make a very clear distinction between “character etymology” (perhaps better termed Chinese grammatology, i.e. 文字學) and “word etymology” in Chinese. Murat’s description of Old Chinese reconstruction is not inaccurate, but is somewhat outdated. Thanks to a number of breakthroughs over the last two decades, the morphological — i.e. word-formation — patterns of Old Chinese are coming into focus, which means it is possible to connect patterns of meaning relationship with patterns of sound relationship in order to reconstruct prefixes and suffixes with specific meanings and functions. We also know a lot more about connections of Chinese words to words in other languages and language families. As a result, it is possible to generate meaningful hypotheses about word origins that are not directly related to characters. Schuessler’s ABC etymological dictionary, mentioned earlier in this thread, is the first attempt to use this approach to systematically explore the word origins of Chinese words in an etymological dictionary. (Sagart’s book “The Roots of Old Chinese” from 1999 is another example of this approach, although it is not a dictionary.) While Schuessler’s dictionary is certainly a preliminary work, which will be revised, modified, and enhanced by future scholars, it is extremely valuable, and shows that Chinese etymology can be investigated in much the same way as etymology of other languages.

    As an example, consider the family of words shì 視 *giʔ ‘to look at’, shì 示 *gih ‘to show’, zhǐ 指 *kiʔ ‘to point; finger’, chén 臣 *gin ‘servant, one who watches’. The asterisked forms are Schuessler’s “Minimal Old Chinese (OCM)” reconstructions.

    According to Schuessler (page 467), shì 示 is the causative form (‘cause to look at’) of shì 視, formed through addition of causative suffix *-h. chén 臣 is a nominalized form derived from shì 視 through suffix *-n (similar in function to English -er, ‘one who does something’).

    As an example of a different kind of word origin, Schuessler notes that chá 茶 *d-lâ ‘tea’ is likely a borrowing from Proto-Loloish *la ‘leaf, tea’ which in turn derives from Proto-Lolo-Burmese *s-la, which in turn is probably borrowed from Proto-Austroasiatic *sla ‘leaf’, as reflected in modern Zhuang la⁴ ‘tea’ (page 178).

    These etymologies provide explanations of word origin, and are independent of explanations of character origin.

  12. @HRV – Thanks for that interesting explanation, I look forward to the ABC etymology dict in Pleco.

    Just wanted to explain a bit the mess above. Chris, there is no dispute at all with anyone, first they would need to answer my comments for there to be a dispute. Second, I wouldn’t have a dispute in any case but a civil debate.

    Also, the problem is not really with the term CE; that is hardly worth fretting over. What annoys me about the LL Chinese bloggers (and other American linguists) is that they seem to assume that the public is completely stupid. So, they assume we love characters because we think they ARE the language; they assume there is no proper etymology of Chinese because all the World is confused about the term “character etymology”. So they spend years writing about how wrong everyone is, how bad the characters are, instead of actually doing the linguist’s job and developing proper etymology research.

    I know it might have been the case some decades ago, when de Francis wrote his famous books, that a serious demythification of characters was necessary. But I can’t understand why today they continue so obsessed, blaming the characters for all the problems in the World, as if they are bitter that they finally survived in the computer age.

    Yes, the characters make an absurd script, it is inefficient, it is terribly difficult and it is not really “necessary” for the Chinese language(s). In fact one could even argue that it is a deliberately complicated script, made that way by the educated elites to set high barriers to entry, therefore essentially undemocratic (I suspect this is what makes them so loathsome for excitable Americans).

    And yet, the Chinese people continue to use them, rather than use pinyin. And the descriptive linguist should study how they ARE used, and not do politics about how they SHOULD (or should not) be used. This is ultimately what I dislike about these linguists’ work, and this is what makes their posts about Chinese a World behind their excellent writing on other languages.


  13. Julen, thanks for the explanation. I understand where you’re coming from, and, especially with statements like:

    “that it is a deliberately complicated script, made that way by the educated elites to set high barriers to entry”


    “as if they are bitter that they finally survived in the computer age.”

    I suspect we are actually in complete agreement.

  14. I am also a LL reader, and I would say my least favorite part of LL is the cloak of descriptivism. We’re all prescriptivists deep down. As soon as you say anything is ungrammatical, you’re wandering into prescription. How many speakers need to make the mistake for it to count as ungrammatical and how many for it to count as language change? It’s a question that can’t be answered except by appeal to ideals, that is by prescribing. Accordingly, because LL hides behind descriptivism, they can be pretty damn snooty about how terrible a thing it is to prescribe. Yes, the people who laugh at greengrocer’s quotes are snobs and deserve mockery themselves, but that doesn’t mean that we can just observe punctuation and arrive at a perfectly objective description of what kind of usage is correct or incorrect (or even “standard or non-standard”).

  15. Concerning LL on Chinese, almost all the blogging is done by Professor Mair. I have a lot of respect for his work, but he has a well known bugbear about characters, pinyin, and all the rest, and it can sometimes detract from the overall quality of the site to see one opinion alone propounded.

  16. I haven’t read about suffix -h. Is it based on Tibeto-Burman data?

    The motivation for reconstructing prefixes is still sometimes rooted in character dictionaries, though. For example: the 經典釋文 says to read transitive verbs with voiceless initials and intransitive verbs with voiced initials:



    So, according to (1), 會 in (2) should be read in MC with initial {k-}, which probably goes back to Old Chinese *k- (or *q-?). The intransitive form of 會 has MC initial ɣ-, which goes back to Old Chinese *g- (or *G-?). These types of patterns have led scholars to reconstruct OC prefixes:

    Mei Zulin *s-:
    會 (transitive) *s-g > MC {k-}
    – the prefix *s- derives transitive verbs and devoices voiced initials.
    會 (intransitive) *g- > MC {ɣ-}

    Sagart/Baxter *N-:
    會 (transitive) *k > MC {k-}
    會 (intransitive) *N-k- > *g > MC {ɣ-}
    – Old Chinese prefix *N- derives intransitive verbs and voices voiceless root initials.

    One other thing: I don’t believe a lot of the new reconstructions. Scholars are positing all sorts of affixes with overlapping functions on poor comparative evidence (Mei Tsulin’s reconstruction of *s- is a great example of using the comparative method incorrectly). Baxter/Sagart’s system now has Cə- formatives to account for Proto-Min softened initials. None of these things show up in the characters or lexicographic works, though. By comparison, morphological forms in the logosyllabic writing system of Classical Mayan are obvious. The same is true of the early Egyptian writing system. If Old Chinese had abundant morphology — like those languages — then why don’t we see it? The fact that we don’t points to its not existing. So why are scholars so anxious to reconstruct OC morphology? I think David Branner has provided a good answer –

  17. @Murat

    That article was very interesting. Thanks.

    The one thing that struck me as odd about it was the last section, where he considered the pedagogical implications for teaching ancient Chinese. There he said we were faced with a choice of which reconstructions to give students, but it seemed like the assumption was being made that students would read the texts with a basically Mandarin pronunciation, only with some weird readings like the variant for 会. But that didn’t make entire sense to me. Why not have textbooks that so far as possible give the old readings for characters, with the ˀ and s and * and whatnot? I know it’s a bit impractical because most learners of ancient Chinese already know Mandarin, but that doesn’t seem very decisive to me. People who know Cantonese end up learning Mandarin and survive. People who know Italian learn Latin. Why can’t people who are interested in reading Confucius, et al. read those texts out loud in something like our best guesses at the original pronunciations? I think it would be great if there were textbooks where the characters for the classics all had ruby of the reconstructed pronunciations over top.

  18. @Carl —

    One problem is that Classical literature was not written with a single phonological system; so, we would need to master different systems for different periods. There is surely variance between contemporary texts, as well. Another problem is that our reconstruction systems aren’t good enough; for example, we really don’t know what the phonetic values behind the four grades 四等 are, let alone things like chongniu 重紐. There are Chinese people who say we should use Southern Min to read Classical literature. This is no better than any other southern dialect, though, since Min doesn’t make the same distinctions as Middle Chinese; for example:

    Middle Chinese
    知 ʈ- 莊 tʂ- 照 tɕ-

    Southern Min
    知 t- 莊 ts- 照 ts-

    Southern Min would be better for the Shijing, though:

    Old Chinese (Shijing)
    知 t- 莊 ts- 照 ts-

    There isn’t a “perfect” solution to the problem, then, so we might as well take the easiest way out and use Mandarin and occasional alternate character readings.

  19. As an example of a different kind of word origin, Schuessler notes that chá 茶 *d-lâ ‘tea’ is likely a borrowing from Proto-Loloish *la ‘leaf, tea’ which in turn derives from Proto-Lolo-Burmese *s-la, which in turn is probably borrowed from Proto-Austroasiatic *sla ‘leaf’, as reflected in modern Zhuang la⁴ ‘tea’ (page 178)

    So does Schuessler think that medial *-l- caused retroflexation and is the source of second grade characters? Or is he ignoring the fact that 茶 is MC 二等澄母?

    Here is how Baxter (2011) deals with words like 茶:

    秅 chá; MC: drae 二等澄母; OC: *[d]ˤra
    槎 chá; MC: dzrae 二等崇母; OC: *dzˤraj
    槎 chá; MC: dzraeX 二等崇母; OC: *dzˤrajʔ
    察 chá; MC: tsrheat 二等初母; OC: *tsʰˤret

    I’m curious to see if Zev Handel addressed 茶 in his book:

  20. Murat,

    >> I haven’t read about suffix -h. Is it based on Tibeto-Burman data?

    It’s actually a notational variant of *-s. (The explanation is complicated, and not really relevant to the general point I was making about word etymology.) Schuessler calls the suffix *-s/-h, and you can read more about it in his book pages 35-36 and 41ff.

    >> So does Schuessler think that medial *-l- caused retroflexation and is the source of second grade characters?

    The Baxter 2011 examples you give aren’t really relevant to the reconstruction of 茶, whose phonetic element suggests a lateral initial. That, and the comparative evidence, suggests some *l-like initial in Old Chinese. That is why Sagart 1999:188 reconstructs *lra (Type A).

    For Schuessler OC *-r- is the usual source of MC second-grade pronunciations, just as in other reconstruction systems, but in this particular form for ‘tea’ *-l- has a similar function. This is discussed on pages 88-89.

    In my opinion the evidence for morphological affixes in Chinese is becoming overwhelming, even though an understanding of their phonetics and semantics is still preliminary. They are indeed reflected in the writing system — we see them in terms of alternations within phonetic series — but not as individual elements in the same way that you see them in the writing systems of the typologically quite different Mayan and Egyptian languages. Figuring out how all this works in Chinese is one of the great challenges of the next several decades, and one of the reasons why Old Chinese reconstruction has become an exciting field again. The specific reconstructions of Sagart, Baxter & Sagart, and Schuessler for specific words may well turn out to be wrong. But they are interesting and valuable hypotheses that deserve further testing and consideration, as are the broader morphological and phonological hypotheses that underlie these individual reconstructions.

  21. They are indeed reflected in the writing system — we see them in terms of alternations within phonetic series

    This is really interesting. Do you have any sources?

    The Baxter 2011 examples you give aren’t really relevant to the reconstruction of 茶, whose phonetic element suggests a lateral initial. That, and the comparative evidence, suggests some *l-like initial in Old Chinese. That is why Sagart 1999:188 reconstructs *lra (Type A).

    Unfortunately, Baxter and Sagart (2011) didn’t provide their reconstruction for 茶, but the examples I provided are relevant in that they show the OC antecedents of MC forms with the same shape as 茶. I know that following Gong Huangcheng *-l- has been changed to *-r- in many cases. If *-l- is not changed to *-r-, then alternations between *-l- and *-r- in phonetic series are common. I haven’t heard of *-r- as the source of 2nd grade syllables being changed to *-l-, which is what Yakhontov proposed in the 1960s, and Li Fang-Kuei changed to *-r- in the 1970s.

    I’ll check out Schuessler’s book the next time I’m in the library.


    That *-s brings up some interesting predictions. I’ll read up on it.