Missing Unicode characters

Time to play guess the character. It’s that one on the right, 毛 with 灬 below. I’ll give you a clue: It’s not Mandarin.

We run into this problem often enough on Phonemica. Someone wants to transcribe their dialect, and an effort is made to do this with characters. We do this because we want to be able to link things later, for example, looking at how 懸 cognates vary across the Sinitic languages in function and form. But then someone wants to write something for which there is no Unicode character, even though the actual character is widely used. Recently, the biggest one has been 亻厓 ngai, the first person singular pronoun of most Hakka dialects. To write it, you either have to do what I did and split off the radical, or substitute another character. For Hakka, that's usually 捱.

Today, this one came up:

That’s cua+. The + marks it as mid-level tone in the Taiwanese Hakka pinyin. It means “to carry” in the same way as 帶 in Mandarin. 亻厓 actually does exist in Unicode; it’s just not supported by most fonts, but that could change. cua+ is not so lucky.
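For anyone wondering what “exists in Unicode but isn’t in most fonts” looks like in practice, here is a minimal Python sketch. It assumes ngai is encoded as U+2028E in CJK Unified Ideographs Extension B (worth double-checking against a Unihan lookup). Characters outside the Basic Multilingual Plane, like this one, are the ones fonts most often skip. For a character with no code point at all, like cua+, one stopgap is an Ideographic Description Sequence such as ⿱毛灬 (“毛 above 灬”). The sequence only describes the shape and won't render as the character itself. The helper names `describe` and `ids` are hypothetical.

```python
import unicodedata

# Ideographic Description Characters: they describe how components are
# arranged, but they are not themselves the character being described.
LEFT_TO_RIGHT = "\u2ff0"   # ⿰
ABOVE_TO_BELOW = "\u2ff1"  # ⿱

# Assumption: Hakka ngai (亻 + 厓) is U+2028E, in CJK Extension B.
NGAI = "\U0002028E"

def describe(ch: str) -> str:
    """Code point, Unicode name and plane of a single character."""
    cp = ord(ch)
    name = unicodedata.name(ch, "<unnamed>")
    plane = "BMP" if cp <= 0xFFFF else f"plane {cp >> 16}"
    return f"U+{cp:04X} {name} ({plane})"

def ids(layout: str, *components: str) -> str:
    """Build an Ideographic Description Sequence such as ⿱毛灬."""
    return layout + "".join(components)

if __name__ == "__main__":
    print(describe(NGAI))                   # U+2028E CJK UNIFIED IDEOGRAPH-2028E (plane 2)
    print(ids(LEFT_TO_RIGHT, "亻", "厓"))   # ⿰亻厓
    print(ids(ABOVE_TO_BELOW, "毛", "灬"))  # ⿱毛灬, a stand-in for cua+
```

The description sequence is searchable and unambiguous, which matters if, as on Phonemica, the goal is to link transcriptions later, but it is no substitute for a real encoded character.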

Uncertain Etymologies

There’s a folk etymology that says the Shanghainese 十三点 zəseti (an insult similar to 二百五) comes from the English word “society”. The story goes that during Shanghai’s period of foreign settlements and extraterritoriality, some people would go to great lengths to dress in Western fashions and socialise in high-society establishments. They were seen as fools by those who didn't behave that way, and “society”, rendered with the similar-sounding 十三点, became an insult.

But many people disagree with this etymology. I haven’t heard a better explanation, but I often hear how this one is wrong.

I came across another similar situation today, prompted by a discussion from Dr. Mair.

阿斗仔 adoga is a common term in Taiwan to refer to foreigners. My sense is that there’s potential negativity to it, but it’s also used by my superiors to refer to me in my presence (in the third person, not to me directly). My teacher and I will be out around town and she’ll introduce me as “This adoga is blah blah”.

I found the following explanation of the origin of the term, which I find questionable at best:

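阿斗仔源自於早期台灣被日本佔領, 形容外國人時,用日文腔調說英文“OurDoor”,
(“阿斗仔 dates from the early period when Taiwan was occupied by Japan: when describing foreigners, people said the English ‘Our Door’ in a Japanese accent,”)
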
That is, it was a Japanese pronunciation of the English “our door” (out door?), and this became adoga, which now means “foreigner”.

Personally, I think “society” is way more likely than “our door”.

Nüshu in the Digital Age

I’ve signed up for a number of Google Scholar alerts which mostly give me links to old books on Google or things that aren’t really all that relevant, or both. Today this one came across my screen:

Protecting and Propagating Nu Shu with Information Technology

It’s good to see more being done with existing technology for the scripts and languages that might soon be gone. Nüshu is one of those things that a lot of people have heard of but few know much about, or at least the data is hard to get to. This would be a good addition, assuming the end results are open and accessible (which, as you probably know, is rarely the case).

Simplified, Traditional, 3rd Party candidates

It’s interesting to look back at what I cared about before and how those views continue to change. It’s also interesting to see how much people differ in what they value within the same general group of ideas.

In Shanghai the other day I saw another one of the previously ubiquitous billboards found in subways and airport terminals making note of how the simplified character 爱(愛) lost its 心, and how can you have love without heart, et cetera et cetera.

In the last year in Taiwan I’ve come to have a much more 随便 (casual) attitude toward character sets, and toward characters in general. From the very start I’ve liked the moment of discovering a variant I’d never seen before, and I still like seeing different interpretations. Recent favourites are the half-traditional, half-simplified hand-written characters in YR Chao’s “A Grammar of Spoken Chinese”, and the variation used therein for 国. These have all been replaced with a Ming-Song typeface in more recent publications, but you can still find hardcover copies of the book with the hand-written glyphs.

Scroll down to the 4th scanned page on this post to see examples of both the hand-written forms and the variation on 国. My own copy is newer, so I don’t get to enjoy Chao’s own writing habits.

There’s the argument that traditional characters preserve the culture and simplified are just one more instance of governments being bothersome. There’s the argument that simplified help in literacy and tradition has other outlets anyway. There are dozens of arguments in between.

I think part of my own view is tempered by having had to operate both in environments that use simplified (anywhere in the PRC, some textbooks in Taiwan) and in those that use traditional (Taiwan, China, grad school wherever). So from that I say: If you’re serious about the language, just learn both; it’s really not that bad.

But I think the other part of my view really comes from all this time working on Phonemica and the countless times we’ve spoken to journalists as well as volunteers. “Are you guys trying to protect (保护) the dialects?”, the question goes. “No,” we say, “not 保护. We’re not trying to stop the flow of Mandarinisation. We’re preserving (保存) them and the stories of their speakers.” Because even if we wanted to stop Mandarin (we don’t; we rely on it as a lingua franca in our daily lives just like everyone else), we can’t. You can’t prevent languages from changing, from dying out, from splitting into other languages, or from blurring the borders between neighbours.

So yeah, I think if you’re serious about Chinese, you should learn both character sets. But then in addition to that, I think people would do well to understand that language change is a natural part of societies and that variant characters, hand-written short forms and all the other things that bother traditionalists are all going to happen anyway. Simplified characters are no less “real Chinese” than the modern metropolises are “real China”.

卍 in Personal Names

The above picture shows part of an article from the Taipei Times. The full article is available online here.

The relevant part:

Taiwan Association for Victims of Occupational Injuries representative Ho Kuang-wan (賀光卍) expressed skepticism over its effectiveness as a deterrent.

Anyone who knows me knows I’m a fan of obscure characters. I get downright giddy when I see a variation on a character that I’ve never seen before. Equally cool is a character showing up in a name where it usually doesn’t appear. In fact, this may be the first time I’ve seen 卍 in the wild. I brought it up to Steve, and he couldn’t recall seeing it used like this either.

Early on, I’d always thought 万, the simplified form of 萬, looked suspiciously like 卐 (also written 卍), which shares the same pronunciation. They are in fact variants. A quick look at ChineseEtymology.org brings up a seal script form matching 卍 (L22753).

From the Wikipedia article:

The paired swastika symbols are included, at least since the Liao Dynasty (AD 907–1125), as part of the Chinese writing system (卍 and 卐) and are variant characters for 萬 or 万 (wàn in Mandarin, man in Korean, Cantonese and Japanese, vạn in Vietnamese) meaning “all” or “eternity” (lit. myriad). The swastika marks the beginning of many Buddhist scriptures. In East Asian countries, the left-facing character is often used as symbol for Buddhism and marks the site of a Buddhist temple on maps.

That’s about it. Characters are fun.

Thanks to Anne for sending me this photo.

“China’s tower of babel” and the language / dialect question. Again.

China Realtime Report put up a good piece about Phonemica. I thought the title not bad: “Getting China’s Tower of Babel on Record”.

A lot of you might see the article anyway, but I doubt you’d make it over to the comments, and the very first comment is probably one that Sinoglot readers and writers alike have spent way too much time thinking about:

What is the distinction between a language and a dialect?

Since I thought Kellen’s response gave a pretty nice simple summary, and since I know he’d be too shy to repost it himself, I thought I’d give it its very own Sinoglot post:

The real answer is that there is no answer. The distinction is arbitrary and can be motivated by a number of different factors; it can be political, historical, sociological, or just based on convenience. For example, High German, Low German and Dutch form a continuum where a speaker from one end can’t understand a speaker from the other end if each is speaking their own hometown dialect, but speakers from any two neighbouring towns will have little trouble communicating. China is made up of a number of such continuums, Mandarin being one, Cantonese another, Wu a third. For the project we treat Cantonese as a language and Mandarin as another language, with a distant common ancestor, the same way Italian and Spanish are related through Latin. This is why we tend to group the entire focus of the project under “Sinitic”, meaning any modern language variety descended from Old Chinese. The relationships among these varieties are known; how finely it makes sense to distinguish between them depends on the situation.


Win Sinoglot’s first-ever romanization prize!

What are we going to call it: the Sinoglot Sinitic Specialist Award?

The executive committee is still debating the exact value of the prize, but the contest is simple enough:

  1. Consider the following two lines of romanization* from a recording of someone speaking a Sinitic language
    • A,aqna.Geqkeq cio Cinsaenonminro,Geqthe raeyeugnin rokeq va?
    • Naha la?Zyyau mentie ha la?Menmenkoe ya,Geqlaonshian raeyeugnin rokeq va?Dakae ze mmeqleq.
  2. On your honor, without peeking at the recording on Phonemica, be the first to name
    1. The Sinitic language being transcribed
    2. The type of romanization being used and a bit of its history

*Not guaranteed to be error-free, as Phonemica is crowd-sourced and editable by anyone!

Sexagenary Cycle

In the last two posts, I listed the ten Heavenly Stems and the twelve Earthly Branches. I think that knowledge of these things is useful. And it’s just not very difficult. Of course, if you plan to take much of an interest in Chinese history, it could be essential.

Ten times twelve equals sixty

For much of Chinese history, years were listed by a combination of the Stems and Branches. This gives us sixty years in a cycle, not a hundred and twenty, as will become clear. If sixty years doesn’t seem very helpful, remember that very few emperors stayed on the throne for a full cycle; the Kangxi and Qianlong Emperors are the famous exceptions. With a little knowledge of Chinese history, you can pinpoint all kinds of dates, or at least work out how many years passed between certain dates within a limited span.
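To see why ten stems and twelve branches give sixty combinations rather than a hundred and twenty, here is a small Python sketch. Both sequences advance one step per year, so the pairing repeats after the least common multiple of 10 and 12, which is 60; an odd-numbered stem only ever meets an odd-numbered branch, and an even one an even one. The year conversion is an assumption of this sketch: it takes 1984 CE as a 甲子 year and treats years as Gregorian, ignoring that the Chinese year actually turns over around Lunar New Year. The names `cycle` and `year_name` are hypothetical.

```python
import math

STEMS = "甲乙丙丁戊己庚辛壬癸"        # the 10 Heavenly Stems, in order
BRANCHES = "子丑寅卯辰巳午未申酉戌亥"  # the 12 Earthly Branches, in order

# Stems and branches each advance one step per year, so the pairing
# repeats after lcm(10, 12) = 60 years rather than 10 * 12 = 120.
CYCLE_LENGTH = math.lcm(len(STEMS), len(BRANCHES))

def cycle() -> list[str]:
    """All stem-branch pairs in order, starting from 甲子."""
    return [STEMS[i % len(STEMS)] + BRANCHES[i % len(BRANCHES)]
            for i in range(CYCLE_LENGTH)]

def year_name(year: int) -> str:
    """Stem-branch name for a Gregorian year (hypothetical helper).

    Assumes 1984 CE was a 甲子 year and ignores that the Chinese
    year begins around Lunar New Year, not on 1 January.
    """
    return cycle()[(year - 1984) % CYCLE_LENGTH]

if __name__ == "__main__":
    print(CYCLE_LENGTH)     # 60
    print(year_name(2024))  # 甲辰
```

Under those assumptions, year_name(2000) gives 庚辰, and a pairing like 甲丑 never turns up, which is exactly why the cycle stops at sixty.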

Earthly Branches

In the last post, I introduced the Heavenly Stems. Now for the Earthly Branches. That’s right; there are two sets of these damned things. If you recall, there were 10 Heavenly Stems; there are 12 Earthly Branches.

Interesting number, twelve. Twelve months in the year, twelve hours in the day, twelve disciples, twelve signs of the zodiac, twelve animal years, twelve inches in a foot, twelve pence in a shilling…er, have I forgotten any?


How come twelve is so important?

Heavenly Stems


The ten Heavenly Stems, sometimes called the Celestial Stems, date back to the very earliest records of writing in China. These characters appear to be very old indeed and seem to have marked the days of a ten-day week. The characters themselves are not terribly common in modern Chinese, with only one appearing in the first 1000 characters, and five more appearing in the 3000 most common characters.[1]