Pinghua romanization

Traveling in Guangxi, digging a little bit into exotic* Binyang Dialect while taking in the scenery (Toto, I have a feeling we’re not in Beijing anymore). I’ve done some cursory searching online but failed to find a romanization for Binyanghua, so I thought it would be fun to try making one myself and put the recordings up for the listening pleasure of anyone else who loves a good voiceless alveolar lateral fricative aka ‘voiceless el’ [ɬ]. Who knew a Sinitic language would have consonant phonology in common with frickin’ [forgive the pun] Welsh!

The goal is utilitarian: I’d like to be able to hear a word and write it down with confidence that I’ve got the basic phonemes right, including the phonemic tones.

How does one do a romanization? Unfortunately I have no academic background here, but I believe the following would be classified as the Empirical Brute Force method. Can’t say how well it’s gonna work, but at least it’s a starting point. Got ideas about how to proceed with analysis, samples you’d like to hear, or references I could look into? I’d love feedback!

BYHR = my attempt at a BinYangHua Romanization.

WARNING: This post is just a starting point, and what follows in the numbered sections is more or less a chronological exploration. The BYHR in the first sections is full of inaccuracies and inconsistencies. As I work my way through subsequent sections, I’m revising my hypotheses about what sounds and tones are phonemic. If you want to be boring and skip all the hemming and hawing, you can go to the end of the post to read the running hypotheses. I will try to follow up with future posts, but my time is short and it’s better not to make promises when your previous post was, oh, about two years ago.

Sample 1: “I’m drinking water”

“I’m drinking water” 我正在喝水
OK, this sounds straightforward enough. Not really that far off Mandarin. I’ll try breaking it down syllable by syllable.

Mandarin recording BYHR Notes
wei22 For the record, at this point I’m just kind of winging it on tones, using tone numerals 1-5 where 1 is lowest pitch and 5 is highest.
 sei33 Is it possible there’s a glottal stop at the end here? BTW, pretty sure there’s no s/sh distinction as we have in standard Mandarin, so just using /s/



Sample 2: “Drink it down in one gulp”

“Drink it down in one gulp” 一口喝下去
Yikes. This is sounding a bit more exotic now. Is that something like /bl/ in the fourth syllable? It’s harder to divide the syllables this time, but here’s my attempt.

Mandarin recording BYHR Notes
yet33 It sounds to me like there’s a stop at the end of this syllable that seems to get sort of assimilated into the /h/ of the next syllable. Similar to the assimilation of the ‘t’ in /hat sei/ above.
hou33 /ou/ not quite the same sound as /ou/ in Mandarin, but I’ll ignore that for now.
hep33 Hmm. I’m writing /p/ at the end of this syllable even though it sounds voiced. Guessing it’s just an assimilation because of the /l/ in the next syllable.
leok11 Not a vowel sound I’m familiar with. From other conversations I gather there’s a /k/ stop at the end of the syllable, although it’s not very noticeable here. And the associated hanzi would be 落 instead of 下 as in Mandarin.




Sample 3: One thru Twenty

As long as we seem to have a number (yet = 1) in the sentence above, let’s try listening to a bunch of numbers.


Mandarin recording BYHR Notes
ni52 the /n/ sounds more like [ŋ] than [n], but if there’s no phonemic significance, I’ll just write it as /n/.
hlam24 There’s our first voiceless alveolar lateral fricative [ɬ]. To keep it simple, I’ll try just using ‘hl’ unless the notation ends up looking problematic.
nou21 Again sounds [ŋ]. To my ear, the tone here sounds consciously descending, but low-descending rather than the high-descending of ni52 above
十一  sep11yet33 Phonemes seem right. Tone of yet is definitely higher than sep, but not super 55 high as marked above.
十二 sep22ni31
十三  sep11hlam13
十四 sep11hlei33
十五  sep22nou21  /p/ seems to get assimilated. Nou is still definitely descending, which might require then that sep start a little higher.
十六  sep22lok11 Again not sure of tones. Lok is lower, but it sounds to me like it doesn’t descend in the way that nou does.
十七 sep11cet33
十八 sep11bat22
十九 sep11jiou22
二十  ni42sep11



Interesting. Before starting this numbers exercise I hadn’t thought about how useful numbers might be for understanding the tonal system. I’ll come back to this.
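One rough way to use the numbers for tone-hunting is to collect every tone numeral that each syllable has been given so far. The sketch below is only an illustration: it assumes a BYHR syllable is always lowercase letters followed by tone digits, and the names `NUMBERS` and `tone_variants` are made up for this example. The transcriptions are copied from the Sample 3 table as is, inconsistencies and all.

```python
import re
from collections import defaultdict

# Transcriptions copied from the Sample 3 table (standalone forms first, then 11 to 20).
NUMBERS = [
    "ni52", "hlam24", "nou21",
    "sep11yet33", "sep22ni31", "sep11hlam13", "sep11hlei33",
    "sep22nou21", "sep22lok11", "sep11cet33", "sep11bat22",
    "sep11jiou22", "ni42sep11",
]

# Assumption: a syllable is lowercase letters followed by tone digits.
SYLLABLE = re.compile(r"([a-z]+)(\d+)")


def tone_variants(words):
    """Map each syllable to the sorted list of tone numerals it was transcribed with."""
    variants = defaultdict(set)
    for word in words:
        for letters, tone in SYLLABLE.findall(word):
            variants[letters].add(tone)
    return {syllable: sorted(tones) for syllable, tones in variants.items()}


variants = tone_variants(NUMBERS)
for syllable, tones in sorted(variants.items()):
    print(syllable, tones)
```

Syllables that come back with more than one numeral (sep, ni, hlam) are the ones where I either misheard or where context is shifting the pitch.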


Sample 4: “Make / wrap zongzi”

“Make / wrap zongzi” 包粽子 [supposedly aka “sticky rice dumplings” — good stuff]


Mandarin recording BYHR Notes
 beo13 Is that the same /eo/ sound as in leok above?
zon44 First encounter so far with /n/ at the end of the word. As noted above with ni, it sounds more like [ŋ] than [n], but if there’s no phonemic significance, ‘n’ should be good enough.



Sample 5: “Binyang’s ‘fire cracker dragons’ are really famous”

“Binyang’s ‘fire cracker dragons’ are really famous” 宾阳炮龙很有名 [And pretty cool too. Do an images search for 炮龙]


Mandarin recording BYHR Notes
宾阳 ben44yein11 First /ein/ we’ve had.
炮龙 peo44lon11
很有名 hen44you22mek11 Honestly I don’t hear that /k/ at the end. But my informant assures me it’s there. Maybe it’s interference from Mandarin, maybe it’s just a weird recording.



Sample 6: Days of week


Mandarin recording BYHR Notes
星期一  hlen33gei11yet55 Actually, the first time I went through these I misheard hlen as ‘sen’ — such is the influence of one’s dominant phonemic system.
星期二 hlen33gei11ni42
星期三 hlen33gei11hlam24
星期四  hlen33gei11hlei55
星期五  hlen33gei11nou21
星期六  hlen33gei11lok21 Can’t really figure out the tone on 6. Sounds like 21 in this case, but previously sounded more like 11.
星期日 hlen33gei11net21 Gonna need more samples of 日 to feel confident about that net. Is the [ŋ] at the beginning doing something funny to the vowel, or is it not the same as /et/ in previous words? Sounds like Russian Nyet to me.




Sample 7: “Today is Sunday”

“Today is Sunday” 今天是星期日


Mandarin recording BYHR Notes
 今天  gam33net21 So apparently this is 今日 rather than 今天.
星期日 hlen33gei11net21




Sample 8: “My hometown is Binyang”

“My hometown is Binyang” 我老家在宾阳


Mandarin recording BYHR Notes
 我 nou44 or 55?
leo42 Tones are hard to pin down — falling, anyway.
za24 Sounds a bit like a /t/ stop at the end, but that’s just the influence from the next word, zai.
宾阳 ben33yein11 Yein sounds sort of creaky voice like a good solid 3rd tone in Mandarin, no? Not sure if 11 is the right description…



Sample 9: “I’m playing guitar”

“I’m playing guitar” 我正在弹吉他

Mandarin recording BYHR Notes
? Clearly this doesn’t sound the same as 我 above. I even asked about it, but my BYH speaker says it’s insignificant. Just one of the vagaries of speech production — just gonna let it slide.
dan212 Tell me that voice isn’t bottoming out! Even lower than the previous word, zai. Significance TBD.


Mostly I liked this sentence because of how “guitar”ish 吉他 sounds compared to Mandarin!


Sample 10: “Thank you, Teacher”

“Thank you, Teacher” 谢谢老师


Mandarin recording BYHR Notes
 谢谢 sie42sie42 This is the first documented /ie/. Maybe it should be just /i/?
老师 leo11sai24




Working hypotheses

For initials, it seems like we’ve got the following so far and I’m pretty sure there are more. In the Examples column I’m including the tone markings just so you can do a Find in the browser and get to the relevant sample.

initial examples
/b/ bat33, beo13, ben44 (33)
/c/ cet33
/d/ dan212
/g/ gei11, gam33, get44
/h/ hou33, hep33, hu44, hen44
/hl/ hlam24, hlei44(55), hlen33
/ji/ jiou22 [different from /zou/?]
/l/ lok11(21), leok11, lon11, leo42(11)
/m/ mek11
/n/ nou21(44), ni52(42), net21
/p/ peo44
/s/ sep11(22), sei11, sie42, sai24
/t/ ta24
/y/ yet55, yein11, you22
/z/ zon44, zei22, za24, zai11(22), zen44


OK, now the same for finals

final examples
/a/ ta24
/ai/ zai11(22), sai24
/am/ hlam24(13), gam33
/an/ dan212
/at/ hat44, bat33(22)
/ei/ wei22, sei33(11), hlei44(55,33), zei22, gei11
/ein/ yein11
/ek/ mek11
/en/ zen55(44), ben44, hen44, hlen33
/eo/ beo13, peo44, leo11
/eok/ leok11
/ep/ hep33, sep11(22)
/et/ yet33(55), cet33, net21, get44
/i/ ni52(31,42)
/ie/ sie42
/ok/ lok11(21)
/on/ zon44, lon11
/ou/ hou33, nou21, jiou22, you22
/u/ hu44
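With both lists in hand, a BYHR syllable can be split mechanically into initial, final and tone. This is a minimal sketch, not a real analysis: it assumes the initials are exactly the ones listed above (including the questionable /ji/), that a syllable is letters followed by tone digits, and the names `INITIALS` and `split_syllable` are invented for the example.

```python
import re

# Initials exactly as listed in the working hypotheses; "ji" is kept as written there.
INITIALS = ["b", "c", "d", "g", "h", "hl", "ji", "l", "m", "n", "p", "s", "t", "y", "z"]

# Assumption: a syllable is lowercase letters followed by tone digits.
SYLLABLE = re.compile(r"^([a-z]+)(\d+)$")


def split_syllable(syllable):
    """Split a BYHR syllable into (initial, final, tone)."""
    match = SYLLABLE.match(syllable)
    if match is None:
        raise ValueError(f"not a letters+tone syllable: {syllable!r}")
    letters, tone = match.groups()
    # Try longer initials first so that "hl" wins over "h".
    for initial in sorted(INITIALS, key=len, reverse=True):
        if letters.startswith(initial) and len(letters) > len(initial):
            return initial, letters[len(initial):], tone
    # No listed initial matched: return an empty initial to flag the gap.
    return "", letters, tone


for example in ["hlam24", "hep33", "jiou22", "wei22"]:
    print(example, split_syllable(example))
```

An empty initial is a hint that the inventory is incomplete: wei22 comes back that way because no /w/ is in the initials list yet.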


What about tones? There’s really not enough data yet. My hunches are like this

phonemic category Best examples in this category Notes
Flat high zen44, hat44, hu44, yet55, hlei44, get44 I suspect this will ultimately include all the 33 examples too. Note for example that the number 1, yet, shows up as 55 but also as 33.
Flat low zai22(11), leok11, lok11, sep11 Probably all the 22s belong here. I’m a little confused about whether there might be an even lower tone of some sort — see note about dan212 below.
Rising hlam24(13), beo13, na24, sai24
Falling high ni52(42), sie42
Falling low nou21, net21

Also, possible tone sandhi: two flat-high tones next to each other, the second one is slightly lower, e.g. hat44sei33
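The tone hunches can be written down as a lookup so that any numeral outside the five categories stands out. Again just a sketch under assumptions: 33 is filed under flat high only because of the suspicion noted in the table, and `TONE_CATEGORIES`, `tone_category` and `flat_high_pairs` are hypothetical names for this example.

```python
import re

# Tone numerals grouped by the hunches in the table above.
TONE_CATEGORIES = {
    "flat high": {"33", "44", "55"},  # 33 is included on suspicion only
    "flat low": {"11", "22"},
    "rising": {"13", "24"},
    "falling high": {"42", "52"},
    "falling low": {"21"},
}

# Assumption: a syllable is lowercase letters followed by tone digits.
SYLLABLE = re.compile(r"([a-z]+)(\d+)")


def tone_category(tone):
    """Return the hypothesized category for a tone numeral, or 'unclassified'."""
    for name, tones in TONE_CATEGORIES.items():
        if tone in tones:
            return name
    return "unclassified"


def flat_high_pairs(word):
    """Return adjacent syllable pairs in which both tones are flat high."""
    syllables = SYLLABLE.findall(word)
    return [
        (first, second)
        for first, second in zip(syllables, syllables[1:])
        if tone_category(first[1]) == tone_category(second[1]) == "flat high"
    ]


print(tone_category("212"))
print(flat_high_pairs("hat44sei33"))
```

Numerals such as 212 and 31 come back as unclassified, and pairs like hat44sei33 are the places to listen for the possible sandhi.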

Stuff I’m confused about…

我 I’ve got it 3x in the samples above: wei22, nou44, and [?].
dan212 Not sure if this is a super-low tone or if it’s just another version of flat-low as I’ve got above.


*宾阳话 is a subset of the top level Sinitic fangyan group Pinghua, which is to say Pinghua is parallel to Mandarin, Yue (Cantonese), etc. To paraphrase Wikipedia’s Pinghua entry and Baidu Baike’s 宾阳话 entry, in the past Pinghua was classified as part of Yue, but it was split off in the 1980s. It qualifies as exotic cuz there aren’t many speakers, as Sinitic languages go: total around 2m for Pinghua and 800,000ish for Binyanghua. It counts among its speakers both Han and Zhuang, and there seems to be some serious ethnic mixing according to one genetic study I came across.

Written characters’ influence on speech

It comes up more often than you might think in conversations with people about linguistics, and what linguistics is. What it is not is grammar pedantry, especially when it comes to phrases like “I should of” or “you’re book is over there”. Those aren’t even grammatical issues; they’re entirely orthographical.

Of course there are possible instances where the written word does influence the way we speak, and there are a number of readily available cases of this in Mandarin. Today I ran into my nemesis of spelling pronunciations: 秘鲁 Perú. The traditional pronunciation in Mandarin is bìlǔ, but since 秘 is 破音字, many people use the more common pronunciation for 秘 and pronounce the country name as mìlǔ. I’ve even met native speaking Mandarin teachers who were unaware of a bìlǔ pronunciation.

The other case that comes to mind is the word for “network”. In China, it’s wǎngluò and all is well. The characters are 网络 and no one writes otherwise (as far as I’ve seen). Meanwhile here in Taiwan, we don’t ever say 网络. Well, I do, but I get corrected. Here the common word is 网路 wǎnglù, and I can’t recall a single time I’ve ever heard otherwise, and it’s a word I pay extra attention to, being one that I learned “incorrectly” in China. I can’t help but think there’s something orthographical behind this change.

Were I someone with a lot more free time I’d look into it.

Yinzhang, Hanko, Inkan, Chops

I’m a fan of stamps. A few years back I took up carving seals as a hobby. Recently on a trip to Japan, my number 2 most important thing to buy was a series of the more casually used Japanese seals, called either hanko or inkan (pictured).

In Taiwan (like Japan), personal seals are almost a necessity. When I bought my motorcycle I needed my stamp in order to transfer the deed. You need them to rent an apartment or open a bank account or to do almost any other major financial transaction. For all intents and purposes it’s your signature. Their use is declining, most severely in Korea it seems, though you can still find seal-carving stands all over Seoul. Their use in Korea only goes back to the early 1900s as a policy put in place by Japan. Still, quite a few people use them.

Currently, 32.8 million Koreans, accounting for 66.5 percent of the population, have registered seal impressions. A total of 48.4 million certificates, worth of about 29 billion won ($23.4 million), were issued last year. (from JoongAng Daily)

You can do fine in Korea without one, to be sure. In Taiwan though, without a seal you’re going to be leaving a lot of thumbprints.

Jeju道 and Jeju岛

I’ve got this idea that I will obsess about for short periods of time, and have done so for years. It goes back to my past life as a philosophy student studying the concept of meaning (see Nelson Goodman’s book “The Languages of Art” to get an idea), taking classes with names like “The Meaning of Meaning”.

The basic focus of this obsession is about utterances and how they are intended versus how they are understood, but narrowed down to the level of phonetic ambiguity. I make a statement, we’ll call it X, which has meaning and pronunciation similar to a different statement Y. You hear Y. What was actually said? In reality the result will be the same, so it really doesn’t matter. Communication was accomplished, and the two parties may never realise there was a minor misunderstanding.

The recent example that brought this back up was in talking to someone about Korea. Just south of the peninsula there is a large island called Jeju, which is also a province. The word for island (岛, 도) sounds the same as the word for province (道, 도). So you hear “제주도” (je ju do), but which was actually said? Is meaning up to the creator or the audience? This is an unending debate. There are cases where context will make it plainly obvious, but not always.

The Brief History of Lánqīng Guānhuà in Republican China

In February 1913 a subordinate committee of the Ministry of Education was established. Called the Committee for the Unification of Pronunciation, they were tasked first and foremost with determining the pronunciation for all words in the national standard language (國語). As part of that function, they were also to determine the number of phonemes used in the language and to then adopt an appropriate phonetic alphabet.

The committee had 45 members, selected to represent the various provinces and districts in China and thus represent the substantial linguistic variety of China. They were also tasked with determining just which Sinitic language was to become the national standard. Dashan has already addressed the chances Cantonese had at this position, so I’ll skip that part. Cantonese was considered, but not seriously.

The national standard finally decided upon was an artificial variety of Mandarin based upon a long-used lingua franca sometimes referred to as lánqīng guānhuà (藍青官話), lánqīng here being a reference to the multitude of other dialectal influences on this speech.

The newly chosen form of Mandarin had a few key features, meant as concessions to speakers of the southern languages. It was meant to convey the total range of distinctions found in other non-Mandarin languages, including the entering tone as well as mid vowels /o/ and /e/ which are not present in some of the more northern dialects of Mandarin.

The language was decided upon, and the Guóyīn Zìdiǎn was published in 1919 as a record of this standard. But despite being well received and widely agreed upon, this standard was not without problems. Most significant of all was that there were no native speakers of this new variety, and thus no native teachers. As a result, teaching of the dialect was inconsistent.

In 1932, after a mildly problematic decade, the Guóyīn Zìdiǎn was revised without ceremony or much in the way of public announcements. A new version was published reflecting instead the educated speech of Beijing, thus ending the short run of the new national language.

Simplified, Traditional, 3rd Party candidates

It’s interesting to look back at what I cared about before and how those views continue to change. It’s also interesting to see how much people differ in what they value within the same general group of ideas.

In Shanghai the other day I saw another one of the previously ubiquitous billboards found in subways and airport terminals making note of how the simplified character 爱(愛) lost its 心, and how can you have love without heart, et cetera et cetera.

In the last year in Taiwan I’ve come to have a much more 随便 attitude toward character sets, and to characters in general. From the very start I’ve liked the moment of discovering a variant I’d never seen before, and I still like seeing different interpretations. A recent favourite is the half-traditional, half-simplified hand-written characters in YR Chao’s “A Grammar of Spoken Chinese”, or the variation used therein for 国. These have all been replaced with a Ming-Song typeface in more recent publications, but you can still find hardcover copies of the book with the hand-written glyphs.

Scroll down to the 4th scanned page on this post to see examples of both the hand-written forms and the variation on 国. My own copy being newer, I don’t get the pleasure of seeing Chao’s own writing habits.

There’s the argument that traditional characters preserve the culture and simplified are just one more instance of governments being bothersome. There’s the argument that simplified help in literacy and tradition has other outlets anyway. There are dozens of arguments in between.

I think part of my own view is tempered by having had to operate in both environments that use simplified (anywhere in the PRC, some textbooks in Taiwan) as well as those that use traditional (Taiwan, China, grad school wherever). So from that I say: If you’re serious about the language, just learn both; it’s really not that bad.

But I think the other part of my view is really coming from all this time working on Phonemica and the countless times we’ve spoken to journalists as well as volunteers. “Are you guys trying to protect (保护) the dialects?”, the question goes. “No,” we say, “not 保护. We’re not trying to stop the flow of Mandarinisation. We’re preserving (保存) them and the stories of their speakers”. Because even if we wanted to stop Mandarin (we don’t; we rely on it as a lingua franca for our daily lives just like everyone else), we can’t. You can’t prevent languages from changing, from dying out, from splitting into other languages, from blurring borders between neighbours.

So yeah, I think if you’re serious about Chinese, you should learn both character sets. But then in addition to that, I think people would do well to understand that language change is a natural part of societies and that variant characters, hand-written short forms and all the other things that bother traditionalists are all going to happen anyway. Simplified characters are no less “real Chinese” than the modern metropolises are “real China”.

卍 in Personal Names

The above picture shows part of an article from the Taipei Times. The full article is available online here.

The relevant part:

Taiwan Association for Victims of Occupational Injuries representative Ho Kuang-wan (賀光卍) expressed skepticism over its effectiveness as a deterrent.

Anyone who knows me knows I’m a fan of obscure characters. I get downright giddy when I see a variation on a character that I’ve never seen before. Equally cool is a character used in a name that isn’t usually. In fact, this may be the first time I’ve seen 卍 in the wild. I brought it up to Steve, and he can’t recall seeing it like this either.

Early on, I’d always thought 万, the simplified form of 萬, looked suspiciously like 卐 (also written 卍), which shares the same pronunciation. They are in fact variants. A quick look at brings up a seal script form matching 卍 (L22753).

From the Wikipedia article:

The paired swastika symbols are included, at least since the Liao Dynasty (AD907–1125) , as part of the Chinese writing system (卍 and 卐) and are variant characters for 萬 or 万 (wàn in Mandarin, man in Korean, Cantonese and Japanese, vạn in Vietnamese) meaning “all” or “eternity” (lit. myriad). The swastika marks the beginning of many Buddhist scriptures. In East Asian countries, the left-facing character is often used as symbol for Buddhism and marks the site of a Buddhist temple on maps.

That’s about it. Characters are fun.

Thanks to Anne for sending me this photo.

"China's tower of babel" and the language / dialect question. Again.

China Realtime Report put up a good piece about Phonemica. I thought the title not bad: “Getting China’s Tower of Babel on Record”.

A lot of you might see the article anyway, but I doubt you’d make it over to the comments, and the very first comment is probably one that Sinoglot readers and writers alike have spent way too much time thinking about:

What is the distinction between a language and a dialect?

Since I thought Kellen’s response gave a pretty nice simple summary, and since I know he’d be too shy to repost it himself, I thought I’d give it its very own Sinoglot post:

The real answer is that there is no answer. The distinction is arbitrary and can be motivated by a number of different factors; It can be political, historical, sociological, or just based on convenience. For example High German, Low German and Dutch form a continuum where a speaker from one end can’t understand a speaker from the other end if each is speaking their own hometown dialects, but speakers from any two neighbouring towns will have little trouble in communicating. China is made up of a number of such continuums, Mandarin being one, Cantonese another, Wu a third. For the project we treat Cantonese as a language and Mandarin as another language, but with a distant common ancestor, the same as Italian and Spanish are related through Latin. This is the reason we tend to group the entirety of our focus in the project under “Sinitic”, referring to any modern language variety that is descended from Old Chinese. The shared relationship of these language varieties is known, and the appropriateness of different degrees of fineness in distinctions between them is different for different situations.


Win Sinoglot's first-ever romanization prize!

What are we going to call it: the Sinoglot Sinitic Specialist Award?

The executive committee is still debating the exact value of the prize, but the contest is simple enough:

  1. Consider the following two lines of romanization* from a recording of someone speaking a Sinitic language
    • A,aqna.Geqkeq cio Cinsaenonminro,Geqthe raeyeugnin rokeq va?
    • Naha la?Zyyau mentie ha la?Menmenkoe ya,Geqlaonshian raeyeugnin rokeq va?Dakae ze mmeqleq.
  2. On your honor, without peeking at the recording on Phonemica, be the first to name
    1. The Sinitic language being transcribed
    2. The type of romanization being used and a bit of its history

*Not guaranteed to be error-free, as Phonemica is crowd-sourced and editable by anyone!