Scripts and banned words
A bit late to the party on this one, but a few days ago Danwei had a great translation from Hecaitou’s blog on the futility of blocking dirty words. Creative stuff:
Hecaitou originally wrote: “不 矢口 亻十 么 日寸 候 , 亻奄 口斤 言兑 矢豆 亻言 也有 辶寸 氵虑 敏 感 字 节 白勺 言兑 氵去 , 于 是 , 亻奄 学 会 了 扌斥 字 ……后 来, 亻奄 米青 礻申 分 歹刂 鸟~”
Danwei translation: “I don’t know when it was that I heard that mobile phones are also being filtered for sensitive words, therefore, I learnt to split characters… later on, I became schizophrenic”
For those still wondering what’s going on, Hecaitou takes characters that can be broken into parts which are also characters in their own right — and he simply breaks them up. The result is visually clear but hard for an unsophisticated character/phrase-blocking program to understand. Compare
Original: 不 矢口 亻十 么 日寸 候 (9 characters, meaningless gibberish)
Read as: 不知什么时候 (6 characters)
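To make the effect concrete, here is a minimal Python sketch of the trick and of why a plain substring filter misses it. The decomposition table, the function names and the "banned" phrase are all made up for illustration; a real tool would need a full character-composition database, and nothing here is taken from Hecaitou or Danwei.

```python
# Hypothetical, hand-made decomposition table covering only characters
# from the example above; it is an assumption, not a real database.
SPLIT = {
    "知": "矢口",
    "什": "亻十",
    "时": "日寸",
}


def split_chars(text):
    """Replace each splittable character with its component parts."""
    return "".join(SPLIT.get(ch, ch) for ch in text)


def naive_filter(text, banned):
    """Return True if any banned phrase appears verbatim in the text."""
    return any(word in text for word in banned)


original = "不知什么时候"
disguised = split_chars(original)
banned = ["什么时候"]  # harmless stand-in for a sensitive phrase

print(disguised, len(disguised))
print(naive_filter(original, banned))   # the verbatim phrase is found
print(naive_filter(disguised, banned))  # the split version slips past
```

The filter only compares code points, so once 什 becomes 亻十 the banned string is simply no longer in the text, even though a human reader still sees it.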
Gotta say, a lot more fun than #### for naughty words on voicemail transcripts from Google, which was also news last week (NB: my brother tried to reproduce their results in a voicemail to me but only succeeded in getting it to write “box”).
But of course, this sort of script-play isn’t the exclusive domain of hanzi. Try searching this page for Τіbеt and Хіnјіаng and see what you come up with. Nothing? That’s right. If you see it, but you can’t search for it, is it there?
If you like the effect, you too can have it. Check out Kellen Parker’s “sensitive word masking for blogs in China” tool.
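For the curious, the lookalike-letter effect can be sketched in a few lines of Python. This is not the code behind that tool, just a guess at the general idea: the mapping table and the function names below are my own assumptions, and the table covers only a handful of letters.

```python
import unicodedata

# Hypothetical lookalike table: Latin letters mapped to Greek or Cyrillic
# characters that render almost identically in most fonts.
HOMOGLYPHS = {
    "T": "\u03a4",  # GREEK CAPITAL LETTER TAU
    "X": "\u0425",  # CYRILLIC CAPITAL LETTER HA
    "a": "\u0430",  # CYRILLIC SMALL LETTER A
    "e": "\u0435",  # CYRILLIC SMALL LETTER IE
    "i": "\u0456",  # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
    "j": "\u0458",  # CYRILLIC SMALL LETTER JE
}


def mask(word):
    """Swap every letter that has a lookalike; leave the rest alone."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in word)


def scripts_used(word):
    """Return the script names (first word of each Unicode character name)."""
    return {unicodedata.name(ch).split()[0] for ch in word}


masked = mask("Tibet")
page = "Try searching this page for " + masked

print(masked)
print("Tibet" in page)       # a plain-text search for the Latin spelling fails
print(scripts_used(masked))  # the word now mixes several scripts
```

The masked word looks the same on screen, but it is a different sequence of code points, which is why a browser's find function or a keyword filter comes up empty. Checking for mixed scripts inside a single word, as `scripts_used` does, is one obvious way a filter could fight back.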
I’ve heard this called Martian on a lot of blogs. In that case it tends to be an alternateen sort of thing. The other common method is to substitute characters that are homophonous but written quite differently.
I actually did this on my personal blog for my name since I didn’t really want to be so easily connected to the name in search engines.
I’m not sure if I buy this part… Isn’t it just another string of characters (the same as any new word added)? Not only that, but I think that with the character composition databases out there, the 拆字后 characters could even be automatically generated from a list of sensitive words.
You can’t deny the creativity, though!
Sure, just another string of characters, but… well, I guess it depends on your definition of “sophisticated”
[…] but there’s an even more creative method for one-upping internet senser ship software: pulling characters apart into pieces. […]
Can you PLEASE change your website’s font? I’ve increased the zoom on my browser but reading italicized Chinese characters is no fun any way you cut it. The site looks great aside from that. I’m going to enjoy reading it.
Alex:
Italics have gone. I’ll up the base font size a point or two as well.
Schizophrenic? or Schizophonic?
@carmen: COL (chortling out loud). I’m going to borrow schizophonic some day