Cibo.cn is the shit.

I was looking for some obscure vocab in Mandarin a couple hours ago, doing my best to manipulate Baidu searches to produce the desired results. What I eventually found was a nice little dictionary site, cibo.cn (词博网). It’s just one more site to add to the arsenal when it comes to quick-and-dirty translations. I put in 酿酒 (brewing, winemaking) as a test. Here’s the beginning of the output:

扩 Saccharomyces cerevisiae rasse sake 清酒酿酒酵母 【主科技词汇 】
扩 brewing machinery 酿酒机 【航海航天词汇 】
扩 brewing machine 酿酒机 【进出口词汇 】
扩 Saccharomyces cerevisiae 酿酒酵母 【主科技词汇 】
扩 malt brewing 麦芽酿酒 【主科技词汇 】
扩 wet brewer’s grains 湿酿酒精 【主科技词汇 】
扩 saccharomyces cerevisiae 酿酒酵母 【生物词汇 】
近 brew v.酿造(啤酒); 泡(茶); 酝酿 n.酿造物 【大学词汇 】
近 Boiled Glutinous Rice Balls in Fermented Glutinous Rice 酒酿圆子 【菜单词汇 】
近 fermented glutinous rice 糯米酒酿 【航海航天词汇 】
近 Vintages Port 佳酿钵酒 【进出口词汇 】
近 new cider 新酿苹果酒 【航海航天词汇 】
近 aging wine 陈酿葡萄酒 【航海航天词汇 】
近 brewer 啤酒酿造者 【主科技词汇 】
近 Glutinous Rice Dumplings in Fermented Rice Wine 桂花酒酿圆子 【菜单词汇 】
近 brewing barley 啤酒酿造用大麦 【进出口词汇 】

Not sure if anyone here has used this much. So far I’m having fun just playing with the results.

Example sentences are provided as well. A couple from the same search:

句 What vintage is this wine?
这种酒是哪一年酿造的?

句 She has many occupations including gardening and wine-making.
她有许多消遣, 包括园艺和酿酒.

Good times.

12 responses to “Cibo.cn is the shit.”

  1. André says:

    Amazing site, thanks for the tip.

    Seems like a mashup of iciba and jukuu with both detailed vocab and example sentences.

  2. pc says:

    This is beautiful. My first objective was to get a good English-Chinese dictionary for computer science but it seems I don’t have to anymore!
    Thanks!

  3. Mark says:

    Any ideas what hanzi encoding they are using in the URL:

    http://www.cibo.cn/search.php?dictkeyword=%B5%C2%B9%FA

    It seems like it’s not URI encoding…

  4. pc says:

    That’s standard URI encoding (e.g. look at PHP’s urlencode function). It means they’re using UTF8 (or some variant of Unicode) in their queries to the server, which is nice to know. These guys must be new and/or Western CS educated (seeing as the older mainland/Taiwan sites are all GB or Big5 still).

  5. Mark says:

    When I apply URI encoding to the UTF8 string 德国, I get:

    %E5%BE%B7%E5%9B%BD

    But the website seems to have it as:

    %B5%C2%B9%FA

    Which is correct!?

  6. pc says:

    Aaaaaand this is why I shouldn’t make comments about things when I’m distracted (or rather, too excited to see Unicode around China). Yeah, so that’s the GB2312 encoding, which makes much more sense since every computer in China can support it (whereas older ones may not have a Unicode font installed). Sorry about the confusion!

    On that note, it looks like characters that are outside of GB2312 (e.g. 孨) don’t yield any results, whereas the 在线新华字典 returns a result.

    Regardless of their coding and my gaffes, still a good site! 😀

  7. pc says:

    Also, as a quick follow-up, if you have a Mac, you can see all the other encodings in Character Viewer, or if you’re feeling craaaazy you can actually type the representation into the calculator app and it will show you the resulting Unicode character.

  8. Kellen says:

    A difference could also occur if one is UTF8 and one is UTF16. I use 16 for most everything I do. Not that this one is UTF16.

  9. André says:

    Maybe sinoglot should do a post on different kinds of encoding?

    I would love to know what you guys are actually talking about 😉

  10. Kellen says:

    André: I’ll see what I can put together. In the meantime:

    Encoding is just the system that websites or documents use to tell the computer which character to display. Unicode (UTF) is probably the most widely used modern standard, and in Unicode there is an 8-bit version (more common) and a 16-bit version (which requires more data for each character/letter but can display a larger number of possible characters). GB is an older encoding system for Chinese characters only, while Unicode ideally should cover everything (though in practice this doesn’t quite happen, mostly because there are too many scripts and not enough human time to make sure everything is perfectly done for each language).

  11. Mark says:

    Thanks for all the replies guys. I’m doing this in an Android app and, to get it working, I just changed this:

    android.net.Uri.encode(hanzi)

    to:

    java.net.URLEncoder.encode(hanzi, "GB2312")

    and now it works :)

  12. Peter Nelson says:

    Kellen:

    UTF-8 is actually a variable-width encoding scheme, and *all* Unicode characters can be encoded in it. The clever thing is that the lower a character sits in the table (which roughly tracks how common it is), the fewer bytes it takes up. So, for all ASCII characters (the lowest 128 characters), it’s just one byte, making it backwards compatible with ASCII. The highest ones can take as many as 4 bytes. UTF-16 can encode every Unicode character too: most take a single 16-bit unit (= 2 bytes), while those outside the Basic Multilingual Plane take two units (= 4 bytes), so it isn’t strictly fixed-width either. Also, fun fact: UTF-8 was invented by Ken Thompson together with Rob Pike. Thompson is known for his work on a lot of things (UNIX and B, the predecessor of C, being kind of important). Now you know.
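
For anyone following the encoding back-and-forth in comments 3 to 6 and 11, here is a minimal Java sketch that reproduces both percent-encoded strings for 德国 and checks whether a string fits in GB2312 at all. The class name `HanziUrlEncoding` and its two helper methods are made up for illustration, and it assumes the JDK in use ships the GB2312 charset (standard desktop JDKs do, stripped-down runtimes may not). The hanzi are written as Unicode escapes so the result doesn’t depend on how the source file itself is saved.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.Charset;

// Hypothetical demo class; the name and helpers are illustrative only.
class HanziUrlEncoding {

    // Converts text to bytes in the named charset, then percent-encodes them.
    static String percentEncode(String text, String charsetName) {
        try {
            return URLEncoder.encode(text, charsetName);
        } catch (UnsupportedEncodingException e) {
            // Assumption: callers pass a charset the runtime supports.
            throw new IllegalArgumentException("Unsupported charset: " + charsetName, e);
        }
    }

    // True if every character in text can be represented in GB2312.
    static boolean fitsInGb2312(String text) {
        return Charset.forName("GB2312").newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String hanzi = "\u5FB7\u56FD"; // 德国

        // Same characters, different bytes, hence different URLs.
        System.out.println(percentEncode(hanzi, "UTF-8"));  // %E5%BE%B7%E5%9B%BD
        System.out.println(percentEncode(hanzi, "GB2312")); // %B5%C2%B9%FA

        // Both characters are in GB2312.
        System.out.println(fitsInGb2312(hanzi)); // true

        // 孨 (U+5B68): comment 6 suggests it falls outside GB2312.
        System.out.println(fitsInGb2312("\u5B68"));
    }
}
```

Neither string is "wrong": both are ordinary percent-encoding, just applied to different byte sequences. A server expecting GB2312 bytes will only understand the second form, which matches the fix in comment 11.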

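The byte counts mentioned in comment 12 are easy to check for yourself. The sketch below, again with a made-up class name (`Utf8Widths`) and helpers, counts how many bytes a few sample characters occupy in UTF-8 and in UTF-16. The sample characters are my own picks, not anything from cibo.cn.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the name and helpers are illustrative only.
class Utf8Widths {

    // Number of bytes the text occupies when encoded as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes in UTF-16 (big-endian, no byte-order mark).
    static int utf16Length(String text) {
        return text.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String[] samples = {
            "A",            // U+0041, ASCII
            "\u00E9",       // U+00E9, é
            "\u5FB7",       // U+5FB7, 德
            "\uD83D\uDE00"  // U+1F600, 😀 (a surrogate pair in Java strings)
        };
        for (String s : samples) {
            System.out.printf("U+%04X  utf8=%d bytes  utf16=%d bytes%n",
                    s.codePointAt(0), utf8Length(s), utf16Length(s));
        }
        // UTF-8 sizes come out as 1, 2, 3, 4 bytes;
        // UTF-16 sizes come out as 2, 2, 2, 4 bytes.
    }
}
```

So hanzi like 德 cost three bytes in UTF-8 against two in UTF-16 (and two in GB2312), which is one practical reason Chinese sites were slow to move off the GB encodings.
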