Academia Sinica Chinese Word Frequency List

To 10,000 headwords! Booyah, muthaluvas!

Very interesting. Thank you.
Are there any associated publications or other information about the research?

[quote]111 由於, 由于 (yóuyū) - conj.: owing/due/thanks to
112 時候, 时候 (shíhòu) - day,moment,(a point in) time,(duration of) time,time[/quote]

Yes, my Chinese is pretty poor.
It’s interesting, though, that I use 112 an awful lot but don’t understand 111 at all.

There are lots of duplicates in that list. What’s with those?

375 建立, 建立 (jiànlì) - establish,set up,found
376 建立, 建立 (jiànlì) - establish,set up,found

363 開發, 开发 (kāifā) - develop,open up,exploit
364 開發, 开发 (kāifā) - develop,open up,exploit

Hmmm… Based on some of the other bracketed notation, it looks like they may have accounted for lexical classes. That will probably lead to duplicates being pretty distant from each other as one goes further along the list. Maybe I’ll plug it into a spreadsheet to see if that’s happening.
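For anyone who would rather script that check than use a spreadsheet, here is a minimal sketch. It assumes every entry follows the layout of the lines quoted in this thread (rank, traditional form, simplified form, pinyin in parentheses, then glosses); the pattern and the `duplicate_ranks` helper are illustrative names, not anything published with the list.

```python
import re
from collections import defaultdict

# Assumed entry layout, inferred from the lines quoted in this thread:
#   "<rank> <traditional>, <simplified> (<pinyin>) - <glosses>"
ENTRY = re.compile(r"^(\d+)\s+(\S+),\s*(\S+)\s+\(([^)]*)\)\s+-\s+(.*)$")

def duplicate_ranks(lines):
    """Map each headword that occurs more than once to its sorted ranks."""
    ranks = defaultdict(list)
    for line in lines:
        m = ENTRY.match(line.strip())
        if m is None:
            continue  # skip anything that does not look like a list entry
        ranks[m.group(2)].append(int(m.group(1)))
    return {word: sorted(r) for word, r in ranks.items() if len(r) > 1}

sample = [
    "363 開發, 开发 (kāifā) - develop,open up,exploit",
    "364 開發, 开发 (kāifā) - develop,open up,exploit",
    "375 建立, 建立 (jiànlì) - establish,set up,found",
    "376 建立, 建立 (jiànlì) - establish,set up,found",
    "111 由於, 由于 (yóuyū) - conj.: owing/due/thanks to",
]

for word, rs in duplicate_ranks(sample).items():
    # The gap between first and last rank shows how far apart duplicates sit.
    print(word, rs, "gap:", rs[-1] - rs[0])
```

Run over the full list, the gaps should show whether duplicates really do drift apart further down the ranking.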

For the purpose of studying Chinese, you also have to consider whether the words are primarily used in spoken language, in written language, or in both. To figure that out, usually you need to look at the sources of the data. Where did they get the sentences they’re breaking down to get this list in the first place?

Frequency isn’t an absolute indicator of what you need to learn, either. There are all sorts of factors that would skew the list of what you personally need for your life using Chinese (work, special interests, location, etc.). So this sort of list is only a reference and shouldn’t be taken as an “I’ll memorize this and then my work is done” sort of thing.

[quote=“ironlady”]For the purpose of studying Chinese, you also have to consider whether the words are primarily used in spoken language, in written language, or in both. To figure that out, usually you need to look at the sources of the data. Where did they get the sentences they’re breaking down to get this list in the first place?

Frequency isn’t an absolute indicator of what you need to learn, either. There are all sorts of factors that would skew the list of what you personally need for your life using Chinese (work, special interests, location, etc.). So this sort of list is only a reference and shouldn’t be taken as an “I’ll memorize this and then my work is done” sort of thing.[/quote]

Agreed.

But I think this list could be extremely useful for people who have a wide range of needs (literary, academic, slang, conversational). Spending a ton of time on it after the first 2,000 words or so might not be the best approach for a beginner, but for those who have the patience and are at an intermediate level, going through it and picking out the words they do not readily know could be a helpful structure.

Certainly – if they have enough knowledge to know which words belong in which kind of usage. That’s usually the problem with using any non-textbook source of language in Chinese – it’s just hard to tell if you don’t know in the first place, which is sort of contradictory.

Assuming a standard Zipfian distribution with the most frequent term at 7% (the last frequency list that I saw placed ‘的’ first at 9%) in a 1,000,000-word corpus, you’ll need 710 terms under your belt to cover 50% of the corpus, and you’ll cover 68.5% of it by the 10,000th term. You hit a “dead crawl” (where learning one new term yields no greater gain than learning any other less frequent term in the corpus) by the 46,667th term, or thereabouts. That’s a larger lexicon than most humans have in any given language, mind you (I’m betting that none of us here has a 50,000-word lexicon in any language).
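Those figures can be reproduced with a short sketch, assuming the textbook form of Zipf’s law in which the term at rank r has relative frequency 0.07 / r. The function names are made up for illustration. The 1.5-occurrence threshold is only a guess at where the 46,667 figure comes from (the rank at which the expected count would round down to a single occurrence); the post does not say.

```python
import math

def zipf_coverage(n_terms, top_share=0.07):
    """Share of running text covered by the n_terms most frequent terms,
    assuming the term at rank r has relative frequency top_share / r."""
    return top_share * sum(1.0 / r for r in range(1, n_terms + 1))

def terms_needed(target, top_share=0.07):
    """Smallest number of top-ranked terms whose coverage reaches target."""
    covered, rank = 0.0, 0
    while covered < target:
        rank += 1
        covered += top_share / rank
    return rank

def first_rank_below(threshold, corpus_size=1_000_000, top_share=0.07):
    """First rank whose expected count in the corpus is below threshold.
    Assumption: this is how the "dead crawl" point is being defined."""
    top_count = round(top_share * corpus_size)  # expected count at rank 1
    return math.floor(top_count / threshold) + 1

print(terms_needed(0.5))                 # 710
print(round(zipf_coverage(10_000), 3))   # 0.685
print(first_rank_below(1.5))             # 46667
```

Changing `top_share` to 0.09 shows how much the picture shifts if ‘的’ really does sit at 9%.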

Corpora counted one word at a time are fun and useful in moderation, but they are not complete vocabulary guides, because compounding multiplies sense ambiguities. Kitties and litter are not kitty litter, which in turn is not litters of kitties. I know of no frequency corpus that disambiguates by word sense, or even of the theoretical possibility of one doing so.

This is not to mention the other 美中不足 (the one flaw in an otherwise fine thing) – no chengyu!

Also interesting at 2790: 師大, 师大 (shīdà) - normal university,National Taiwan Normal University

But if the question most pertinent to readers is, “What words should someone learn first?” I personally like this answer most.

I would wager that it came from here.

Thank you for posting this.

Here are some lovely translations:

中國, 中国 (zhōngguó) - Cathay,China,china
美國, 美国 (měiguó) - House_of_Representatives,the United States of America,US,U.S.A.,bench,Columbia,Yankeeland,U.S.,United States,Democrat,Yankeedom,USA,America,United States of America

Ahh yes, this will help me better understand those articles on Cathay-Yankeeland relations.

Where is the list?
