I don’t know if people already know about this, but the CHILDES Web site at Carnegie Mellon University has a zipped Excel file (or a zipped folder containing an Excel file) of a Chinese word list that the site says was contributed by Academia Sinica. I think it has about 20,000 words. Each word is given in Traditional Chinese characters and in pinyin with tone numbers. Since it’s an Excel file, the rows are numbered, but each word also comes with a count of how many times it was found (presumably in Academia Sinica’s corpus, although I don’t know that for a fact).
At work, I had some problems unzipping the folder (my Chinese is so poor that I couldn’t understand the error message, if that’s what it was), but I was able to open the Excel file.
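For anyone who wants to work with the list outside Excel, here is a rough Python sketch. It assumes you have saved the sheet as CSV with four columns (row number, Traditional characters, tone-numbered pinyin, count). The column names, the sample rows and their counts, and the use of 5 for the neutral tone are all my own placeholders for illustration, not taken from the actual file, so adjust them to whatever the spreadsheet really contains.

```python
import csv
import io
import re

# Hypothetical sample in the layout described above. The header names,
# the rows, and the counts are invented for illustration only; they are
# NOT real data from the Academia Sinica list.
SAMPLE_CSV = """rank,word,pinyin,count
1,的,de5,100
2,我們,wo3men5,40
3,中文,zhong1wen2,10
"""


def split_pinyin(pinyin):
    """Split tone-numbered pinyin into (syllable, tone) pairs.

    Assumes each syllable is letters followed by one digit 1-5
    (5 standing in for the neutral tone, which is an assumption).
    """
    pairs = re.findall(r"([a-zü:]+)([1-5])", pinyin.lower())
    return [(syllable, int(tone)) for syllable, tone in pairs]


def load_words(text):
    """Parse CSV text into a list of dicts, one per word."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {
            "word": row["word"],
            "syllables": split_pinyin(row["pinyin"]),
            "count": int(row["count"]),
        }
        for row in reader
    ]


def relative_frequencies(words):
    """Turn raw counts into each word's share of the total count."""
    total = sum(w["count"] for w in words)
    return {w["word"]: w["count"] / total for w in words}


words = load_words(SAMPLE_CSV)
freqs = relative_frequencies(words)
```

To read a real export, you would pass the contents of your own CSV file to `load_words` instead of `SAMPLE_CSV`. Relative frequencies are handier than raw counts if you ever want to compare this list with one built from a different corpus.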
A word of warning if the list is from Academia Sinica’s corpus: much of that corpus (probably like most corpora) is drawn from newspapers and other written sources, so I’m guessing its frequency counts differ from what you would get from a body of transcribed spontaneous speech. Here’s a description of the corpus:
Press reportage: 56.25%, Press review: 10.01%, Advert: 0.59%, Letter: 1.29%, Fiction: 10.12%, Essay: 8.48%, Biography and diary: 0.50%, Poetry: 0.29%, Quotes: 0.03%, Manual: 2.03%, Play script: 0.05%, Public speech: 8.19%, Conversation: 1.34%, Meeting minutes: 0.11%
Narrative texts: 70.66%, Argumentative texts: 12.24%, Expository texts: 14.72%, Descriptive texts: 2.83%
Written: 90.14%, Written-to-be-read: 1.38%, Written-to-be-spoken: 0.82%, Spoken: 7.29%, Spoken-to-be read: 0.35%
Philosophy: 8.68%, Natural science: 12.97%, Social science: 34.99%, Arts: 9.28%, General/leisure: 17.89%, Literature: 16.20%
Newspaper: 31.28%, General magazine: 29.18%, Academic journal: 0.70%, Textbook: 4.08%, Reference book: 0.13%, Thesis: 1.36%, General book: 8.45%, Audio/video medium: 22.83%, Conversation/interview: 1.63%, Public speech: 0.25%
Anyway, if you’re interested and can get the file open, enjoy!