CHILDES Web Site Contains Large Authentic DIY Kid Corpus

[Edit:] I finally learned the bare rudiments of part of the software, and it appears there is a very large body of children’s authentic transcribed spoken language on the CHILDES web site. There’s also a program called CLAN that you can use to view or manipulate the language in the various files.

I’ve only read a little of the material, but one of the kid dialogues I read tonight would actually work well as a teaching dialogue for kids. I’m not sure of the intellectual-property legalities of using the materials yet, but I think they are under a GNU General Public License. I’ll look into it as I go along.

There’s a UK directory and a US directory. There are also directories for languages other than English, and incidentally, there’s a directory of aphasic speech (speech of people whose language faculties have been impaired by stroke, head injury, etc.).

But anyway, if you’re looking for language and vocabulary for kids, maybe you should check this out.

Also, there’s a neat, short APA book review that gives basic information on CHILDES; the review is by Jean Berko Gleason (of “wugs” fame) and Bruce Thompson. It’s in PDF format, and it’s located here.

[Earlier post:] I haven’t explored it thoroughly, but it looks as if a kid word-frequency list could be constructed using something called the CHILDES database. The CHILDES website is located here. It looks as if this website has all kinds of recorded or transcribed children’s speech, available free.

It would involve learning one’s way around the special software that’s used in that system, but it doesn’t look too hard to learn. Again, I haven’t looked into it adequately yet, but it looks as if a person could arrange a big mass of kids’ spoken words according to frequency using their software, which is free and downloadable from the same site. Then the most frequent words could be used to construct a vocabulary for teaching.
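To make the frequency idea concrete, here is a minimal sketch in Python rather than in the CHILDES software itself. It assumes the transcripts are plain text in the CHAT layout, where each child utterance sits on a line beginning with *CHI: and other lines (headers starting with @, other speakers, % tiers) can be skipped. The function name and the tiny sample transcript are made up for illustration, and the sketch ignores continuation lines and special transcription codes, so treat it as a rough starting point only.

```python
import re
from collections import Counter


def child_word_frequencies(chat_text, speaker="CHI"):
    """Count words on the main-tier lines of one speaker (hypothetical helper)."""
    counts = Counter()
    prefix = f"*{speaker}:"
    for line in chat_text.splitlines():
        # Skip headers (@...), dependent tiers (%...), and other speakers.
        if not line.startswith(prefix):
            continue
        utterance = line[len(prefix):].lower()
        # Keep runs of letters and apostrophes; punctuation is dropped.
        counts.update(re.findall(r"[a-z']+", utterance))
    return counts


# Invented sample, not taken from the real CHILDES database.
sample = (
    "@Begin\n"
    "*MOT:\twhat is that ?\n"
    "*CHI:\tdoggie .\n"
    "%mor:\tn|doggie .\n"
    "*CHI:\tmy doggie go .\n"
    "*CHI:\tgo go !\n"
    "@End\n"
)

freqs = child_word_frequencies(sample)
for word, n in freqs.most_common(3):
    print(word, n)
```

Run over a whole directory of transcripts, the top of a list like this would be the raw material for a teaching vocabulary.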

Some Japanese educators have already used CHILDES to help them construct a children’s vocabulary for teaching English in elementary schools. There’s information about that project here.

This vocabulary list is included in an appendix to an Asian EFL Journal article entitled “Creating a Corpus-Based Daily Life Vocabulary for TEYL.” The article is now on the Internet in PDF form. The article states that words from picture dictionaries were included in the vocabulary list, in addition to words from the CHILDES corpus referred to in the post directly above this one. Here’s a link to the PDF version of the article: asian-efl-journal.com/PTA/Volume-54-kc.pdf

The list is on pages 57 and 58 of the article (PDF pages 28 and 29). The article itself is worth reading, in my opinion.