Anyone got any idea how to do this?
I’ve got a normal XP PC, English version but with the CJK language pack and whatnot, and a text file in Chinese which opens in MS Word and WordPad just fine (Word prompts me to confirm the encoding).
The file (compressed) is at mcu.edu.tw/~ssmith/cna101.rar. It’s a bit big, I’m afraid, but I don’t know how to cut out a sample without potentially changing the formatting.
But when I use cat or anything else in cygwin, I just get garbage. Stuff that looks like Chinese, but isn’t real characters.
I read somewhere that you should try cat -U. But that doesn’t work in cygwin, no such command.
I tried downloading a UTF-8 test file, cl.cam.ac.uk/~mgk25/ucs/exam … 8-demo.txt. In Word and Notepad, this comes out a bit funny, maybe because I haven’t got all the language fonts installed, but no big deal.
With cat/cygwin, STILL the funny garbage characters that look a lot like “fake” Chinese characters (even though the demo file doesn’t contain any Chinese).
If I open the demo file in Word, as BIG5 format, I can reproduce the behaviour of cat (except that cat has a capital C cedilla that doesn’t show up in Word). Ah-ha, I thought, my version of cygwin thinks it’s BIG5 for some reason!
But no, when I look at the demo file (no Chinese in it) with Word, and choose either Windows or MS-DOS when the encoding prompt comes up, I still get the fake Chinese characters!
So I don’t really know what’s going on! Anyone know how I can read this file and process it with cat, grep etc?
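In case it helps anyone narrow this down, here is a minimal sketch of one way to check which encodings a file's bytes decode cleanly under. It assumes Python 3 is available alongside cygwin. The candidate list is only my guess at likely encodings, and the function names and the file name in the usage comment are hypothetical. A clean decode doesn't prove the encoding is right, it only rules out the ones that fail.

```python
# Candidate encodings are guesses, not something known about the file.
CANDIDATES = ["utf-8", "utf-16", "big5", "gb2312"]


def decodable_as(data, candidates=CANDIDATES):
    """Return the candidate encodings under which the bytes decode without error."""
    ok = []
    for enc in candidates:
        try:
            data.decode(enc)
        except UnicodeDecodeError:
            continue  # these bytes are not valid in this encoding
        ok.append(enc)
    return ok


def check_file(path, candidates=CANDIDATES):
    """Read the whole file as raw bytes and test each candidate encoding."""
    with open(path, "rb") as f:
        return decodable_as(f.read(), candidates)


# Hypothetical usage, with the unpacked file in the current directory:
#   print(check_file("cna101.txt"))
```

If only one candidate survives, that would be the encoding to tell the terminal (or a converter) about before trying cat and grep again.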