Chinese Character Numbering System


#1

I recall that in the 1970s and 1980s I saw some telegrams between some Chinese people in Southeast Asia and the USA, and they were all in numbers, such as:

AB09 CX14 DR11 CC12 AA43 JU18 CX29 FE07

I was told that there was a code-book of Chinese characters, and that the sender (of the intended telegram transmission) had to laboriously transcribe the message into this code so that it could be transmitted by telegram. The receiving party would then decode it at their end into a readable Chinese message.

I am wondering if any such system is still in existence today? If so, it seems to me that with the help of the appropriate computer program, the work of transcribing a Chinese character message into code numbers, and then decoding it, would be greatly simplified. (Of course I realize that in the world today we have FAX machines, and scanners, which make the task of transferring Chinese character data over long distances considerably easier.)

Nevertheless, I bring this entire subject up because I still find that I am frequently perplexed about how to communicate an exact Chinese character, by email or in English language text (such as on an English language website), to persons who want to know what that exact Chinese character is. For example, Chinese people living overseas in many cases do not have Chinese systems on their computers. Some foreigners have pretty good Chinese skills, but don’t have a Chinese system on their current computer setup. And on the other side of the coin, foreigners without any Chinese skills who want to communicate a small amount of Chinese information could, by referencing a comprehensive code system, still effectively communicate the exact Chinese characters (used in an address, for example) to other people.

So, say, a person in Kaohsiung could write to me and say “Please send the catalog to me as soon as possible. My address is 2nd Fl., No. 115 Hsin Ya Er Lu, Sec. 2, Kaohsiung City, Taiwan 801. The four Chinese characters in that address are AC45 DF67 GH14 EE11.”

Hence, I feel that such a coding system could very much still be of use in the world today. However, in order to use such a system, it would also appear to require that a large organization (perhaps a government agency) place all the Chinese characters (over 50,000 at last count, correct?) on-line somewhere for easy reference, and hopefully not only in Big-5 and simplified versions, but in GIF format as well, for those who do not have a Chinese system on their computer.

Without such a coding system, how can we effectively communicate these small groups of Chinese characters inside roman alphabet correspondence?

Can anyone offer some insight into the issues which I have raised here? I am stumped.


#2

Unicode provides “a unique number for every character, no matter what the platform, no matter what the program, no matter what the language.” It encompasses traditional as well as simplified Chinese scripts – as well as Cyrillic, Armenian, Hebrew, Arabic, Devanagari, Bopomofo, Bengali, and pretty much everything else under the sun.

5C0F, for example, is the character xiao3 (small). Unicode uses only alphanumeric codes to identify characters, so there’s no problem transmitting the code numbers in an e-mail message.
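To make the idea concrete, here is a minimal sketch in Python (my choice of language, not something the thread uses) of going from a hexadecimal Unicode code to its character and back. The function names are made up for this illustration.

```python
def code_to_char(code: str) -> str:
    """Turn a hexadecimal Unicode code such as '5C0F' into its character."""
    return chr(int(code, 16))


def char_to_code(char: str) -> str:
    """Turn a single character into its hexadecimal code, at least four digits wide."""
    return format(ord(char), "04X")


print(code_to_char("5C0F"))  # the character for xiao3 (small)
print(char_to_code("小"))
```

Because the code is plain letters and digits, it survives any e-mail system that can carry English text.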

My site has an example of a Chinese and English Web page that uses Unicode. You may or may not be able to view this correctly, depending on the version of the browser you’re using and the fonts on your system.

The problem, of course, is in having the sender figure out the Unicode number.

Systems are moving more and more toward Unicode compatibility and Unicode fonts are found more and more often. Soon, there won’t be an excuse for people not to be able to view and even type Chinese characters on their computers. Actually, both of these are fairly easy already.

But the main problem with communicating Taiwan’s addresses without characters is, of course, the stupid, stupid, inconsistent, sloppy application of romanization. Did I remember to mention stupid?


How to spell it? www.Romanization.com


#3

Yes, I definitely agree with Cranky: the sooner we all move to Unicode the better. Unicode would be the most suitable “code” for distinguishing one character from another, regardless of whether it is simplified, the long form, or a character in another language altogether.

In a different vein, I want to point out that the Hong Kong Govt is using another quite unique Chinese character numbering or encoding system of its own, which is not Unicode, GB, Big5 or any of the encoding systems commonly associated with Chinese computing.

It appears on the current HK ID card, under the Chinese characters of the holder’s Chinese name.

For example, if my name is Chan Shui Bian, it will probably appear as something like 3785 4782 4987, all numeric digits.

Now even if you could figure out the numeric equivalent of each Chinese character in your name under this all-numeric encoding system, you would still not be able to guess the check sum digit used to establish the “validity” of the ID card.

The check sum digit is a number from 0 to 9. It is used in a mathematical calculation with the rest of the numbers in your name to arrive at a valid value.

It is believed that the check sum digit is used with these unique numbers for the person’s Chinese name, possibly together with the person’s date of birth etc., to ensure that the ID card is a valid ID card and you are a valid person and not a cloned ID!

In the old days it was assumed that there were people who manufactured HK ID cards for people who wanted to emigrate illegally from, say, China to Hong Kong. I suppose the check sum digit and the all-numeric Chinese character codes allow a further level of checking, which the HK police could use to verify whether an ID card was valid.
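How the check digit on the card is actually calculated is not stated here, and I don't know the real formula. Just to show the general idea of a check digit from 0 to 9, here is a sketch in Python using the Luhn algorithm (the one used on credit card numbers), swapped in purely as an illustration. Nothing below is claimed to match the HK ID card scheme, and the function names are hypothetical.

```python
def luhn_check_digit(digits: str) -> int:
    """Compute the Luhn check digit (0-9) for a string of decimal digits."""
    total = 0
    # Walk from the rightmost digit, doubling every second digit
    # starting with the rightmost one.
    for position, ch in enumerate(reversed(digits)):
        value = int(ch)
        if position % 2 == 0:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return (10 - total % 10) % 10


def is_valid(number_with_check: str) -> bool:
    """Return True if the final digit is the correct Luhn check digit for the rest."""
    payload, check = number_with_check[:-1], number_with_check[-1]
    return luhn_check_digit(payload) == int(check)


# Illustration only: the example name codes above, run together as one number.
payload = "378547824987"
full_number = payload + str(luhn_check_digit(payload))
print(is_valid(full_number))             # a correctly formed number passes
print(is_valid("4" + full_number[1:]))   # one altered digit is caught
```

The point is only that a forger who copies the name codes but guesses the last digit will usually produce a number that fails the calculation.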

Make sense?

One day when it is possible to clone human beings, they will have to have a series of numbers or bar codes with check sum digits to check which clone or version this is?


#4

If you wanted to create a numbering system for the characters, it seems that the logical way to go would be to use the radicals. Pick an ‘authoritative’ dictionary and use its radical classification. Every radical has a number, so just combine the numbers to list the radicals that make up the character. Only prob is that there are 214 radicals, so each one needs three digits and you’d need maybe 9 or 12 digits per character.
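A rough sketch of what that could look like, in Python. This assumes the traditional list of 214 radicals, three digits per radical, and components listed in a fixed reading order; the function names are invented for the example and this is not an existing standard.

```python
RADICAL_COUNT = 214  # size of the traditional radical list
WIDTH = 3            # 214 needs three decimal digits


def encode_radicals(radical_numbers):
    """Join radical numbers into one fixed-width digit string."""
    for n in radical_numbers:
        if not 1 <= n <= RADICAL_COUNT:
            raise ValueError(f"radical number out of range: {n}")
    return "".join(f"{n:0{WIDTH}d}" for n in radical_numbers)


def decode_radicals(code):
    """Split a digit string back into a list of radical numbers."""
    if len(code) % WIDTH != 0:
        raise ValueError("code length must be a multiple of 3")
    return [int(code[i:i + WIDTH]) for i in range(0, len(code), WIDTH)]


# 明 is made of 日 (radical 72) and 月 (radical 74)
print(encode_radicals([72, 74]))   # 072074
print(decode_radicals("072074"))   # [72, 74]
```

One weakness worth keeping in mind: a list of components does not record how they are arranged, so two different characters built from the same pieces could end up with the same number.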

On the other hand, looking up a character (at both ends) would quite possibly take longer than just describing it. Like I live on Nan Chang (pinyin) Rd. That’s Nan - South and Chang is like sing, without the mouth. Only prob with this is that only someone who knows Chinese is going to understand this I guess.

I guess another way would be to use an online graphics based dictionary and paste the web addresses with your characters. Time consuming for the writer, but quick and foolproof for the reader.

Bri


#5
[quote]I guess another way would be to use an online graphics based dictionary and paste the web addresses with your characters. [/quote]

Could someone give me the URL addresses of some graphics based Chinese online dictionaries?


#6

Hi Richard,

I’m sure there are plenty of them, but one online dictionary that uses .gif graphics files is

http://www.zhongwen.com

Scott


#7

Zhongwen.com is a great dictionary, but unfortunately it’s not good for referring someone to a character. This is because it doesn’t have a separate address for each character, so you can’t say ‘xin’ - www.jfaosifho.com or whatever. There are links to other dictionaries from zhongwen.com though. No time to search now, but I’ll try later.

Bri


#8

Try this http://www.chinalanguage.com/CCDICT/index.html
There are several input methods, so you can search for the character you want using Pinyin, English or whatever. When you get your list, click on the character that you want and you’ll see a unique page for it that you could use as a reference.

So, for example, I could say that I live on Nan2 Chang1 Rd = 南昌

Nan =
Click here

Chang =
Click here

Takes up a lot of space, but it should cover all bases.

Only thing is, I don’t know how I can be 100% sure that this dictionary is using graphics.

Bri


#9

Previous postings to this thread suggested using the Unicode numbering system, however when I logged onto the Unicode site, it appeared that it would be necessary to download a 5MB file in order to get all that numbering data for all the characters. I actually tried to download that file for about a half hour with my ADSL connection, but couldn’t get it. At any rate, I don’t think I can automatically assume that EVERYONE has this data on their computers, which would be necessary to make such a system workable. Or am I missing something???

Bri’s suggestion, as above, is good, but I am still a bit confused as to how to get the character.

Let me try again to use an analogy to clarify the problem: Suppose we have three persons: Person-A, Person-B, and Person-C. Person-A wants to transmit Chinese data to Person-C, but has to communicate through Person-B. While Person-A and Person-C have good Chinese language skills, unfortunately Person-B does not, in fact Person-B cannot even use a Chinese dictionary. So, how can the communication be accomplished?

What I have suggested above is that there needs to be a standardized and easily accessible coding system for each Chinese character. Thus when this code number is given to the intermediary (Person-B) he can communicate it easily to Person-C, who can then use his own reference book to determine the exact Chinese character.

This analogy is appropriate because if we consider purely English language computer systems, communicating by computer is the intermediary step (represented by “B”). We have a person at one end (represented by “A”), and another person at the other end (represented by “C”), who want to communicate a small amount of Chinese character data in a very precise way.

Hence, while I appreciate all the suggestions and insights that everyone has offered, I still don’t think we have arrived at a solution, have we???


#10

Richard

Look, I do not believe a system where a person refers to lists of codes (albeit Unicode) from the “Internet”, as suggested by Bu, is really workable or practical for now. None of the indexes I have seen on the Internet are built for this purpose, i.e. character to Unicode. (But you can build one; see below.)

When one is writing a message, it needs to be done fast and the look up must be just as fast as well …like looking things up from a dictionary.

The answer is you have to spend some money initially: Either buy the book with Unicode Index or buy some software “dictionary”.

NJStar has a word processor (NJStarWP v4.33) which has a tool/function called “Hanzi Information” which displays all the different codes (including the relevant Unicode) of the Chinese character under the cursor.

Also its brother, NJStar Communicator v2.23, comes with free email software that can turn any email page (of Chinese/English text) into “.gif” format so that the receiver can view your page in his/her Internet browser.

NJStar (WP and Communicator) are shareware, so theoretically you do not need to pay to use them. They are also on special offer for USD 99 (for both).

Back to your problem:
You can ask A and C to install a copy of NJStarWP each - either paid or free trial copies.

Mr A types up the Chinese and has it automatically saved in HTML (Big5) format, or translates each character individually into Unicode numbers using the “Hanzi Info” function under “Tools”.

Then A passes the HTML page or the Unicode numbers alone to B, and then B to C.

C can then use the “Unicode Input Method”, which comes with NJStar WP’s list of 20 input methods, on his or her own NJStarWP, to type in the Unicode numbers just received from B. The corresponding Chinese character for each Unicode number should then appear in his WP.
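For anyone who wants to see the A to B to C hand-off without installing anything, here is a small sketch in Python rather than NJStar. It assumes the codes are written in the common “U+XXXX” style and embedded in an ordinary English message; the function names and the sample message are made up for the illustration.

```python
import re


def to_codes(text: str) -> str:
    """Person A: spell out each character as U+XXXX, separated by spaces."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def from_message(message: str) -> str:
    """Person C: pull every U+XXXX code out of a message and rebuild the characters."""
    codes = re.findall(r"U\+([0-9A-Fa-f]{4,6})", message)
    return "".join(chr(int(code, 16)) for code in codes)


# Person A prepares the codes for the two characters of "Nan Chang".
codes = to_codes("南昌")

# Person B only ever handles plain ASCII text and needs no Chinese at all.
relayed = f"I live on Nan Chang Rd. The two characters are {codes}."
assert relayed.isascii()

# Person C recovers the characters from the relayed message.
print(from_message(relayed))
```

B never has to display, type, or understand a Chinese character; B just has to copy a few letters and digits accurately.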

I think the nice thing about NJStar, as opposed to the other “Chinese enablers” (such as TwinBridge, RichWin, etc.), is that it is relatively “light” and easy for the English speaker to install and use. All instructions are in English and the menus are in English as well.

The NJStar Communicator for instance is so easy to use you can almost figure things out without even referring to help. The other nice thing about shareware of course is that you can use the software until you feel the software is “worth it” before paying.

My personal impression is that USD 99 for both programs, which come with a dictionary of over 50,000 words and INSTANT Hanzi info for at least 13,000 Big5 characters plus 7,000 GB (simplified) characters, is well worth it.

And later you can also commission someone like Cranky to build your own personalized “Hartzell index” of “most frequently used Chinese characters” with the relevant UNICODE information for each. It will be a bit like Morse code, with around 3,000 most frequently used characters. (Or I can.) This database can then be placed at a convenient location on the Internet, e.g. at www.hartzell-index etc, so that anyone can make a quick lookup of the Chinese to Unicode or vice versa when needed, while writing emails from anywhere you can access the Internet, without having NJStarWP (Hanzi Info function) installed on each and every PC you use.

:)

#11

Actually this is really weird. I have no idea how I put those characters there. I just copied the web addresses and they somehow automatically converted into characters. Did a moderator somehow change it, and if so, how? This is strange.

That is to say I typed:

http://w ww.chinalanguage.com/cgi-bin/view.cgi?table=ccdict&codepoint=5357&mode=internal&lang=
en&beijing=pinyin&canton=jyutpin&hakka=default&sound=0&fields=mandarin,english

without the space I’ve deliberately inserted after the first w in www.

Without inserting the space it looks like this:

htt p://www.chinalanguage.com/cgi-bin/view. … in,english

But the first time I did it, it got changed into a character, and now it just looks like a web address. Hmmm?

Anyway, I think it must be able to work for what Richard wants. A looks up the character, writes the web address. B copies the web address and gives it to C. C looks it up on the net again. A and C need the internet, but B doesn’t even need that.

Here’s the address of that dictionary again:
http://www.chinalanguage.com/CCDICT/index.html
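If typing out such a long address by hand is the worry, it could be generated. Below is a sketch in Python that rebuilds the address in the form quoted earlier in this post from a single character. It assumes that every character's page follows the same pattern with only the codepoint changing, which I have not verified, and the function name is made up.

```python
from urllib.parse import urlencode

BASE = "http://www.chinalanguage.com/cgi-bin/view.cgi"


def character_page(char: str) -> str:
    """Build a per-character address, assuming the pattern of the quoted example."""
    params = {
        "table": "ccdict",
        "codepoint": format(ord(char), "04X"),  # e.g. 5357 for 南
        "mode": "internal",
        "lang": "en",
        "beijing": "pinyin",
        "canton": "jyutpin",
        "hakka": "default",
        "sound": "0",
        "fields": "mandarin,english",
    }
    # safe="," leaves the comma in the fields value unescaped, as in the quoted address
    return f"{BASE}?{urlencode(params, safe=',')}"


print(character_page("南"))
```

Note that the only part that changes from character to character is the four-digit codepoint, so in practice that short number carries all the information.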

Bri


#12

I appreciate Bri’s suggestion, but I think it is too complicated. (Sorry!) Such a lengthy web-address is easily confused.

I just need a simple URL and a number for each character, such as CB08 DY16 ES12, which could (for example) represent the characters in my Chinese name.

Or FH15 HU14, which could (for example) represent the “Hsing” and “Yun” in my address.


#13

Hi Bri,

I edited your post to include the character image and link. Sorry I didn’t mention it to you earlier.

I edited it because I don’t like the way the margins bust out to the right. No biggie.

Sometimes, I also correct spelling or grammar errors – but this is rare since I don’t have much time normally. What can I say, I work as an editor for a living!


#14

OK Richard, I think I’ve got it. Using the same website you can type in a four-digit Unicode number and get the character.

Example:

How to write the two characters from your address. Go to the dictionary main page:

http://www.chinalanguage.com/CCDICT/index.html

and go to the last input option ‘internal code input’

http://www.chinalanguage.com/cgi-bin/view.cgi?table=ccdict&mode=internal&sound=0&beijing=pinyin&canton=jyutpin&hakka=default&fields=radical,index,mandarin,english&lang=en&show=0

Input 661F and search to get the first character, then input 96F2 and search to get the second.

Of course once someone knows the system, you just need to tell them the two four-digit codes, and you can write down the ones you commonly use etc.

From your end, what you need to do is search for the character using English, pinyin, radical etc. and write down the four digits immediately to the right of the character after ‘U+’ (which stands for Unicode I think).

I’m pretty sure this is convenient and foolproof. I’d be surprised if you can get any better than this.

Bri