Chinese character display question


This probably has a very simple answer for someone. From time to time I receive emails that include Chinese characters. Sometimes they display just fine. Other times they display like this:

The difference is whether the message contains a correct Content-type: header or not. For example:

Content-type: text/plain; charset=iso-8859-1

This says that the message is in the character set iso-8859-1, also known as Latin-1 or “western”. The usual charset for Traditional Chinese is Big5. The problem is that a lot of software does not add a Content-type header at all, or adds an incorrect one. The latter is a common problem if you usually write in English but occasionally want to send something in Chinese. Technically a message without a Content-type header SHOULD be interpreted as being ASCII only. But because there is so much broken software around, most mailers instead fall back to a default charset for such messages, and that default is usually whatever charset you normally use.

So for most English speakers and other Western Europeans, that’s usually iso-8859-1, and for most Traditional Chinese users, it is Big5. This all works if you stick to one language. Someone sending out a message in Big5 to other Big5 users will have it display in Chinese, because the other Traditional Chinese users’ mailers also default to Big5. This has the side effect that they will have no idea their software is broken and assume that it is your problem.
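To make the mismatch concrete, here is a minimal Python sketch. The sample text is just something I picked for illustration. It encodes two Chinese characters as Big5 and then decodes the same bytes the way a Big5-default reader and a Latin-1-default reader would.

```python
# Illustrative sample text (an assumption, not taken from any real message).
text = "中文"

# What a sender whose software uses Big5 actually puts on the wire.
raw = text.encode("big5")

# A reader that defaults to Big5 recovers the original characters.
as_big5 = raw.decode("big5")

# A reader that defaults to iso-8859-1 maps every byte to one Western
# character, so each two-byte Chinese character turns into two odd symbols.
as_latin1 = raw.decode("iso-8859-1")

print(repr(as_big5))
print(repr(as_latin1))
```

The bytes are identical in both cases. Only the reader's guess about the charset differs, which is why the sender never sees a problem.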

Most mail readers let you choose among several default character sets. You could change your default to Big5 and still read most English emails correctly, but you would lose any accented characters from other Western European languages. Also, some products use nonstandard versions of the quote characters, which would cause difficulty as well. Some mail programs like Mozilla allow you to set a per-folder default, so you could sort mail from dodgy sources into that folder and be able to read it properly. Or just change the setting each time you come across a broken message (View/Character Encoding on Mozilla-based readers).
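Roughly what a mail reader does with its default charset can be sketched in Python with the standard email package. This is only an approximation under my own assumptions: the function name decode_body and the fallback_charset parameter are made up for the example, and real mailers apply more heuristics than this.

```python
from email import message_from_bytes


def decode_body(raw_message: bytes, fallback_charset: str = "iso-8859-1") -> str:
    """Decode a single-part message body.

    Uses the charset declared in the Content-Type header when there is one,
    otherwise the reader's configured default (hypothetical parameter).
    """
    msg = message_from_bytes(raw_message)
    # get_content_charset() returns None when the header or its charset
    # parameter is missing, so the fallback kicks in.
    charset = msg.get_content_charset() or fallback_charset
    body = msg.get_payload(decode=True)  # raw body bytes
    # errors="replace" keeps a wrong guess from raising; it shows garbage instead.
    return body.decode(charset, errors="replace")


body = "中文".encode("big5")
no_header = b"Subject: test\r\n\r\n" + body
with_header = b"Subject: test\r\nContent-Type: text/plain; charset=big5\r\n\r\n" + body

print(decode_body(with_header))                        # declared charset wins
print(decode_body(no_header, fallback_charset="big5")) # per-folder default of Big5
print(repr(decode_body(no_header)))                    # Western default: garbled
```

Setting a per-folder default amounts to passing a different fallback_charset for mail filed in that folder.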

Yes, it just depends on the author of the website.
If the head part is set up correctly, everything will be displayed.

Except that e-mails aren’t websites.

In the case of websites, the Content-type header is also used, but it is a header in the HTTP transaction, not in the file returned. If the file returned is an HTML page, you can use a Meta tag to set the Content-type header there as well.

The same brokenness happens there as happens in email, and the same kludge of falling back to the user’s default charset exists if there is no Content-type header or Meta tag.
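As a rough Python sketch of that lookup order: check the HTTP header first, then a Meta tag near the top of the file, then the user's default. The function name pick_charset, the regular expressions, and the 1024-byte sniffing window are my own simplifications, not how any particular browser does it.

```python
import re

# Simplified patterns (assumptions for illustration, not a full parser).
HEADER_CHARSET = re.compile(r"charset\s*=\s*\"?([A-Za-z0-9_-]+)", re.IGNORECASE)
META_CHARSET = re.compile(rb"<meta[^>]+charset\s*=\s*[\"']?([A-Za-z0-9_-]+)", re.IGNORECASE)


def pick_charset(http_content_type, html_bytes, user_default="iso-8859-1"):
    """Return the charset to use for a page, lowercased."""
    # 1. The Content-type header of the HTTP response, if it names a charset.
    if http_content_type:
        match = HEADER_CHARSET.search(http_content_type)
        if match:
            return match.group(1).lower()
    # 2. A Meta tag near the start of the file (only the first 1024 bytes
    #    are searched here, an arbitrary limit chosen for the sketch).
    match = META_CHARSET.search(html_bytes[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    # 3. The kludge: whatever the user normally reads.
    return user_default


page = b'<html><head><meta http-equiv="Content-Type" content="text/html; charset=big5"></head></html>'
print(pick_charset("text/html; charset=Big5", b""))
print(pick_charset("text/html", page))
print(pick_charset(None, b"<html></html>"))
```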

You can send emails in HTML, so there is no difference from a website.

Actually there is still a difference, in that even HTML e-mails have e-mail headers, and also anyone who deliberately sends HTML e-mails deserves to be hung by their taint.

words of wisdom

There are some differences with HTML emails. HTML emails are usually sent as multipart MIME (the same thing that lets you do attachments). MIME lets you set a separate Content-type: for each part of the message. In addition, HTML can also set a Content-type as a META tag inside the HTML. Then, to add to the brokenness mentioned earlier, a proper HTML segment is identified as text/html, BUT there’s a lot of broken stuff that forgets to set this. Mail programs should not treat something as HTML that isn’t identified as text/html, but for the most part mail programs will treat anything that looks like it has HTML in it as HTML. Then whether the program will use the META tag or the MIME part Content-type when both exist is another random factor. If all this makes your head spin, it should. The way email actually works today is a big mess.
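For the well-behaved case, here is a small Python sketch using the standard email package that builds a two-part message and then reads back the Content-type and charset of each part. The helper names build_message and list_parts are made up, and labelling the plain part Big5 and the HTML part UTF-8 is just my choice to show that the parts can differ.

```python
from email import message_from_bytes
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_message(plain: str, html: str) -> bytes:
    """Build a multipart/alternative message whose parts declare different charsets."""
    msg = MIMEMultipart("alternative")
    msg.attach(MIMEText(plain, "plain", "big5"))   # text/plain; charset="big5"
    msg.attach(MIMEText(html, "html", "utf-8"))    # text/html; charset="utf-8"
    return msg.as_bytes()


def list_parts(raw: bytes):
    """Return (content type, charset, decoded text) for every leaf part."""
    msg = message_from_bytes(raw)
    parts = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # skip the container, keep only the leaf parts
        # A part with no declared charset is treated as ASCII here.
        charset = part.get_content_charset() or "us-ascii"
        text = part.get_payload(decode=True).decode(charset, errors="replace")
        parts.append((part.get_content_type(), charset, text))
    return parts


raw = build_message("中文", "<p>中文</p>")
for content_type, charset, text in list_parts(raw):
    print(content_type, charset, text)
```

This only trusts the per-part MIME headers. It does not look for a META tag inside the HTML part or try to guess whether an unlabelled part is HTML, which is where real mail programs start to disagree.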

As to whether sending HTML messages is acceptable, I think this battle is something that should have died long ago. The real issues with HTML email now are whether the HTML includes web bugs or JavaScript. My other peeve is the idiots who send out messages in extra-tiny or extra-huge font sizes. Anyone sending that sort of crap should be bit-bucketed. Except for a few technically inclined groups, most users prefer HTML email these days, and few mail readers still refuse to handle HTML.