Big 5 encoding

I’ve never had a problem with this before, but now, whenever I read Forumosa, my browser sets itself to Western European encoding, so I can’t read Chinese characters. It doesn’t make any difference if I set it back to Big 5 (auto or not); the next page is back to Western European.

I’m using XP with Explorer, and once I had everything set up right, I never had problems reading Chinese characters. Also, if I go to the Taipei Times site, I have no problem reading their characters. Is Forumosa somehow setting my browser to Western European, or is this something else?

Brian

I have a similar problem with my forum, www.chinesesquabble.com/phpBB2/
I’m trying to find a way to let users switch interface languages when they first enter the site, and I want to set all encoding to UTF-8.

Otherwise, what you type in Big5 might not be viewable by others who don’t have Big5 support. Unicode (UTF-8) is a more widely supported encoding and it covers most languages. Once the default encoding is set to UTF-8, we can merrily type in any language on the forum, be it Japanese, Chinese, Arabic, Russian or Hebrew, and it will all stay intact and remain viewable.

ax

Well, I just experimented with this and other sites. It seems that the site somehow tells Explorer what to set the encoding to. Forumosa used to set me to Big5 or Unicode, now it sets me to Western European. Either that or there’s an override setting somewhere that I accidentally turned off.

Brian

Yes Bri, if it were plain HTML you could click View > Source, and near the top you’ll see a meta line containing a charset= setting.

Whatever is set after charset= determines the default encoding your IE starts with. Forumosa uses PHP, though, and I’m not sure which files I need to modify to get the fastest and best result.
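As a rough illustration of what that charset declaration carries, here is a small Python sketch that pulls the charset value out of a page's markup. The sample HTML string and the helper name `declared_charset` are made up for the example; they are not taken from Forumosa's actual pages.

```python
import re

# Hypothetical sample page; the real Forumosa markup is not shown in this thread.
SAMPLE_HTML = (
    '<html><head>'
    '<meta http-equiv="Content-Type" content="text/html; charset=big5">'
    '</head><body>...</body></html>'
)

def declared_charset(html):
    """Return the charset named in the page's markup, lowercased, or None."""
    match = re.search(r'charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)', html, re.IGNORECASE)
    return match.group(1).lower() if match else None

print(declared_charset(SAMPLE_HTML))
```

If the page declares no charset at all, the helper returns None, which roughly corresponds to the browser falling back to its own default or auto-detection.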

ax

phpBB can be configured to use different languages and comes with English as the default. There are several language packs that can be downloaded. Unfortunately, unless everything uses UTF-8, these language packs can cause incompatibilities when people view the same content with different language packs. The publicly available language packs are usually in one of the popular native encodings, not in UTF-8. However, it is possible to convert a language pack to UTF-8, and most current browsers handle UTF-8 well.

Anyway, to get to the point: to change the encoding for a language pack, go into the language directory of the phpBB install, then into the language pack directory, e.g. lang_english. The lang_main.php file defines the charset for web pages for that language pack, e.g.:

$lang['ENCODING'] = 'UTF-8';

Both UTF-8 and BIG5 are supersets of ASCII, but not of ISO-8859-1 (aka Latin-1), so as long as the language pack contains only ASCII characters, just changing the ENCODING setting is enough to switch the English pack to a different charset. There’s also an email subdirectory in each language pack which contains templates for the emails the web site sends out. The charset used is specified on the second line of each file.
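The ASCII point can be checked with a few lines of Python. This is only a sketch using Python's built-in codecs; the sample strings are invented for the example and are not taken from any language pack.

```python
ascii_text = "Reply to topic"  # plain ASCII sample string (made up)

# Pure ASCII text produces identical bytes under ASCII, UTF-8 and Big5,
# which is why relabelling an ASCII-only file is enough.
same_bytes = (
    ascii_text.encode("ascii")
    == ascii_text.encode("utf-8")
    == ascii_text.encode("big5")
)
print(same_bytes)

# A Latin-1 character is encoded differently, so relabelling is not enough.
latin1_bytes = "café".encode("iso-8859-1")
utf8_bytes = "café".encode("utf-8")
print(latin1_bytes, utf8_bytes)

def decodes_as_utf8(data):
    """Return True if the bytes are valid UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

print(decodes_as_utf8(latin1_bytes))
```

So a file containing accented Latin-1 characters would need a real conversion, not just a changed ENCODING line.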

For language packs that are not limited to ASCII, you can use iconv to convert the language pack files to a different charset. For example, to convert lang_chinese_traditional_taiwan to UTF-8, you can run the following on each file in the language pack:

iconv -f BIG5 -t UTF-8 < oldfile > newfile
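For anyone without iconv handy, roughly the same per-file conversion can be sketched in Python. The function name `convert_pack` and the output directory name in the usage comment are assumptions made for this example, and Python's `big5` codec may not match iconv's BIG5 table exactly for vendor-specific characters, so treat it as a starting point and check the results.

```python
from pathlib import Path

def convert_pack(src_dir, dst_dir, src_encoding="big5", dst_encoding="utf-8"):
    """Re-encode every file under src_dir into dst_dir, keeping relative paths."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    converted = []
    for src in sorted(src_dir.rglob("*")):
        if not src.is_file():
            continue
        dst = dst_dir / src.relative_to(src_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        # Strict decoding: a file that is not valid Big5 raises an error
        # instead of being silently garbled.
        text = src.read_bytes().decode(src_encoding)
        dst.write_bytes(text.encode(dst_encoding))
        converted.append(dst)
    return converted

# Example call (the output directory name is hypothetical):
# convert_pack("lang_chinese_traditional_taiwan", "lang_chinese_traditional_taiwan_utf8")
```

The files are written to a separate directory so the originals stay untouched. The ENCODING line in lang_main.php and the charset line in each email template would still need to be changed by hand afterwards.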

I’ve already done conversions to UTF-8 for Chinese Simplified, Chinese Traditional (Taiwan), English, French, German, Japanese, Korean and Spanish if anyone wants copies. The advantage with UTF-8 is that users on my phpBB can mix any languages together even on the same page and it will be readable no matter which language someone has set their profile to. Using native encodings would mean someone that is set to Japanese would not be able to read a Chinese posting without manually setting the charset for each post.
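A quick way to see the difference is to try encoding one mixed-language post in each charset. This is a sketch using Python's built-in codecs; the sample post and the helper name `can_encode` are invented for the example.

```python
# A made-up post mixing several scripts on one line.
post = "English, 中文, 日本語, 한국어, русский, עברית, العربية"

# UTF-8 can represent every character, so the post survives a round trip.
round_trip = post.encode("utf-8").decode("utf-8")

def can_encode(text, encoding):
    """Return True if every character of text exists in the given charset."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

print(round_trip == post)             # True
print(can_encode(post, "big5"))       # False: no Korean, Hebrew or Arabic in Big5
print(can_encode(post, "shift_jis"))  # False: same problem for the Japanese charset
```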

An example multilingual phpBB thread is here: forums.tcp.com/viewtopic.php?t=4

I would be happy to assist the admins here in any language settings or conversions if the above isn’t clear enough.

When I go back to some of my old posts which include Chinese characters, the encoding is often screwed up, as in this post. No matter what I set my encoding to now in IE, the characters I typed way back then are now all garbled. What gives? If I go back in and retype all those characters, is this just gonna happen again?

Some characters in that post are OK if I use Unicode; the others are broken whatever I use. I have no idea why. Could you have copied and pasted some and typed others? But if it looked OK then, it should be OK now :idunno:

Yeah, it’s not unlikely that I did part of the typing in Word, pasted that in, then edited it further in the posting window. Ugh.

EDIT – I went back to the post, opened the editing window, set the browser to Unicode, and both typed and pasted characters. After submitting this, I went back to view it, with the browser in Unicode, and both looked just fine. I repeated this, but with the browser set to Big 5, and again it looks just fine as long as I switch to Big 5 to view it. So I still have no explanation for why it would all be garbled a couple of years later.
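One possibility, and it is only a guess rather than a diagnosis of what happened to that post, is that a single stored post ended up holding bytes in two different encodings, for instance from edits made under different browser settings. The Python sketch below shows that in such a case no single setting displays everything; the byte strings are constructed for the example.

```python
# Hypothetical stored post: one part submitted as Big5, another as UTF-8.
typed = "中文".encode("big5")
pasted = "中文".encode("utf-8")
stored = typed + b" / " + pasted

# Viewing the whole post under one setting only gets one part right.
as_big5 = stored.decode("big5", errors="replace")
as_utf8 = stored.decode("utf-8", errors="replace")

print(as_big5)  # first part readable, second part garbled
print(as_utf8)  # first part garbled, second part readable
```

This would match a post where some characters look fine in Unicode and others do not, though it does not explain characters that are broken under every setting.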