Unified romanization for all Taiwanese native languages possible?

[quote=“sofun”]It will make a bunch of people speaking Seediq, that’s true.

With youtube virtually for free, the language will take care of its own accent. The language is good.[/quote]

It’s odd that every time people point out Kana is inherently incapable of accurately representing Taiwanese native languages, you divert the conversation as if we are saying Taiwanese native languages aren’t good. The language is never the problem, but since people design romanization systems, we can judge whether a system is well designed for the task, and Kana simply doesn’t cut it. Your proposed system is flawed for the task of annotating the phonology of Taiwan’s native languages, and that has nothing to do with accents or with whether the Holo or Seediq languages are good.

By the way, as stated in the title, this thread is about romanization, and if you insist on forcing your Kana system into the discussion and turning it into another round of pointless meandering, perhaps it’s better for you to start a thread on a Unified Kanazation for all Taiwanese native languages, where you can share with us how you propose to annotate /β/, /ʈ/, /ɖ/, /ɭ/, /ɾ/, and the rest of Austronesian phonology with Kana.

[quote=“hansioux”][quote=“sofun”]It will make a bunch of people speaking Seediq, that’s true.
With youtube virtually for free, the language will take care of its own accent. The language is good.[/quote]
as if we are saying Taiwanese native languages aren’t good.[/quote]

You got the wrong idea.

I’d never written any aboriginal language before. Now that I’ve written it and started speaking it, I feel great! With two systems that are visually different, each one cross-referencing the other, I am able to track and locate the sounds AND enhance fluency and memorization. This is done without changing the existing way Seediq romanization is shown in the teaching materials.

I will open another thread for sure.

Here your task is to show us the development of a unified romanization.

Just updated the chart with Amis phonology. Basically, the 3 dialects of Amis have a total of 23 IPA consonants (Amis speakers themselves would say there are only 18 to 19 consonants, but we are counting all 3 dialects), and only 5 vowels (Amis speakers would say they only have 4 vowels, but /u/ and /o/ are used distinctly in some grammatical situations).

Amis added 8 new consonants and 0 new vowels to the Holo-Hakka-Mandarin phonology chart.

New consonants are:
/ʡ/
/d/
/z/
/ħ/
/ð/
/ɾ/
/r/
/ɬ/

So right now I have a total of 45 consonants and 16 vowels and 9 tones.
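Just to illustrate the bookkeeping behind those running totals, here is a toy Python sketch of folding each language’s consonant inventory into the unified chart; the two sets in it are small placeholders, not the actual chart:

[code]
# A toy sketch of the merging workflow: fold each language's consonant set
# into the running unified inventory and report what it adds.
# Both sets below are tiny placeholders, not the actual chart.
def merge(unified, language_consonants):
    new = language_consonants - unified
    return unified | language_consonants, new

unified = {"p", "t", "k", "m", "n", "ŋ", "s", "l", "h", "ts"}                                  # placeholder
amis = {"p", "t", "k", "m", "n", "ŋ", "s", "l", "h", "d", "z", "r", "ɾ", "ɬ", "ħ", "ð", "ʡ"}   # placeholder

unified, new_from_amis = merge(unified, amis)
print(sorted(new_from_amis))   # the consonants this language adds to the chart
[/code]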

It all seems pretty manageable so far. I am not sure how diverse Austronesian phonology can get. Right now, though, the hardest part seems to be distinguishing the alveolar tap/flap, trill, and lateral fricative. Since the Amis language actually uses /d/, /z/, and /r/, it would also mean not using the letters d, z, and r in the 3 Han languages, at least not as initial consonants. While that’s not an issue for Holo and Hakka, how to replace r for /ɻ/ and /ʐ/ in Mandarin would be a challenge.

The temporary proposal is to use rh to represent /ɻ/ and zr to represent /ʐ/, although for the Pinyin r sound, people use either /ɻ/ or /ʐ/… so perhaps we don’t have to assign a romanization for both? That is, unless some other Aboriginal language also uses /ɻ/ or /ʐ/.
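Just to make that concrete, a tiny sketch of the substitution in Python; the table and helper name are placeholders, not a settled scheme:

[code]
# Sketch of the temporary proposal: free up d, z, r for Amis and respell the
# Mandarin retroflex initial. The table and names are placeholders only.
MANDARIN_RESPELL = {
    "ɻ": "rh",   # if Pinyin r- is analysed as the retroflex approximant
    "ʐ": "zr",   # if it is analysed as the retroflex fricative
}

def respell_initial(ipa_initial):
    """Return the proposed spelling for a Mandarin initial, or the IPA
    symbol itself if no respelling is defined."""
    return MANDARIN_RESPELL.get(ipa_initial, ipa_initial)

print(respell_initial("ʐ"))   # -> zr
print(respell_initial("ɻ"))   # -> rh
[/code]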

I have to admit I have no idea what most of those phonemes sound like. Time to use the Wikipedia IPA page.

/ɻ/ is when a Taiwanese says 肉 without the alveolar sound (teeth don’t touch)

/ʐ/, I assume, is how they would teach you to pronounce r when they teach you Pinyin.

[quote=“hansioux”][quote=“sofun”]Mgeela ini knkela, snpusan mtluhe!
むげえら いに ぐんげら すんぶさん むづるへ
[/quote]

Again, kana is terrible at annotating Aboriginal languages as well. How would anyone reading that know how to pronounce Mgeela from むげえら(Muge era), or knkela from ぐんげら (Gun-ge-ra), or snpusan from すんぶさん(Sun bu-san), or mtluhe from むづるへ(Mudzuruhe)…

Heck, you even have げ for both ge and ke[/quote]

Well, why would Kana be worse than the latin alphabet? The latin alphabet is borrowed for any number of different languages (English, German, heck, even Vietnamese or Chinese!) with absolutely no connection to the original pronunciation (what is even the original pronunciation?). So why do you want to read the Japanese pronunciation when using Kana? You can just say that むげ = mge instead of muge. You don’t have to care about Japanese.

That said, of course using Kana is silly. But I guess using the Latin alphabet is equally silly. Why not go with Cyrillic? Or Arabic? Or why not create something new? And why don’t you want to use diacritics? Diacritics are quite nice, I think. I actually like them :slight_smile:

[quote=“Hellstorm”]
Well, why would Kana be worse than the latin alphabet? The latin alphabet is borrowed for any number of different languages (English, German, heck, even Vietnamese or Chinese!) with absolutely no connection to the original pronunciation (what is even the original pronunciation?). So why do you want to read the Japanese pronunciation when using Kana? You can just say that むげ = mge instead of muge. You don’t have to care about Japanese.

That said, of course using Kana is silly. But I guess using the Latin alphabet is equally silly. Why not go with Cyrillic? Or Arabic? Or why not create something new? And why don’t you want to use diacritics? Diacritics are quite nice, I think. I actually like them :slight_smile:[/quote]

Because I see no proposal of how to separate む as just m or as mu. And yes, Seediq has both; see the paper by Pei-jung Lee.

I am attempting to come up with a system where the letters correspond to the IPA as closely as possible. I will make changes to Tailo where it deviates from IPA, as well as to other languages. So far I’ve compiled the phonology of the 3 native Sinitic languages as well as Amis and Atayal, and it still seems fairly possible to achieve my target. The exception is Atayal, where n and g are sometimes used together and there’s a need to distinguish that from the common ng for /ŋ/. The solution is probably simple, as the Atayal languages don’t really have /g/; they use g to represent /ɣ/, so as long as I come up with something for /ɣ/, for example gh or gy, such issues would be avoided.
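To illustrate why reserving g for a digraph removes the ambiguity, here is a toy segmenter in Python; both grapheme tables are placeholders, not the actual proposal:

[code]
# Toy illustration of the n + g ambiguity: a tiny recursive segmenter that
# returns every possible reading of a string against a grapheme table.
# Both tables below are placeholders, not the actual proposal.
def parses(word, table):
    if not word:
        return [[]]
    results = []
    for grapheme, ipa in table.items():
        if word.startswith(grapheme):
            for rest in parses(word[len(grapheme):], table):
                results.append([ipa] + rest)
    return results

with_plain_g = {"ng": "ŋ", "n": "n", "g": "ɣ", "a": "a"}     # g doubles as /ɣ/
with_digraph = {"ng": "ŋ", "gh": "ɣ", "n": "n", "a": "a"}    # /ɣ/ respelled as gh

print(parses("nga", with_plain_g))   # two readings: [['ŋ', 'a'], ['n', 'ɣ', 'a']]
print(parses("nga", with_digraph))   # one reading:  [['ŋ', 'a']]
[/code]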

By the way, I do often daydream about a Taiwanese script. I think about devising it from Aboriginal weaving patterns and the scripts left on stone structures, and compiling the letters Hangul-style when annotating Hanji. However, regardless of the feasibility of such a script ever being adopted, there will still be a need for romanization. I’d rather have a reasonably functioning romanization for all native languages first, then work my way back to a Taiwanese script.

[quote=“hansioux”]
Because I see no proposal of how to separate む as just m or as mu. [/quote]

Why not add a diacritic? む゛ or something like that? Or use a small letter like they do in Ainu: ㇺ. Scripts do not have copyright (except Klingon); you can do whatever you want with them.

Ainu heavily modifies Katakana, because it has more phonemes than Japanese does. en.wikipedia.org/wiki/Ainu_language#Writing
You could do that for your scripts as well.

But either way, why not just create something completely new, which is much more logical than the Latin alphabet? For example, an easily memorable ordering, letters that change logically, etc. There are many scripts in the world which are more logical than the Latin alphabet, so why not take some inspiration from them? And I guess you have to be realistic: the chances of anyone adopting your alphabet are slim to none either way.

[quote=“Hellstorm”][quote=“hansioux”]
Because I see no proposal of how to separate む as just m or as mu. [/quote]

Why not add a diacritic? む゛ or something like that? Or use a small letter like they do in Ainu: ㇺ. Scripts do not have copyright (except Klingon); you can do whatever you want with them.
[/quote]

Even if you use the small kana for just /m/, there are still tons of Austronesian phonemes that Kana simply wasn’t designed to cover: /β/, /ʈ/, /ɖ/, /ɭ/, /ɾ/, to name a few.

It’s all well and fun when small-letter kana have already been submitted to the Unicode Consortium by the Japanese government for you; it’s a totally different matter when you want to add new kana glyphs for which no such usage exists in Japan.

[quote=“hansioux”]
Even if you use the small kana for just /m/, there are still tons of Austronesian phonemes that Kana simply wasn’t designed to cover: /β/, /ʈ/, /ɖ/, /ɭ/, /ɾ/, to name a few.

It’s all well and fun when small-letter kana have already been submitted to the Unicode Consortium by the Japanese government for you; it’s a totally different matter when you want to add new kana glyphs for which no such usage exists in Japan.[/quote]

My point is that you don’t have to care what Kana was designed to cover. You can just use it and change it however you like, in order to adjust it to the phonology. That is no different from using the Latin alphabet. (I am in no way implying you should use Kana; I find it equally useless. But I find your argument that Kana is not designed for Austronesian phonology weird if you do not think of the Latin alphabet in the same way.)

You could create a font for the PUA, with eventual integration into the Unicode standard. That is the preferred way for new scripts to enter Unicode. And if you do that consistently from the beginning (meaning a good Open-Source font, a well designed keyboard layout, everything installable in one package, using web fonts for homepages) this should pose only small problems for usage.
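For what it’s worth, parking new glyphs in the PUA is easy to prototype. A minimal Python sketch (the glyph names are hypothetical, not part of any actual proposal):

[code]
# Minimal sketch of assigning provisional Private Use Area codepoints
# (U+E000 through U+F8FF) to new glyphs before any Unicode proposal.
# The glyph names are hypothetical.
PUA_START = 0xE000

provisional_glyphs = [
    "NEW BARE-CONSONANT SIGN",      # e.g. a standalone m sign
    "NEW RETROFLEX T SIGN",         # for /ʈ/
    "NEW LATERAL FRICATIVE SIGN",   # for /ɬ/
]

assignments = {name: chr(PUA_START + i) for i, name in enumerate(provisional_glyphs)}

for name, char in assignments.items():
    print(f"U+{ord(char):04X}  {name}")
[/code]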

Arabic would be a bad choice because Arabic-speakers can’t seem to agree on how it’s supposed to be read. Cyrillic would work fine.

Both of these are way better options than Kana because Hiragana and Katakana are syllabaries, not alphabets. The kana system presumes a limited set of syllables in the language, as with Japanese, whereas alphabets give you building blocks that allow you to string together any syllable you can imagine. In other words, Japanese kana are by nature a single syllable to a single character. It’s 1:1, and so if you have sounds like mge that combine multiple initials or vowels into one syllable, you have a problem.
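A toy way to see that difference in code (the inventories below are tiny placeholders, nothing like real kana coverage):

[code]
# Toy illustration: a syllabary can only spell what it has whole syllable
# signs for, while an alphabet strings single segments together freely.
# Both inventories are tiny placeholders.
SYLLABARY_UNITS = {"ma", "mu", "ge", "ra", "a", "e", "u"}   # CV or V units only
ALPHABET_SEGMENTS = set("amgeru")                            # individual letters

def spellable(word, units):
    """True if the word can be split entirely into units from the inventory."""
    if not word:
        return True
    return any(word.startswith(u) and spellable(word[len(u):], units) for u in units)

print(spellable("magera", SYLLABARY_UNITS))   # True
print(spellable("mge", SYLLABARY_UNITS))      # False: no unit for a bare "m"
print(spellable("mge", ALPHABET_SEGMENTS))    # True: m, g, e are each a letter
[/code]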

Kana is perfect for Japanese and could even be reworked for use in Korean and Chinese. But it won’t cut it for aboriginal languages without heavy modification. And there’s not much point in asking people to relearn kana for a very little-spoken language when an alphabet can get the job done effectively already.

Latin is, by the way, the obvious choice because practically everyone in the world is familiar with Latin letters. Cyrillic would work but anyone from outside of the Cyrillosphere would have to learn a new grammar, new pronunciation, and a new script.

[quote=“Hokwongwei”]
Latin is, by the way, the obvious choice because practically everyone in the world is familiar with Latin letters. [/quote]

Except for Taiwanese :smiley: scnr

Well, by that logic, why not use the Latin script for everything? :wink: Of course I know that Latin is the easiest, but it has several flaws in my opinion. If you create something which won’t be adopted either way, why not then create something logical?

The Latin alphabet may be flawed, but it’s also the basis for the IPA, which is to date the best system for an objective transcription of all of the world’s different phonemes. A dumbed-down IPA (getting rid of diacritics and inscrutably odd characters) is a very good option for any language without a consistent writing system, like the indigenous languages of Taiwan.

Each kana by itself can be pronounced as one syllable, just like A, B, C, D, etc. But when they appear together to create a word, there are designated rules, explicit or implicit, depending on the spoken language. That’s all I’m going to say about kana in this thread. Hellstorm can check out the new thread I created in forumosa.com/taiwan/viewtopi … q#p1625909.

For the consonants, more than one(1) of them can share a single letter.
For the vowels, more than one(1) of them can share a single letter.

[quote=“Hellstorm”]

My point is that you don’t have to care what Kana was designed to cover. You can just use it and change it however you like, in order to adjust it to the phonology. That is no different from using the Latin alphabet. (I am in no way implying you should use Kana; I find it equally useless. But I find your argument that Kana is not designed for Austronesian phonology weird if you do not think of the Latin alphabet in the same way.)[/quote]

The key difference between Kana and the Latin alphabet is that each Kana is a syllabogram, whereas Latin letters don’t have to represent a full syllable. That makes Latin letters an inherently better system for representing languages such as the Austronesian languages, Taigi, and Hakka. There were already attempts at using Kana to annotate these languages back during the Japanese era, so if Kana has to be the only choice, those systems should take priority, as they attempted to take care of the consonant-without-vowel issues. However, it would be very difficult to have a unified Kana system for all native languages because it simply doesn’t have the range unless more diacritics are used. At that point I feel advocates are just forcing the Kana system onto these languages rather than using it because it is suitable.

By the way, even if Taiwan does adopt Kana to officially write everything, the need for a good (and hopefully unified) romanization remains.

The Aboriginals themselves are also demanding to register their names in the Latin alphabet:

tw.news.yahoo.com/%E6%88%B6%E6% … 33488.html

[quote=“Hellstorm”]
You could create a font for the PUA, with eventual integration into the Unicode standard. That is the preferred way for new scripts to enter Unicode. And if you do that consistently from the beginning (meaning a good Open-Source font, a well designed keyboard layout, everything installable in one package, using web fonts for homepages) this should pose only small problems for usage.[/quote]

Creating a font for your own use is fine; having it become a standard, get into Unicode, and have all the fonts that claim to support Unicode add those glyphs is another matter.

[quote=“sofun”][quote=“hansioux”]

So right now I have a total of 45 consonants and 16 vowels and 9 tones.
[/quote]

For the consonants, more than one(1) of them can share a single letter.
For the vowels, more than one(1) of them can share a single letter.[/quote]

Yes, and if I do ever get around to the phoneme list, I am sure we can merge more letters, such as using ts and tsh to represent /ts/ and /tsʰ/, and tsi and tshi to represent /tɕ/ and /tɕʰ/.

So far, that looks plausible.
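As a rough sanity check of that kind of merging, here is a toy transcriber in Python; the table is a placeholder, not a finished spelling list:

[code]
# Sketch of the proposed merging: the palatal affricates reuse the ts / tsh
# letters, with the following i doing the work. The table is a placeholder.
SPELLING_TO_IPA = {
    "tshi": "tɕʰi",
    "tsi": "tɕi",
    "tsh": "tsʰ",
    "ts": "ts",
}

def transcribe_onset(spelling):
    # longest match first, so "tshi" is not read as "tsh" + "i"
    for key in sorted(SPELLING_TO_IPA, key=len, reverse=True):
        if spelling.startswith(key):
            return SPELLING_TO_IPA[key] + spelling[len(key):]
    return spelling

print(transcribe_onset("tshiu"))   # tɕʰiu
print(transcribe_onset("tsau"))    # tsau
[/code]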

Just updated the chart with Atayal phonology.

Atayal added 3 new consonants and 0 new vowels to the Holo-Hakka-Mandarin-Amis phonology chart.

New consonants are:
/q/
/β/
/ɣ/

So right now I have a total of 48 consonants and 16 vowels and 9 tones.

Paiwan is next

You’d want to keep the char_length of a consonant under 3.

Consonants and vowels spoken by human beings are in a continuous space, whereas the letters used in romanizations are in a discrete space. What I mean is that we’re essentially “sampling” continuous signals into discrete signals. This is an A/D process, in signal-processing jargon.

The reading process is actually to reconstruct the discrete signals back into the continuous space (D to A). Now, before the conversion, readers of different ethnicities are already equipped with their own built-in digital filters. This is very important, because it means you can make your A/D processor (i.e., the subject of your OP) even more efficient than you originally thought you could.

The different filters allow different readers to treat the same written word either as “signal” or as “noise” (i.e., ignored).

So in your design process, you should balance precision against *cost. You only need the system to be “adequately precise”, and leave the readers to leverage their own built-in digital filters. In fact, you want the system to be low-cost so that the readers (i.e., users) are given incentives to adopt your system.

*Cost can mean a lot of things, but here simply treat Char_Length as the salient factor in Cost. Hence I said keep consonant.char_length < 3.
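If it helps, that length constraint is trivial to check mechanically. A minimal sketch, where the candidate spellings are placeholders rather than anyone’s actual chart:

[code]
# Minimal sketch of the cost check: flag any proposed consonant spelling of
# 3 or more letters. The candidate spellings are placeholders only.
candidates = {"ɻ": "rh", "ʐ": "zr", "ɣ": "gh", "ʈ": "thr"}

too_long = {ipa: spelling for ipa, spelling in candidates.items() if len(spelling) >= 3}
print(too_long)   # {'ʈ': 'thr'} -> would need a shorter spelling under this rule
[/code]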