Austronesian etymology and loanwords

I’ve got one more: the Sinitic word 竹 “bamboo”. 竹 is tik in Taigi, trjuwk in Middle Chinese, and *truk in Old Chinese. It is a loanword from Kra-Dai *b.twuːkᴰ, which originally meant a strip of bamboo and is cognate with Thai ตอก (dtɔ̀ɔk, “bamboo strip”). In Proto-Kam-Sui the word is *thruk7 ‘bamboo strip’.

Since Sagart has argued that Kra-Dai is a daughter of Proto-Austronesian and a sister of Malayo-Polynesian, this would mean the Chinese word for bamboo ultimately came from Taiwan.

So I went looking for what that word is supposed to be in Austronesian languages.

In Paiwan, the word for a strip of split and flattened bamboo is taqtaq, and adding the infix -m- gives t-m-aqtaq, a verb meaning “to split bamboo”.

It ultimately comes from the root -tak/-taqak, imitating the sound of something splitting or cracking. Related words include *betak “to split, crack”, which is cognate with Tagalog biták.

It would make sense for this word to have been borrowed into Old Chinese at a time when new characters were still being created for new things. There would have been no bamboo around the bend of the Yellow River, and the bamboo ware that made its way to that region was probably woven from bamboo strips.

| Gloss | OC | PAN | Proto-Kra-Dai | Proto-Kam-Sui | Thai | Comments |
|---|---|---|---|---|---|---|
| bamboo strip | *truk | *taqtaq | *b.twuːkᴰ | *thruk7 | dtɔ̀ɔk | bamboo in Sinitic |

Eh… I just realized that the moon goddess 嫦娥 (*dangŋˤajʔ) sounds extremely similar to Dongi, the moon goddess of the Pangcah (Amis).

Something I just thought of blew my mind…

Consider the Warring States-era state of Yue (越), famous for its sword making. Its swords were long thought to be mere legend, but the unearthing of the Sword of Goujian, still sharp after at least 2,500 years, proved them to be historical fact. The lands of Yue and the culturally similar Wu (吳) were once Kra-Dai speaking, so I looked for hints in the names of the states and their kings.

It turns out that the clan name (氏) of the kings of Yue is 諸稽, which in Old Chinese would have been *ta-kˤij.

I think it came from the Austronesian word *takis, which meant… sword.


So 7 in some Formosan languages and in the rest of the Austronesian languages would be 5+2, then?

Yes, pitu is a contraction of RaCep + pi + tuSa, i.e. 5 and 2.

The origin is obscured because RaCep was replaced by lima.

I came across this Twitter post, and it seems AI could make a great leap in translating and interpreting rare, complicated languages.

https://twitter.com/hahahahohohe/status/1765088860592394250

I gave Claude my short list of possible OC-PIE cognates from this thread, with only the Hanji, OC, PIE, and English columns. I didn’t provide the column containing my comments, or the OE and ME columns.

I asked it for more possible cognates, and it hallucinated. However, when I asked it to provide possible cognates for the characters with an empty English cell… well, it got interesting.

For the empty cells in the English column, here are some potential English cognate words based on the Proto-Indo-European (PIE) forms provided:

先 *sˤər
The PIE root is not given, but this could be related to the PIE root *ser- ‘to line up, arrange’. A possible English cognate would be “series”.

昭 *taw
照 *taw-s

朝 *taw
These Old Chinese forms correspond to the PIE root *dyeu ‘to shine’. Possible English cognates derived from this PIE root include “day”, “dawn”, and “jovial” (from the name of the Roman god Jupiter < *dyeu-pater ‘sky father’).

So potential English cognate entries could be:

先 *sˤər | *ser- | series

昭 *taw | *dyeu | day, dawn
照 *taw-s | *dyeu | day, dawn

朝 *taw | *dyeu | day, dawn

Note that these are speculative suggestions based on the limited data provided. Establishing robust etymological links requires much more comprehensive evidence and analysis by historical linguists.

Of course, it could just be drawing on English etymologies it saw in its training data rather than doing anything with the OC data I provided.

So I asked about the cells for which I hadn’t provided any possible PIE cognates, mostly because I couldn’t think of one when I compiled that list.

Here are the possible English cognates for 路, 客, and 旅 based on the information provided:

路 (*Cə.rˤak-s in Old Chinese):
There is no Proto-Indo-European reconstruction given, so an English cognate is not clear from this data. The Old Chinese form may be related to words meaning “road” or “path”.

客 (*kʰˁrak in Old Chinese):

Again, no Proto-Indo-European form is listed, but the Old Chinese reminds of the English “caravan” from the Proto-Indo-European root *ker- “to go”.

旅 (*[r]aʔ in Old Chinese):
No Proto-Indo-European form is provided for this Old Chinese reconstruction either. However, it may be related to the root *lē- “to go, travel”, which gave rise to English words like “lore” (knowledge from traveling) and “less” (one who travels slower).

So in summary, while direct English cognates are not obvious from the data provided, the Old Chinese forms suggest possible connections to words related to traveling, roads, and going on journeys in English, but firmer conclusions cannot be made without the corresponding Proto-Indo-European reconstructions.

OK, now I’m scared. Those actually make sense, except for the one for 客: PIE *ker- for “caravan” is a hallucination.

Obviously I didn’t pay for Claude 3 Opus; I just used the free Claude 3 model. So maybe the paid version could do even more.

Here’s a reply from Opus:

I can’t really comment on those replies. Hope they’re helpful, @hansioux.

Most of them are hallucinations. Sure, the OC for 日 is *C.nik, but “day” ultimately comes from the PIE root *dhegh “to burn, heat”. Even if Opus had gotten the etymology of “day” right, the two don’t look related. The same goes for 吾; they just don’t look related.

That said, Claude Opus has definitely read books on PIE and OC. It can recall most of these reconstructions, even if it doesn’t consistently stay within one reconstruction system.

I just learned why tomatoes are called kam-á-bi̍t (柑仔蜜) in Taigi. Apparently they were originally called kam-á-ti̍t (柑仔得), and the etymology goes like this:

tomātl (Nahuatl) → tomate (Spanish) → tamatis (Cebuano) → kamatis (Tagalog) → kam-á-ti̍t (original loan in Taigi) → kam-á-bi̍t (current popular Taigi)

So it’s not exactly an Austronesian loan into Taigi, but not exactly not one either.

I actually looked for evidence of this when I first posted it. I think I’ve found evidence of *Nem being used for 3 in languages outside the Limaish family, such as Pazeh, and in some Sirayaic languages, which supposedly belong to the Enemish family.

In Taivoan and Makatao, which are Sirayaic languages, *Nem went through a series of sound changes: *Nem > nom > lom > rom. So in Makatao we get ra-ruma for 3 and lumta for 4 (lum 3 + ta 1).

There are also records of pinum for 3 in Basay, although plenty of these languages also use the *telu form for 3.

So in terms of numeral innovations, replacing 6 with *NemNem (a reduplication of *Nem, 3) was probably the first to happen. It was followed by replacing 3 (*Nem) with *telu, and then by the Pituish and Walu-Siwaish innovations, which still needed *RaCep for 5 to form those abbreviations. Only after that do we get Limaish and Puluqish.

Great to see a video laying the evolution out.

Great video about the history of Austronesian linguistic study.

Here’s a video on Pazih (Pazeh) from ILoveLanguages!, a YouTube channel run by a Filipino linguist.

The number system is almost identical to Kaxabu.

My thoughts on the connection between the word Māori and Taiwan’s Indigenous languages:

| Word | Language | Meaning |
|---|---|---|
| *qulid | PAN | life, alive |
| 'orip | Pangcah | life |
| q-m-uzi-quzip | Paiwan | living beings (animals) |
| hidup | Malay | life |
| urip | Javanese | life |
| *ma-qudip | PMP | living, soul, spirit |
| m-urip | Kayan | alive, to sprout, reside at |
| *maqoli | POC | true, genuine |
| m-ori | Bimanese | living, alive |
| mauli | Suau | life |
| mo-ʔui | Tongan | to live, be living or alive |
| ma-uli | Samoan | seat of the emotions |
| Ma-uli | Hawaiian | life, heart, seat of life, spirit |
| Ma-uri | Maori | normal |

What about Cook Islands Māori?

I just saw a claim, sourced from Wiktionary, that harimau came from *qari-maquŋ. I’m not sure whether the *qari- prefix is the same as *qali-, and I also don’t know where the maquŋ came from.

The word for the Formosan clouded leopard is *lukeNaw, and it remains so in most Formosan languages.

I saw this chart of how “eat chicken” evolved from PAN into modern Kra-Dai and Austronesian languages.

More Kra-Dai and Austronesian cognates:

Gloss Proto-Kra-Dai Proto-Kra Proto-Austronesian Comment
1 *(C)itsɤː *tʂəm *isa
2 *saː *sa *duSa
3 *tu *telu
4 *pə *Sepat
5 *r-ma *lima
6 *x-nəm *enem
7 *t-ru *pitu
8 *m-ru *walu
9 *s-wa *Siwa
10 *pwlot *sa-puluq
back *lon *biliN walk behind
boat *Cuɖá *aluja oar
bear *kəˀmˠúj *C-me *Cumay
before *kun *(q)uNah
behind *lon *biliN walk behind
bird *ɳok *manuk
black *hl/dəm *tidem
blood *pɤlaːc *plat *daRaq
blood *kɣa *daRaq
boil *plaɯ *baʀəq
bone *dək *CuqelaN
borrow *C-ɟáːm *Sezam
chest *tək *daSdaS
child *lak *aNak
crab *d-rat *gaRaŋ
cut *te *tebteb
day *(h)wən *waRi
die *maTaːj *ma-aCay
dry in sun *m-ʈak *betak crack, split open (as dry ground).
ear *k-ra *Caliŋa
eat *kan *kaen
excrement *kai *Caqi
eye *maTaː *m-ʈa *maCa
fall *tok *taktak
fart *C-tot *qetut
fire *(C)apuj *pui *sapuy
flea *x-mət *qatimela
flower *hŋa *buŋa
front *kun *(q)uNah
fruit *C-mak *buaq
grandmother *ja *aya
hair *mot *gumuN body hair
hair *mot *kumiS pubic hair
hair *mot *bukəS head hair
hand *(C)imɤː *mja *(qa)lima
head *krai *qulu
head louse *KuTuː *C-ʈu *kuCux
horn *C-ku *sequŋ
I (1st person singular) *akuː *ku *aku
intestines *C-si *Cinaqi
leg *C-ka *qaqay
month *x-ŋo *qaNeb
moon *m-ɖjan *bulaN
mouth *ŋuŋ *ŋusu
navel *m-ɖaɯ *puja
navel *m-ɖaɯ *pujek
nose *teŋ *mujiŋ
pig *x-mu *babuy
pluck *bet *putput
rain *jəl *quzaN
rat *hlai *labaw
raw *(k-)ɖep *qudip
sand *hŋai *qenay
sand *hŋai *bunaj
see *kai *kita
see *ti *kita
sesame *l-ŋa *leŋa
shellfish *tʃui *ku(S)ul
shoulder *m-ba *qabaRa
skin *boŋ *baŋeS
snake *ŋa *Sulaʀ
sun *(l-)wən *waRi
taro *pəˀrˠáːk *p-ɣak *biRaq
that *ʔ-ɲa *-na
this *ʔ-ni *-ni
tongue *(C)əmaː *I-ma *Sema
tooth *lipan *I-pən *nipen
water *(C)aNam *daNum
weep *ɲit *Caŋis
who *ʔ-nau *na-nu what
you (2nd person singular) *isuː; amɤː *mə *k-amu

So I followed your lead and looked up harimau on Wiktionary, then followed the link to *qari-maquŋ. The data source was listed, so I followed that link and found a very nice resource: the ACD (Austronesian Comparative Dictionary) combined with OpenStreetMap to show cognate sets on a map. You probably already know of it, but I thought I’d share it here because it really tied everything together for me visually.

Here’s a screenshot:

The map for *pitu₁ (seven, or 5+2) is really cool. Below is just a screenshot; on the actual map, hovering over a dot shows the language or dialect and its cognate word: