Facebook's Hokkien-English translator models

So there are four models: a pair of Taigi-to-English and English-to-Taigi models trained using UnitY, and another pair trained with S2UT (speech-to-unit translation). Both UnitY and S2UT are transformer-based models that break audio down into small discrete acoustic units and can translate without a text transcript of the input audio. In the case of UnitY, the input is first translated to Mandarin text, and then the Taigi audio and the Mandarin transcript are fed into the model again to generate the English audio.
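
For anyone wondering what "acoustic units" means in practice, here is a rough toy sketch, not Meta's actual code. It assumes each short audio frame has already been turned into a feature vector, and that a unit is simply the index of the nearest entry in a small codebook; repeated units in a row are then collapsed. The names `nearest_unit`, `frames_to_units`, the codebook, and the frame values are all made up for illustration.

```python
from itertools import groupby


def nearest_unit(frame, codebook):
    """Return the index of the codebook vector closest to `frame`
    (squared Euclidean distance). Hypothetical helper for illustration."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(frame, codebook[i]))


def frames_to_units(frames, codebook):
    """Map each feature frame to a discrete unit id, then collapse
    consecutive duplicates so a held sound becomes a single unit."""
    units = [nearest_unit(f, codebook) for f in frames]
    return [u for u, _ in groupby(units)]


# Toy data: a 3-entry codebook and six 2-dimensional "frames" (made-up values).
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
frames = [(0.1, -0.1), (0.2, 0.0), (0.9, 1.2), (4.8, 5.1), (5.2, 4.9), (0.0, 0.1)]

print(frames_to_units(frames, codebook))  # [0, 1, 2, 0]
```

A translation model of this kind would then predict a sequence of such unit ids for the target language, and a separate vocoder would turn the units back into audio. That last part is not shown here.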

I tried them out and they are pretty accurate. For Taigi I said something like “學語言尚重要ê是環境” and got something like “The most important thing when learning a language is the learning environment”. For English I said “We are having lunch at the German bakery today” and got “阮今仔日佇德國pháng店食中晝”.

The only problem I had is that I could only play each result once, after which the audio player would break. I’m not sure if that’s just Firefox on my phone, or if it’s broken like that in every browser.

Just saw this:

It would be really cool to see whether this works. They mention multiple times that Hokkien doesn't have a written form, but I thought romanization was the most standard way to write it. It seems they’re using Mandarin text to help build the model, which is a bit odd.

This is pretty amazing. First time I’ve ever said that about something Facebook has done.

Well it’s Meta now

A well-known Taigi scholar, Phuann Kho-guân, also tried out the models and gave them a fairly positive review.

These are the sentences that he tried:

「我真佮意–汝,汝有佮意–我無?」
I like you very much. Do you like me?

「阿伯,gâu早!出來運動–hinnh?」
Uncle, good morning. Are you going to the game?

「即號頭路 sí-pē硬斗,我毋做!」
And this job is hard to do but I can’t do it.

The second translation rendered 運動 (ūn-tōng) as “going to the game”, when it would usually mean going out to exercise.