Facebook's Hokkien-English translator models

There are four models: a pair of Taigi-to-English and English-to-Taigi models trained using UnitY, and another pair trained using S2UT (speech-to-unit translation). Both UnitY and S2UT are transformer-based models that divide audio into smaller acoustic units and can translate without a text transcript of the input audio. In the case of UnitY, the input is first translated to Mandarin text, and then the Taigi audio and the Mandarin transcript are fed into the model again to generate the English audio.
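
To make the "acoustic units" idea a bit more concrete, here is a toy sketch of how I picture it. This is not the actual UnitY or S2UT code. I'm assuming the units come from assigning each short audio frame to its nearest cluster centre and then collapsing consecutive repeats. The function names, the 2-D "features" and the centroids are all made up for illustration.

```python
from itertools import groupby


def nearest_unit(frame, centroids):
    """Return the index of the centroid closest to one feature frame."""
    def sq_dist(centroid):
        return sum((f - c) ** 2 for f, c in zip(frame, centroid))
    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i]))


def speech_to_units(frames, centroids):
    """Map each frame to a unit ID, then collapse consecutive repeats."""
    ids = [nearest_unit(frame, centroids) for frame in frames]
    return [unit for unit, _ in groupby(ids)]


# Hypothetical toy data: real systems would use learned,
# high-dimensional speech features instead of 2-D points.
centroids = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
frames = [(0.1, 0.0), (0.2, -0.1), (0.9, 1.1), (1.0, 0.8), (4.8, 5.2), (0.0, 0.1)]

units = speech_to_units(frames, centroids)
print(units)  # [0, 1, 2, 0]
```

The translation model would then work with unit sequences like this rather than text, which is why no transcript is needed.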

I tried them out and they are pretty accurate. For Taigi I said something like “學語言尚重要ê是環境” and got something like “The most important thing when learning a language is the learning environment”. For English I said “We are having lunch at the German bakery today” and got “阮今仔日佇德國pháng店食中晝”.

The only problem I had is that I could only play each result once; after that, the player for the result would be broken. I’m not sure if that’s just Firefox on my phone, or if it’s broken like that in every browser.