Meta's Omnilingual MT for 1,600 Languages (ai.meta.com)

by j0e1 50 comments 136 points

[−] stingraycharles 56d ago
I find that Meta’s translations are very poor compared to others, at least for relatively obscure languages, which I figured was relevant given the article.

Google Translate is a good default, but LLMs are really good at translation, as they’re better at understanding context and providing culturally appropriate translations.

(I live in Cambodia where they speak Khmer)

[−] djsamseng 56d ago
Hello from Siem Reap, Cambodia! Awesome to see a fellow tech enthusiast from Cambodia.

I actually found Facebook’s translations pretty good (better than Google Translate for anything longer than a sentence). From what I understand, Khmer is a bit more verbose and context-dependent, so LLMs would be a big help in understanding those nuances.

In the inverse case (LLMs generating Khmer from English), I've heard from locals that the output sounds formal and “robotic”, which I found quite interesting.

[−] pseudocomposer 56d ago
Kagi Translate is fantastic. Multilingual support is honestly one of the best things about LLMs, imo.
[−] ks2048 56d ago
So, LLMs are noticeably better in Khmer than Google Translate? I wonder why Google Translate doesn't use Gemini under the hood. Perhaps it's more prone to hallucinations.

I'd be interested in finding some thorough testing of translations from different LLMs vs. translation APIs.

[−] pattilupone 56d ago
There's a dropdown on Google Translate that lets you choose "Advanced" mode or "Classic" mode. Advanced mode uses Gemini but it's only available for select languages.
[−] yellow_lead 56d ago
It's not even good for Chinese
[−] smallerize 56d ago
*they're

(Sorry I had to)

[−] stingraycharles 56d ago
I could have sworn I edited it! I'd noticed it myself as well, but thanks for the correction.
[−] tomrod 56d ago
*ពួកគេគឺជា (Khmer for "they are")
[−] gojomo 56d ago
Can translate between 1600 languages.

Can't achieve subject-verb agreement in the first sentence of their English abstract:

Advances made through No Language Left Behind (NLLB) have demonstrated that high-quality machine translation (MT) scale to 200 languages.

[−] vgivanovic 54d ago
Huh? I'm a native English speaker and the sentence looks OK.

Advances have demonstrated... The NLLB part is an adjectival (sic) phrase that modifies the noun "Advances".

Hopefully I'm not wrong...

[−] sajforbes 54d ago
It was a needlessly snarky way to put it, but they're right. The issue is the verb 'scale' (it should be 'scales', agreeing with 'machine translation'), not 'advance/s'.
[−] ks2048 56d ago
I'll be looking at this in detail. I've started a company to do similar things, https://6k.ai

I'm currently concentrating on better data gathering for low-resource languages.

When you look in detail at data like Common Crawl, FinePDFs, and FineWeb, (1) they are missing a lot of quality sources that exist if you know where to look, and (2) the sources they do have are not processed "finely" enough (e.g. FinePDFs classifies each PDF page as a single language, whereas many language-learning sources contain language pairs, etc.).
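A minimal sketch of what finer-grained processing could look like, assuming Unicode script is used as a crude stand-in for language (so it cannot tell English from French, which share the Latin script). Each line is tagged instead of each page, and adjacent lines in different scripts are pulled out as candidate translation pairs, as in a phrasebook. The function names and the 0.2 threshold are hypothetical; this is not how FinePDFs or 6k.ai actually process data.

```python
import unicodedata
from collections import Counter
from typing import List, Optional, Tuple


def line_script(line: str) -> Optional[str]:
    """Return the dominant script of a line, or None if it has no letters.

    Script is approximated by the first word of each letter's Unicode name
    (e.g. 'LATIN', 'KHMER'); an assumed, crude proxy for language.
    """
    counts = Counter()
    for ch in line:
        if ch.isalpha():  # letters only; combining vowel signs are skipped
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split()[0]] += 1
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def looks_bilingual(page: str, min_share: float = 0.2) -> bool:
    """Flag a page as a possible language-pair source if at least two scripts
    each dominate >= min_share of its lettered lines (threshold is a guess)."""
    counts = Counter(s for s in map(line_script, page.splitlines()) if s)
    total = sum(counts.values())
    if total == 0:
        return False
    major = [s for s, n in counts.items() if n / total >= min_share]
    return len(major) >= 2


def candidate_pairs(page: str) -> List[Tuple[str, str]]:
    """Greedily pair each line with the next one when their scripts differ."""
    lines = [ln.strip() for ln in page.splitlines() if line_script(ln)]
    pairs = []
    i = 0
    while i + 1 < len(lines):
        a, b = lines[i], lines[i + 1]
        if line_script(a) != line_script(b):
            pairs.append((a, b))
            i += 2  # consume both lines so a gloss isn't reused in the next pair
        else:
            i += 1
    return pairs
```

On a phrasebook-style page such as `"សួស្តី\nHello\nអរគុណ\nThank you"`, `looks_bilingual` flags the page and `candidate_pairs` returns the two Khmer/English pairs, where a single page-level label would have discarded that structure.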

[−] djoldman 56d ago
Just spent a long time trying to find where you can download any of these weights.

Is it open weight? If so, why isn't there just a straight link to the models?

[−] intended 56d ago
Didn’t research show that models get worse at translation the more languages get added in? The curse of multilinguality? Lauscher 2020?

It looks like Meta found a way forward.

Reading Meta’s abstract, it seems they found ways to improve the quality of the training data, and also built new evaluation tools?

They are also saying that OMT-LLaMA does a better job at text generation than other baseline models.

[−] garyclarke27 56d ago
They can translate 1600 languages, but they can't do basic text formatting. Where are the paragraphs?
[−] pxtail 55d ago
Where are the real, useful features? Why, in 2026, can't I get transcripts of voice messages in my chat?
[−] croes 56d ago
Off topic: since the AI craze, MS’ documentation translation has had ridiculous errors, like translating the try/catch keywords to "versuchen" and "fangen" on German pages.
[−] psychoslave 56d ago
That's a high count, but still some way from "Omni". The usual count is between 4k and 8k languages, depending on the source. But the first 1k might well be the hardest.
[−] ks2048 56d ago
Another interesting thing mentioned here is BOUQuET: Benchmark and Open-initiative for Universal Quality Evaluation in Translation.

https://huggingface.co/spaces/facebook/bouquet

[−] mrlonglong 55d ago
It can't even do decent Welsh to English translations.
[−] asveikau 55d ago
They've come a long way since enabling the Burmese genocide while citing a lack of available translations.