Catching up with its big rival Google, which can translate more than 100 languages, Yandex now supports 94.
However, Yandex is a translator with a difference: its software is able to translate some of the world's rarest languages thanks to a complex statistical model that can pick up linguistic patterns without a large body of bilingual texts.
One reason Google and other translation services tend to focus on the world's most common languages is that their software depends on access to a corpus of texts in both languages. Common sources are the Bible and the Koran, which have been translated into practically every language.
While such texts are easy enough to find for languages such as Russian and English, finding them for less common languages is much harder.
This was a challenge for Papiamento, a relatively small language spoken by about 330,000 people in the Caribbean.
Since there are so few translations between Papiamento and other languages, the developers decided to try a different approach. They looked at languages similar to Papiamento to identify the relationships between them, and used that information to build a translator.
"We moved away from the traditional perception of each language as an independent system, and began to take into account the kinship between them. In practice, this means that if we need to build a translation for a language where there isn't much data, we can use other, larger, related languages," Yandex Translate developer Anton Dvorkovich explains in a blog post.
"Their individual models (morphology, syntax, vocabulary) can be used to fill the voids in the models of a 'small' language. This might just seem like blind copying of words and rules between languages, but the technology works a little smarter."
"This kinship can be different – for example, in Yiddish, most of the lexicon intersects with German and in Papiamento a lot is borrowed from Spanish and Portuguese. In the Tatar and Bashkir languages, there is similar syntax and morphology."
A Bashkir speaker, for example, can take a Wikipedia page in another language, translate it into Bashkir and then edit the result. The technology helps Bashkir speakers build up their language's presence on the internet much faster than they otherwise could.
In addition, Yandex hopes its statistical model will help linguists to better understand the relationships between languages.
"We expect that in the future, the technology we have developed for the use of data from related languages will be implemented on other areas and will generally help to better understand the links between languages, and consequently – more accurately translate texts. Therefore, we can say that this technology is not so much about 'small' languages as it is about establishing links between different languages of the world," Dvorkovich said.