At Google I/O 2022 on May 11, the tech giant announced that it had added 24 new languages, including Sanskrit and seven other Indian languages, to Google Translate, taking the total number of supported languages to 133. The company said that over 300 million people speak these newly added languages in India, Africa, Nepal and more.
Apart from Sanskrit, the other Indian languages in the latest iteration of Google Translate are Assamese, Bhojpuri, Dogri, Konkani, Maithili, Mizo and Meiteilon (Manipuri), taking the total number of Indian languages supported by Google Translate to 19.
The internet giant said these are also the first languages it has added using a technique called Zero-Shot Machine Translation, in which a machine learning model sees only monolingual text, meaning it learns to translate into another language without ever seeing an example translation.
“There is a long tail of languages that are underrepresented on the web today and translating them is a hard technical problem since translation models are usually trained with bilingual text. However, there is not enough publicly available bilingual text for every language,” Alphabet chief executive Sundar Pichai said in his keynote speech on May 11.
“With advances in machine learning, we have developed a monolingual approach where the model learns to translate a new language without ever seeing a direct translation of it,” Pichai said.
Here is the full list of new languages added to Google Translate:
- Sanskrit (used by about 20,000 people in India)
- Assamese (used by about 25 million people in Northeast India)
- Bhojpuri (used by about 50 million people in northern India, Nepal and Fiji)
- Dogri (used by about three million people in northern India)
- Konkani (used by about two million people in Central India)
- Maithili (used by about 34 million people in northern India)
- Meiteilon (Manipuri) (used by about two million people in Northeast India)
- Mizo (used by about 830,000 people in Northeast India)
- Aymara (used by about two million people in Bolivia, Chile and Peru)
- Bambara (used by about 14 million people in Mali)
- Dhivehi (used by about 300,000 people in the Maldives)
- Ewe (used by about seven million people in Ghana and Togo)
- Guarani (used by about seven million people in Paraguay, Bolivia, Argentina and Brazil)
- Ilocano (used by about 10 million people in the northern Philippines)
- Krio (used by about four million people in Sierra Leone)
- Kurdish (Sorani) (used by about eight million people, mostly in Iraq)
- Lingala (used by about 45 million people in the Democratic Republic of the Congo, Republic of the Congo, Central African Republic, Angola and the Republic of South Sudan)
- Luganda (used by about 20 million people in Uganda and Rwanda)
- Oromo (used by about 37 million people in Ethiopia and Kenya)
- Quechua (used by about 10 million people in Peru, Bolivia, Ecuador and surrounding countries)
- Sepedi (used by about 14 million people in South Africa)
- Tigrinya (used by about eight million people in Eritrea and Ethiopia)
- Tsonga (used by about seven million people in Eswatini, Mozambique, South Africa and Zimbabwe)
- Twi (used by about 11 million people in Ghana)