July 3, 2024

News Collective

Complete New Zealand News World

How to create artificial intelligence for different forms of Arabic – Samsung Newsroom Mexico

Stories from the Middle East about the complexity of creating AI tools for Arabic, a language with many facets

Galaxy AI now supports 16 languages, helping more people reduce language barriers with instant on-device translation. With these developments, Samsung has begun a new era of mobile artificial intelligence (AI), so we are visiting Samsung research centers around the world to learn how Galaxy AI came to be and what it takes to overcome the challenges of AI development. While the first part of the series dealt with the task of determining the data required, this part looks at the complex work of taking dialects into account.

Teaching a language to an AI model is a complex process, but what if it is not a single language, but a group of different dialects? This was the challenge faced by the team at Samsung Research and Development (R&D) Jordan (SRJO). Although “Arabic” was added as a single language option for Galaxy AI features like Live Translate, the team had to study the different Arabic dialects that span the MENA region, each of which differs in pronunciation, vocabulary, and grammar.

Arabic is one of the six most widely used languages in the world, and is used daily by more than 400 million people [1]. The language is divided into two forms: Fusha (Modern Standard Arabic) and colloquial Arabic (the dialects). Fusha is usually used in public and formal settings, as well as in news broadcasts, while colloquial Arabic is used more in everyday conversation. Arabic is spoken in more than 20 countries, and there are currently about 30 dialects in the region.

Unwritten rules

The SRJO team, aware of the variants presented by these dialects, used a series of techniques to distinguish and address the unique linguistic features inherent in each. This approach was crucial to ensuring Galaxy AI was able to understand and respond in a way that accurately reflects regional nuances.

“Unlike other languages, the pronunciation of the object in Arabic varies depending on the subject and verb of the sentence,” explains Mohamed Hamdan, project manager of the Arabic language development team. “Our goal is to develop a model that understands all these dialects and can respond in standard Arabic.”

Text-to-speech (TTS) is a component of Galaxy AI’s Live Translate feature, which allows users to talk with people who speak other languages by translating spoken words into written text and then reading the result aloud. The TTS team faced a unique challenge specific to working in Arabic.

Arabic uses diacritics, marks that guide the pronunciation of words and that are written out in some contexts, such as religious texts, poetry, and books for language learners. Diacritics are widely understood by native speakers, but are absent from everyday writing. This makes it difficult for a machine to convert raw text into phonemes, the basic units of sound that make up speech.
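To make the ambiguity concrete, here is a minimal Python sketch. It assumes, for illustration only, that the diacritics of interest are the Unicode marks U+064B to U+0652 plus U+0670; the helper name `strip_diacritics` is hypothetical and is not part of Galaxy AI. It shows two differently pronounced words collapsing to the same everyday spelling once the marks are dropped.

```python
import re

# Assumption for this sketch: treat U+064B..U+0652 (short vowels, tanween,
# shadda, sukun) and U+0670 (superscript alef) as the diacritics to remove.
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")


def strip_diacritics(text: str) -> str:
    """Return the text as it would appear in everyday, unmarked writing."""
    return DIACRITICS.sub("", text)


# kaf + fatha, ta + fatha, ba + fatha: "kataba" (he wrote)
kataba = "\u0643\u064E\u062A\u064E\u0628\u064E"
# kaf + damma, ta + damma, ba: "kutub" (books)
kutub = "\u0643\u064F\u062A\u064F\u0628"

bare_kataba = strip_diacritics(kataba)
bare_kutub = strip_diacritics(kutub)

# Both words share one bare spelling, so the pronunciation cannot be read
# off the raw text alone; it has to be inferred from context.
same_spelling = bare_kataba == bare_kutub
print(bare_kataba, bare_kutub, same_spelling)
```

A TTS front end that receives only the bare spelling therefore has to recover the missing marks before it can choose the right phonemes.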

“There is a lack of reliable, high-quality datasets that accurately represent the correct use of diacritics,” Havilah explains. “We had to design a neural model that could predict and recover missing diacritics with great accuracy.”

Neural models are loosely inspired by the way the human brain works. To predict diacritics, the model must study a large body of Arabic text, learn the grammar of the language, and understand how words are used in different contexts. For example, the pronunciation of a word can vary greatly depending on the action or gender it describes. Extensive training was key to improving the accuracy of the Arabic text-to-speech model.
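One common way to frame diacritic prediction, which may or may not match what the SRJO team built, is per-letter labeling: each base letter is an input and the marks that follow it are the label a model learns to predict. The Python sketch below only prepares such training pairs from fully marked text; the names `to_training_pairs` and `restore` are hypothetical, and no model is trained here.

```python
# Assumption for this sketch: the same illustrative mark set, U+064B..U+0652
# plus U+0670. A real system would define its label set more carefully.
DIACRITIC_MARKS = {chr(code) for code in range(0x064B, 0x0653)} | {"\u0670"}


def to_training_pairs(diacritized: str) -> list[tuple[str, str]]:
    """Split marked text into (letter, marks) pairs, one pair per base letter."""
    pairs: list[tuple[str, str]] = []
    for ch in diacritized:
        if ch in DIACRITIC_MARKS:
            if pairs:  # attach the mark to the preceding letter
                letter, marks = pairs[-1]
                pairs[-1] = (letter, marks + ch)
        else:
            pairs.append((ch, ""))  # a letter with no marks seen yet
    return pairs


def restore(pairs: list[tuple[str, str]]) -> str:
    """Rebuild marked text from letters and their (predicted) marks."""
    return "".join(letter + marks for letter, marks in pairs)


# "kataba" (he wrote): kaf, ta, ba, each followed by a fatha
sample = "\u0643\u064E\u062A\u064E\u0628\u064E"
pairs = to_training_pairs(sample)

# The letters alone are what everyday text provides; the marks are the
# labels a diacritization model would need to predict from context.
letters = [letter for letter, _ in pairs]
labels = [marks for _, marks in pairs]
print(letters, labels)
```

In this framing, the scarcity of reliable marked text described above translates directly into a scarcity of labeled training pairs.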

Improve understanding

The SRJO team also had to collect audio recordings of the dialects from a variety of sources and have them transcribed, with a focus on unique sounds, words and phrases. “We assembled a team of native speakers of these dialects who know the nuances and variants well,” says Aya Hassan, whose team was responsible for creating the database. “They listened to recordings and manually converted the spoken words into text.”

This work was instrumental in improving the automatic speech recognition (ASR) process so that Galaxy AI can handle a wide range of Arabic dialects. ASR is essential for Galaxy AI to understand and respond in real time.

“Building an ASR system that supports multiple dialects in a single model is a complex task,” says Mohamed Hamdan, ASR manager on the project. “It requires a deep understanding of the complexities of language, careful data selection, and advanced modeling techniques.”

The pinnacle of innovation

After months of planning, building, and testing, the team was ready to launch Arabic as a language option for Galaxy AI, allowing more people to communicate across borders. The team has made Galaxy AI’s services available to Arabic speakers, reducing linguistic and cultural barriers between them and people around the world. In doing so, it has established best practices that can be applied elsewhere. This success is just the beginning: the team continues to refine its models and improve the quality of Galaxy AI’s language capabilities.

In the next episode, we’ll go to Vietnam to see how the team there refined its linguistic data. We’ll also ask: what does it take to train an effective AI model?

Arabic is one of the languages and dialects available in Galaxy AI and can be downloaded from the Settings app. Galaxy AI language features like Live Translate and Interpreter are available on Galaxy devices with Samsung’s One UI 6.1 update [2].

[1] UNESCO, World Arabic Language Day 2023, https://www.unesco.org/en/world-arabic-language-day

[2] One UI 6.1 was first released on Galaxy S24 series devices, with a wider rollout to other Galaxy devices including the S23, S23 FE, S22, S21, Z Fold5, Z Fold4, Z Fold3, Z Flip5, Z Flip4, Z Flip3, Tab S9, and Tab S8.