The Learning Curve, Part 2: How to Build an AI for Diverse Dialects

Tales from the Middle East on the complexity of creating AI tools for Arabic, a language with many facets

Samsung's Galaxy AI has broken new ground by supporting 16 languages, significantly reducing language barriers with real-time and on-device translation. This development is part of Samsung's broader initiative to harness mobile AI capabilities. To understand this advancement, we are visiting Samsung Research centers worldwide. While the first part of our series examined data needs, this segment delves into the intricate challenge of accounting for diverse dialects, particularly in the Arabic language.

Overcoming Dialect Challenges

Teaching AI a language is complex, but teaching it a collection of dialects is even more so. This was the task faced by the Samsung R&D Institute Jordan (SRJO) team when they integrated Arabic into Galaxy AI’s Live Translate feature. Arabic, spoken by over 400 million people daily, is divided into Modern Standard Arabic (Fus’ha) and numerous regional dialects (Ammiya). Each dialect varies in pronunciation, vocabulary, and grammar, presenting a significant challenge for the AI model.

Navigating Unwritten Rules

Mohammad Hamdan, project leader for Arabic language development, emphasized the unique hurdles: "Unlike other languages, Arabic pronunciation changes depending on the subject and verb in a sentence. Our goal is to develop a model that understands all these dialects and can respond in standard Arabic."

The Role of TTS in Galaxy AI

Text-to-Speech (TTS) is a crucial component of Galaxy AI’s Live Translate: once speech has been recognized and translated into text, TTS reads the result aloud. The TTS team faced a significant challenge with Arabic’s use of diacritics, the marks that guide pronunciation but are usually left out of everyday writing. Without them, it is hard for a machine to convert text into phonemes accurately.
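To make the ambiguity concrete, here is a minimal Python sketch using only the standard library. It is an illustration, not part of Galaxy AI: the helper name `strip_diacritics` and the two sample words are assumptions chosen for this example. It shows two different words that become identical once their diacritics are removed, which is the form a TTS system typically receives.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Drop Unicode combining marks, which include the Arabic short-vowel signs."""
    return "".join(ch for ch in text if not unicodedata.combining(ch))


# Two distinct words written with full diacritics (illustrative samples).
KATABA = "\u0643\u064E\u062A\u064E\u0628\u064E"  # "kataba", he wrote
KUTUB = "\u0643\u064F\u062A\u064F\u0628"         # "kutub", books

# Everyday writing omits the marks, so both collapse to the same three letters.
print(strip_diacritics(KATABA) == strip_diacritics(KUTUB))  # True
```

Because the written form alone cannot distinguish the two, the pronunciation has to be inferred from the surrounding context.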

Neural Models and Data Challenges

Haweeleh, a key member of the team, noted the shortage of high-quality datasets representing correct diacritic usage. The team designed a neural model to predict and restore missing diacritics with high accuracy. This required the model to study extensive Arabic text, learning the rules and contexts of word usage to improve TTS accuracy.
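One common way to frame diacritic restoration is as per-letter labeling: the model sees bare letters and predicts which mark, if any, belongs on each one. The Python sketch below shows how a single training example could be derived from fully diacritized text under that framing. It is an assumption for illustration, not a description of the SRJO team's actual model or data pipeline, and the function name `to_training_pair` is hypothetical.

```python
import unicodedata


def to_training_pair(diacritized: str) -> tuple[str, list[str]]:
    """Split diacritized text into bare letters and one label per letter.

    Each label holds the combining marks that followed the letter in the
    source text, or an empty string when the letter carried no mark.
    """
    letters: list[str] = []
    labels: list[str] = []
    for ch in diacritized:
        if unicodedata.combining(ch) and letters:
            labels[-1] += ch      # attach the mark to the preceding letter
        else:
            letters.append(ch)    # base character starts a new position
            labels.append("")
    return "".join(letters), labels


# Illustrative sample: "kutub" (books); the final letter carries no mark.
model_input, targets = to_training_pair("\u0643\u064F\u062A\u064F\u0628")
print(len(model_input), len(targets))  # 3 3
```

The scarcity the team describes shows up directly in this framing: every training example needs source text whose diacritics are already correct.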

Enhancing ASR with Diverse Audio

The SRJO team also gathered diverse audio recordings of various Arabic dialects, which were transcribed to capture unique sounds, words, and phrases. Ayah Hasan, responsible for database creation, explained: "We assembled native speakers familiar with the nuances and variations of the dialects to transcribe the recordings."

This meticulous work was vital for improving the Automatic Speech Recognition (ASR) process, enabling Galaxy AI to understand and respond to different Arabic dialects in real time. Hamdan highlighted the complexity: "Building an ASR system that supports multiple dialects in a single model requires deep language understanding, careful data selection, and advanced modeling techniques."

Reaching a Milestone

After extensive planning, development, and testing, Arabic was successfully added to Galaxy AI, allowing millions more to communicate across linguistic borders. This achievement not only benefits Arabic speakers but also sets new best practices for global implementation. The team continues to refine their models, enhancing Galaxy AI's language capabilities.

Looking Ahead

In our next installment, we will visit Vietnam to explore how Samsung improves language data and trains effective AI models. Arabic is just one of the languages now supported by Galaxy AI, and its language pack is available for download from the Settings app. Galaxy AI features like Live Translate and Interpreter are accessible on Galaxy devices running Samsung’s One UI 6.1 update.

This milestone marks a significant advancement in Samsung’s mission to break down language barriers, enabling seamless communication across cultures and regions.