Name Transliteration from ID document
Overview
The Transliteration feature in IDWise automatically converts non-Latin names into standardized Latin script to enable accurate matching and interoperability across AML, KYC, and identity verification systems.
It ensures that multilingual names, especially those written in Arabic-based scripts (Arabic, Urdu, Persian) or Cyrillic, are consistently represented in both native and Latin forms within the journey data model.
How It Works
- Script Detection: IDWise first detects the script of each name field using language and character set identification.
- Conditional Transliteration
  - If the input is already in Latin, it is not transliterated and is stored as-is.
  - If the input is in a non-Latin script, it is transliterated into Latin and stored in dedicated Latin fields.
- Field Mapping

Field | Description |
---|---|
First Name Native, Last Name Native, Full Name Native | Contain the original name in its native (non-Latin) script. |
First Name, Last Name, Full Name | Contain the Latin-transliterated version of the user’s name. |

This dual representation allows clients to display localized names while using the Latin versions for matching and regulatory screening.
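The steps above can be sketched in a few lines of Python. This is only an illustration of the logic, not IDWise's implementation: `is_latin`, `map_full_name`, and the `demo_transliterate` lookup are hypothetical names, script detection is approximated with Unicode character names, and the output dictionary simply reuses the field labels from the table.

```python
import unicodedata
from typing import Callable, Dict


def is_latin(text: str) -> bool:
    """Return True if every alphabetic character in text is a Latin letter."""
    return all(
        unicodedata.name(ch, "").startswith("LATIN")
        for ch in text
        if ch.isalpha()
    )


def map_full_name(name: str, transliterate: Callable[[str], str]) -> Dict[str, str]:
    """Populate the Latin and native full-name fields for one input name."""
    if is_latin(name):
        # Already Latin: stored as-is, native field stays empty.
        return {"Full Name": name, "Full Name Native": ""}
    # Non-Latin: keep the original and put the transliteration in the Latin field.
    return {"Full Name": transliterate(name), "Full Name Native": name}


# Hypothetical stand-in for a transliteration model: a one-entry lookup table.
DEMO_TABLE = {"محمد علي": "Mohammed Ali"}


def demo_transliterate(name: str) -> str:
    return DEMO_TABLE.get(name, name)


print(map_full_name("John Smith", demo_transliterate))
print(map_full_name("محمد علي", demo_transliterate))
```

Passing the transliteration step in as a function keeps the mapping logic independent of which model produces the Latin form.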
Example
Input (Arabic) | Output (Latin) | Mapped Fields |
---|---|---|
محمد علي | Mohammed Ali | Full Name = Mohammed Ali, Full Name Native = محمد علي |
فاطمة الزهراء | Fatima Al Zahraa | Full Name = Fatima Al Zahraa, Full Name Native = فاطمة الزهراء |
John Smith | John Smith | Stored only in Latin fields; native fields remain empty |
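As a rough sketch of how a client might consume these fields, the snippet below picks the native form for display and the Latin form for screening. It assumes the mapped fields arrive as a flat dictionary keyed by the labels in the table; the actual journey response structure may differ, and `display_name` and `screening_name` are hypothetical helpers.

```python
from typing import Dict


def display_name(fields: Dict[str, str]) -> str:
    """Prefer the native-script name for display, falling back to Latin."""
    return fields.get("Full Name Native") or fields.get("Full Name", "")


def screening_name(fields: Dict[str, str]) -> str:
    """Use the Latin name for matching and regulatory screening."""
    return fields.get("Full Name", "")


# Rows from the example table, expressed as assumed flat dictionaries.
arabic_user = {"Full Name": "Mohammed Ali", "Full Name Native": "محمد علي"}
latin_user = {"Full Name": "John Smith", "Full Name Native": ""}

print(display_name(arabic_user), "|", screening_name(arabic_user))
print(display_name(latin_user), "|", screening_name(latin_user))
```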
Models Used
IDWise supports two transliteration models:
Model Type | Description | Supported Languages |
---|---|---|
Dictionary-Based Model | Uses linguistic rules and curated mappings for high precision on person names. | Arabic (ar) only |
Machine Learning Model | ML model trained on multilingual name datasets for contextual transliteration accuracy. | All supported non-Latin languages |
To enable the machine learning–based transliteration model, please contact our support team at [email protected]
Supported Languages
Code | Language | Description |
---|---|---|
ar | Arabic | Supported with both dictionary-based and ML models |
ru | Russian | Cyrillic script transliterated to Latin using ISO 9 |
zh-cn | Chinese | Transliterated using pinyin-style mappings |
ja | Japanese | Uses Hepburn-style transliteration for kana/kanji |
ko | Korean | Uses standard Romanization of Hangul |
hi | Hindi | Devanagari script transliterated to Latin |
bn | Bengali | Script converted to Latin using phonetic mapping |
ur | Urdu | Arabic-based script transliterated using ML model |
fa | Persian (Farsi) | Arabic-based script handled via ML model |
ps | Pashto | Arabic-based script, phonetic transliteration |
ku | Kurdish | Arabic-based or Latin script, depending on the source |
th | Thai | Transliteration from Thai script to Latin |
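The relationship between language codes and models can be summarized as a small lookup. The sketch below encodes only what the two tables state: the dictionary-based model covers Arabic, and the ML model covers every supported language once it has been enabled. The function name and the `"dictionary"` and `"ml"` labels are illustrative, not IDWise API values.

```python
from typing import List

# Language codes taken from the Supported Languages table.
SUPPORTED_LANGUAGES = {
    "ar", "ru", "zh-cn", "ja", "ko", "hi", "bn", "ur", "fa", "ps", "ku", "th",
}

# The dictionary-based model is documented for Arabic only.
DICTIONARY_LANGUAGES = {"ar"}


def available_models(code: str) -> List[str]:
    """Return the illustrative model labels documented for a language code."""
    code = code.lower()
    if code not in SUPPORTED_LANGUAGES:
        return []
    models = []
    if code in DICTIONARY_LANGUAGES:
        models.append("dictionary")
    # The ML model covers all supported languages (enabled on request).
    models.append("ml")
    return models


print(available_models("ar"))
print(available_models("ru"))
print(available_models("de"))
```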