Name Transliteration from ID Document
Convert non-Latin names from identity documents into standardized Latin-script fields for matching, AML screening, and integrations.
Overview
Use name transliteration to convert non-Latin names from identity documents into standardized Latin-script fields for matching, AML screening, KYC checks, and downstream systems.
IDWise keeps both versions of the name when the original document uses a supported non-Latin script. Native fields preserve the original document text, while Latin fields store the transliterated value used for matching and interoperability.
How It Works
IDWise applies transliteration during identity document processing.
- Detect the script. IDWise detects the script of each name field using language and character set identification.
- Decide whether transliteration is needed.
  - If the input is already in Latin script, IDWise stores it as-is and does not transliterate it.
  - If the input is in a supported non-Latin script, IDWise transliterates it into Latin script.
- Store native and Latin fields.

| Field | Description |
|---|---|
| First Name Native, Last Name Native, Full Name Native | Original name values from the document in their native script. |
| First Name, Last Name, Full Name | Latin-script name values used for matching, screening, and integrations. |
This dual representation lets you display localized names to users while using standardized Latin values for matching and regulatory screening.
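
The three steps above can be sketched in a few lines of Python. This is only an illustration, not IDWise's implementation: the `is_latin` check based on Unicode character names, the `NameField` container, and the `toy_transliterate` lookup are all hypothetical stand-ins for the detection and transliteration that IDWise performs internally.

```python
import unicodedata
from dataclasses import dataclass
from typing import Callable


@dataclass
class NameField:
    """Hypothetical container for one name in both representations."""
    latin: str
    native: str = ""  # stays empty when the document text is already Latin


def is_latin(text: str) -> bool:
    """Return True when every alphabetic character is a Latin letter."""
    letters = [ch for ch in text if ch.isalpha()]
    return all("LATIN" in unicodedata.name(ch, "") for ch in letters)


def process_name(raw: str, transliterate: Callable[[str], str]) -> NameField:
    """Apply the detect / decide / store steps to a single name value."""
    if is_latin(raw):
        # Already Latin: store as-is, no transliteration, native stays empty.
        return NameField(latin=raw)
    # Non-Latin: keep the original text and add the transliterated value.
    return NameField(latin=transliterate(raw), native=raw)


def toy_transliterate(text: str) -> str:
    """Placeholder lookup standing in for a real transliteration model."""
    return {"محمد علي": "Mohammed Ali"}[text]


print(process_name("John Smith", toy_transliterate))
print(process_name("محمد علي", toy_transliterate))
```

Passing the transliteration step in as a function keeps the script decision separate from whichever model produces the Latin value.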
Examples
| Input | Output | Mapped fields |
|---|---|---|
| محمد علي | Mohammed Ali | Full Name = Mohammed Ali, Full Name Native = محمد علي |
| فاطمة الزهراء | Fatima Al Zahraa | Full Name = Fatima Al Zahraa, Full Name Native = فاطمة الزهراء |
| John Smith | John Smith | Stored only in Latin fields; native fields remain empty. |
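
When consuming these fields, a downstream system might pick the native value for display and the Latin value for screening. The sketch below assumes the fields arrive as a dictionary keyed by the field names in the table above; the actual payload shape and key names in an IDWise response may differ, and both helper functions are hypothetical.

```python
def display_name(record: dict) -> str:
    """Prefer the native-script name for display; fall back to the Latin value."""
    return record.get("Full Name Native") or record["Full Name"]


def screening_name(record: dict) -> str:
    """Always use the Latin-script value for matching and screening."""
    return record["Full Name"]


# Illustrative records mirroring the example rows (assumed structure).
arabic_doc = {"Full Name": "Mohammed Ali", "Full Name Native": "محمد علي"}
latin_doc = {"Full Name": "John Smith", "Full Name Native": ""}

for doc in (arabic_doc, latin_doc):
    print(display_name(doc), "|", screening_name(doc))
```

The fallback matters for Latin-script documents, where the native fields are empty and the Latin field is the only populated value.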
Transliteration Models
IDWise supports two transliteration models:
| Model type | Description | Supported languages |
|---|---|---|
| Dictionary-based model | Uses linguistic rules and curated mappings for high precision on person names. | Arabic (ar) only |
| Machine learning model | Uses a machine learning model trained on multilingual name datasets for contextual transliteration. | All supported non-Latin languages |
To enable the machine learning-based transliteration model, contact IDWise.
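
The model coverage can be expressed as a small lookup. This is a sketch of the table above, not an IDWise API: the function name, the model labels, and the `SUPPORTED_LANGUAGES` set (copied from the Supported Languages table) are illustrative. It only lists which models cover a language; it does not reflect whether the machine learning model has been enabled for a given account.

```python
# Language codes taken from the Supported Languages table.
SUPPORTED_LANGUAGES = {
    "ar", "ru", "zh-cn", "ja", "ko", "hi", "bn", "ur", "fa", "ps", "ku", "th",
}


def models_covering(language_code: str) -> list[str]:
    """Return the transliteration models that cover a language code."""
    code = language_code.lower()
    if code not in SUPPORTED_LANGUAGES:
        return []
    models = ["machine_learning"]  # covers every supported non-Latin language
    if code == "ar":
        models.insert(0, "dictionary")  # dictionary-based model is Arabic only
    return models


print(models_covering("ar"))
print(models_covering("ru"))
print(models_covering("de"))
```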
Supported Languages
| Code | Language | Description |
|---|---|---|
| ar | Arabic | Supported with both dictionary-based and machine learning models. |
| ru | Russian | Cyrillic script transliterated to Latin using ISO-9. |
| zh-cn | Chinese | Transliterated using pinyin-style mappings. |
| ja | Japanese | Uses Hepburn-style transliteration for kana and kanji. |
| ko | Korean | Uses standard Romanization of Hangul. |
| hi | Hindi | Devanagari script transliterated to Latin. |
| bn | Bengali | Script converted to Latin using phonetic mapping. |
| ur | Urdu | Arabic-based script transliterated using the machine learning model. |
| fa | Persian (Farsi) | Arabic-based script handled using the machine learning model. |
| ps | Pashto | Arabic-based script transliterated phonetically. |
| ku | Kurdish | Arabic-based or Latin script, depending on source. |
| th | Thai | Thai script transliterated to Latin. |
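
To give a feel for what the ISO-9 row means for Russian, here is a minimal sketch of a character-by-character mapping following ISO 9:1995, which assigns each Cyrillic letter exactly one Latin character. The source does not say which edition of the standard IDWise applies or how edge cases are handled, so treat the `to_iso9` helper as an illustration of the standard rather than the product's exact output.

```python
import unicodedata

# ISO 9:1995 mapping for the Russian alphabet (one Latin character per letter).
ISO9_RU = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "ë",
    "ж": "ž", "з": "z", "и": "i", "й": "j", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "ш": "š", "щ": "ŝ", "ъ": "ʺ",
    "ы": "y", "ь": "ʹ", "э": "è", "ю": "û", "я": "â",
}


def to_iso9(text: str) -> str:
    """Transliterate Russian Cyrillic text to Latin, preserving letter case."""
    out = []
    for ch in unicodedata.normalize("NFC", text):
        latin = ISO9_RU.get(ch.lower())
        if latin is None:
            out.append(ch)  # spaces, hyphens, and unmapped characters pass through
        elif ch.isupper():
            out.append(latin.upper())
        else:
            out.append(latin)
    return unicodedata.normalize("NFC", "".join(out))


print(to_iso9("Иванов"))
print(to_iso9("Юрий"))
```

Because ISO 9 relies on diacritics to stay one-to-one, its output can differ from the romanized spelling printed elsewhere on a document, which is worth keeping in mind when comparing values.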
