Name Transliteration from ID document

Overview

The Transliteration feature in IDWise automatically converts non-Latin names into standardized Latin script to enable accurate matching and interoperability across AML, KYC, and identity verification systems.

It ensures that multilingual names, especially those written in Arabic-based scripts (Arabic, Urdu, Persian) or in Cyrillic, are represented consistently in both native and Latin forms within the journey data model.

How It Works

Script Detection
IDWise first detects the script of each name field using language and character-set identification.

Conditional Transliteration
- If the input is already in Latin script, it is not transliterated and is stored as-is.
- If the input is in a non-Latin script, it is transliterated into Latin and stored in dedicated Latin fields.
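
The two steps above can be sketched roughly as follows. This is an illustration only, not IDWise's implementation: `detect_script` and `route_name` are hypothetical names, the script check relies on Unicode character names from Python's standard `unicodedata` module, and the `transliterate` callable stands in for whichever transliteration model is in use.

```python
import unicodedata
from typing import Callable


def detect_script(text: str) -> str:
    """Classify text as 'latin', 'non_latin', or 'unknown' (no letters at all)."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return "unknown"
    # Unicode character names for Latin letters begin with "LATIN".
    if all(unicodedata.name(ch, "").startswith("LATIN") for ch in letters):
        return "latin"
    return "non_latin"


def route_name(name: str, transliterate: Callable[[str], str]) -> dict:
    """Apply the conditional rule: transliterate only non-Latin input."""
    if detect_script(name) == "non_latin":
        # Hypothetical result shape: Latin form plus the original native form.
        return {"latin": transliterate(name), "native": name}
    # Latin (or letterless) input is stored as-is; the native slot stays empty.
    return {"latin": name, "native": None}
```

A real detector would likely need more nuance (mixed-script names, digits, punctuation), so treat the all-letters-are-Latin rule here as a simplifying assumption.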

Field Mapping

Field	Description
First Name Native, Last Name Native, Full Name Native	Contain the original name in its native (non-Latin) script.
First Name, Last Name, Full Name	Contain the user’s name in Latin script: the transliterated version for non-Latin input, or the original value when the input is already Latin.

This dual representation allows clients to display localized names while using the Latin versions for matching and regulatory screening.
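
As a rough sketch of how a client might consume the two representations, the helpers below pick the native form for display and the Latin form for screening. The dictionary keys simply reuse the field labels from the table; the actual keys in the journey data may differ, and `display_name` and `screening_name` are hypothetical helper names.

```python
def display_name(record: dict) -> str:
    """Prefer the native-script name for display; fall back to the Latin form."""
    # Assumed key names, taken from the field labels above.
    return record.get("Full Name Native") or record.get("Full Name", "")


def screening_name(record: dict) -> str:
    """Always use the Latin form for matching and regulatory screening."""
    return record.get("Full Name", "")


arabic_record = {"Full Name": "Mohammed Ali", "Full Name Native": "محمد علي"}
latin_record = {"Full Name": "John Smith", "Full Name Native": ""}

print(display_name(arabic_record), "|", screening_name(arabic_record))
print(display_name(latin_record), "|", screening_name(latin_record))
```

Because the native fields stay empty for Latin input, the fallback in `display_name` keeps the display logic the same for both cases.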

Example

Input (Arabic)	Output (Latin)	Mapped Fields
محمد علي	Mohammed Ali	`Full Name = Mohammed Ali`, `Full Name Native = محمد علي`
فاطمة الزهراء	Fatima Al Zahraa	`Full Name = Fatima Al Zahraa`, `Full Name Native = فاطمة الزهراء`
John Smith	John Smith	Stored only in Latin fields; native fields remain empty

Models Used

IDWise supports two transliteration models:

Model Type	Description	Supported Languages
Dictionary-Based Model	Uses linguistic rules and curated mappings for high precision on person names.	Arabic (`ar`) only
Machine Learning Model	ML model trained on multilingual name datasets for contextual transliteration accuracy.	All supported non-Latin languages
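
The model table can be read as a simple lookup, sketched below. This only encodes what the table and the supported-language codes state; it is not a description of how IDWise selects a model internally. `available_models` and the `ml_enabled` flag are hypothetical, with the flag standing in for the ML model having been enabled for the account.

```python
from typing import List

# Language codes listed as supported in this document.
SUPPORTED_LANGUAGES = {
    "ar", "ru", "zh-cn", "ja", "ko", "hi", "bn", "ur", "fa", "ps", "ku", "th",
}
# The dictionary-based model is documented for Arabic only.
DICTIONARY_LANGUAGES = {"ar"}


def available_models(lang: str, ml_enabled: bool) -> List[str]:
    """Return the model types the table lists for a language code."""
    code = lang.lower()
    if code not in SUPPORTED_LANGUAGES:
        return []
    models = []
    if code in DICTIONARY_LANGUAGES:
        models.append("dictionary")
    if ml_enabled:
        # Assumption: the ML model is only usable once it has been enabled.
        models.append("ml")
    return models
```

Which model takes precedence when both are available for Arabic is not stated here, so the sketch returns both rather than choosing one.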

To enable the machine learning–based transliteration model, please contact our support team at [email protected]

Supported Languages

Code	Language	Description
ar	Arabic	Supported with both dictionary-based and ML models
ru	Russian	Cyrillic script transliterated to Latin using ISO 9
zh-cn	Chinese	Transliterated using pinyin-style mappings
ja	Japanese	Uses Hepburn-style transliteration for kana/kanji
ko	Korean	Uses standard Romanization of Hangul
hi	Hindi	Devanagari script transliterated to Latin
bn	Bengali	Script converted to Latin using phonetic mapping
ur	Urdu	Arabic-based script transliterated using ML model
fa	Persian (Farsi)	Arabic-based script handled via ML model
ps	Pashto	Arabic-based script, phonetic transliteration
ku	Kurdish	Written in Arabic-based or Latin script, depending on the source document; Latin input is stored as-is
th	Thai	Transliteration from Thai script to Latin