Name Transliteration from ID document

Overview

The Transliteration feature in IDWise automatically converts non-Latin names into standardized Latin script to enable accurate matching and interoperability across AML, KYC, and identity verification systems.

It ensures that multilingual names, especially those written in Arabic-based scripts (Arabic, Urdu, Persian) or Cyrillic, are consistently represented in both native and Latin forms within the journey data model.

How It Works

  1. Script Detection

    IDWise first detects the script of each name field using language and character set identification.

  2. Conditional Transliteration

    • If the input is already in Latin, it is not transliterated and is stored as-is.
    • If the input is in a non-Latin script, it is transliterated into Latin and stored in dedicated Latin fields.
  3. Field Mapping

    Field | Description
    First Name Native, Last Name Native, Full Name Native | Contain the original name in its native (non-Latin) script.
    First Name, Last Name, Full Name | Contain the Latin-transliterated version of the user’s name.

    This dual representation allows clients to display localized names while using the Latin versions for matching and regulatory screening.
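
The three steps above can be sketched in Python as follows. This is a minimal illustration, not IDWise's implementation: `is_latin`, `toy_transliterate`, `map_name_fields`, and the `TOY_TABLE` lookup are hypothetical, script detection is approximated with Unicode character names from the standard library, and the dictionary keys simply reuse the field labels from the table above.

```python
import unicodedata

# Toy stand-in for a real transliteration model (assumption for illustration).
TOY_TABLE = {
    "محمد علي": "Mohammed Ali",
    "فاطمة الزهراء": "Fatima Al Zahraa",
}


def is_latin(text: str) -> bool:
    """Return True if text contains letters and all of them are Latin."""
    letters = [ch for ch in text if ch.isalpha()]
    return bool(letters) and all(
        unicodedata.name(ch, "").startswith("LATIN") for ch in letters
    )


def toy_transliterate(text: str) -> str:
    """Look the name up in TOY_TABLE; a real model would handle any input."""
    return TOY_TABLE[text]


def map_name_fields(full_name: str, transliterate=toy_transliterate) -> dict:
    """Apply steps 1-3: detect script, transliterate if needed, map fields."""
    if is_latin(full_name):
        # Already Latin: stored as-is, native field stays empty.
        return {"Full Name": full_name, "Full Name Native": ""}
    # Non-Latin: keep the original and store the Latin form alongside it.
    return {
        "Full Name": transliterate(full_name),
        "Full Name Native": full_name,
    }
```

Passing the transliteration function as a parameter keeps the detection and mapping logic independent of whichever model produces the Latin form.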

Example

Input | Output (Latin) | Mapped Fields
محمد علي | Mohammed Ali | Full Name = Mohammed Ali, Full Name Native = محمد علي
فاطمة الزهراء | Fatima Al Zahraa | Full Name = Fatima Al Zahraa, Full Name Native = فاطمة الزهراء
John Smith | John Smith | Stored only in Latin fields; native fields remain empty
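
As a hedged sketch of how a client might consume the dual representation: show the native form when there is one, and use the Latin form for matching and screening. The record layout and the helper names `display_name` and `screening_name` are assumptions; the keys reuse the field labels from this page rather than any confirmed API schema, and the case-folding step is an illustrative choice.

```python
def display_name(record: dict) -> str:
    """Prefer the native-script name for display; fall back to Latin."""
    return record.get("Full Name Native") or record["Full Name"]


def screening_name(record: dict) -> str:
    """Return the Latin name, whitespace-normalized and case-folded."""
    return " ".join(record["Full Name"].split()).casefold()


# Hypothetical records shaped like the example rows above.
records = [
    {"Full Name": "Mohammed Ali", "Full Name Native": "محمد علي"},
    {"Full Name": "John Smith", "Full Name Native": ""},
]
```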

Models Used

IDWise supports two transliteration models:

Model Type | Description | Supported Languages
Dictionary-Based Model | Uses linguistic rules and curated mappings for high precision on person names. | Arabic (ar) only
Machine Learning Model | ML model trained on multilingual name datasets for contextual transliteration accuracy. | All supported non-Latin languages
Note: To enable the machine learning-based transliteration model, please contact our support team at [email protected]
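
The model table implies a simple rule for which models can apply to a given language. The sketch below encodes that rule only; `applicable_models`, the model labels, and the `ml_enabled` flag are hypothetical names, and this page does not say which model is preferred when both apply, so the function just lists the candidates.

```python
def applicable_models(lang: str, ml_enabled: bool) -> list[str]:
    """List the models that can handle a supported non-Latin language code."""
    models = []
    if lang == "ar":
        # The dictionary-based model covers Arabic only.
        models.append("dictionary")
    if ml_enabled:
        # The ML model covers every supported non-Latin language,
        # but has to be enabled through support first.
        models.append("ml")
    return models
```

An empty result means the table names no model for that language without the ML option; what happens in that case is not described here.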

Supported Languages

Code | Language | Description
ar | Arabic | Supported with both dictionary-based and ML models
ru | Russian | Cyrillic script transliterated to Latin using ISO 9
zh-cn | Chinese | Transliterated using pinyin-style mappings
ja | Japanese | Uses Hepburn-style transliteration for kana/kanji
ko | Korean | Uses standard Romanization of Hangul
hi | Hindi | Devanagari script transliterated to Latin
bn | Bengali | Script converted to Latin using phonetic mapping
ur | Urdu | Arabic-based script transliterated using ML model
fa | Persian (Farsi) | Arabic-based script handled via ML model
ps | Pashto | Arabic-based script, phonetic transliteration
ku | Kurdish | Arabic-based or Latin script, depending on the source
th | Thai | Transliteration from Thai script to Latin
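
For client-side validation, the codes in the table can be kept in a small lookup. This is an illustrative sketch: `SUPPORTED_LANGUAGES`, `normalize_code`, and `is_supported` are hypothetical helpers, and accepting variants such as `zh_CN` is an assumption, not documented IDWise behavior.

```python
# Codes and language names copied from the table above.
SUPPORTED_LANGUAGES = {
    "ar": "Arabic", "ru": "Russian", "zh-cn": "Chinese", "ja": "Japanese",
    "ko": "Korean", "hi": "Hindi", "bn": "Bengali", "ur": "Urdu",
    "fa": "Persian (Farsi)", "ps": "Pashto", "ku": "Kurdish", "th": "Thai",
}


def normalize_code(code: str) -> str:
    """Trim, lowercase, and use '-' as the separator (e.g. 'zh_CN' -> 'zh-cn')."""
    return code.strip().lower().replace("_", "-")


def is_supported(code: str) -> bool:
    """Check a language code against the table after normalization."""
    return normalize_code(code) in SUPPORTED_LANGUAGES
```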