Name Transliteration from ID Document

Convert non-Latin names from identity documents into standardized Latin-script fields for matching, AML screening, and integrations.

Overview

Name transliteration gives matching, AML screening, KYC checks, and downstream systems a standardized Latin-script version of names that appear on identity documents in a non-Latin script.

IDWise keeps both versions of the name when the original document uses a supported non-Latin script. Native fields preserve the original document text, while Latin fields store the transliterated value used for matching and interoperability.

How It Works

IDWise applies transliteration during identity document processing.

  1. Detect the script

    IDWise detects the script of each name field using language and character set identification.

  2. Decide whether transliteration is needed

    • If the input is already in Latin script, IDWise stores it as-is and does not transliterate it.
    • If the input is in a supported non-Latin script, IDWise transliterates it into Latin script.
  3. Store native and Latin fields

    Field | Description
    --- | ---
    First Name Native, Last Name Native, Full Name Native | Original name values from the document in their native script.
    First Name, Last Name, Full Name | Latin-script name values used for matching, screening, and integrations.

This dual representation lets you display localized names to users while using standardized Latin values for matching and regulatory screening.
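
The three steps above can be sketched in a few lines of Python. This is an illustration only, not IDWise's implementation: `is_latin`, `NameField`, and `process_name` are hypothetical names, the script check is a simple heuristic based on Unicode character names, and the transliterator is a stand-in lookup holding a single sample pair.

```python
import unicodedata
from dataclasses import dataclass
from typing import Callable, Optional


def is_latin(text: str) -> bool:
    """Return True if every letter in `text` looks like a Latin letter."""
    letters = [ch for ch in text if ch.isalpha()]
    # Heuristic: Latin letters have Unicode names such as "LATIN SMALL LETTER A".
    return all(unicodedata.name(ch, "").startswith("LATIN") for ch in letters)


@dataclass
class NameField:
    latin: str                    # value used for matching and screening
    native: Optional[str] = None  # original text; None when the source is already Latin


def process_name(raw: str, transliterate: Callable[[str], str]) -> NameField:
    """Hypothetical helper mirroring steps 1 to 3; not an IDWise API."""
    if is_latin(raw):
        # Step 2, first case: Latin input is stored as-is.
        return NameField(latin=raw)
    # Step 2, second case, then step 3: keep both representations.
    return NameField(latin=transliterate(raw), native=raw)


# Stand-in transliterator (assumption): a lookup table with one sample pair.
lookup = {"محمد علي": "Mohammed Ali"}

arabic = process_name("محمد علي", lookup.__getitem__)
english = process_name("John Smith", lookup.__getitem__)
```

Here `arabic` carries both the Latin and the native value, while `english` carries only the Latin value and leaves the native value unset.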

Examples

Input | Output | Mapped fields
--- | --- | ---
محمد علي | Mohammed Ali | Full Name = Mohammed Ali, Full Name Native = محمد علي
فاطمة الزهراء | Fatima Al Zahraa | Full Name = Fatima Al Zahraa, Full Name Native = فاطمة الزهراء
John Smith | John Smith | Stored only in Latin fields; native fields remain empty.
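
On the consuming side, an integration might read the two fields as sketched below. The record shape is an assumption made for illustration: the dictionary keys reuse the field labels from this page, and the key names in an actual API response may differ. `display_name` and `screening_name` are hypothetical helpers.

```python
from typing import Dict

Record = Dict[str, str]  # hypothetical shape: field label -> value


def display_name(record: Record) -> str:
    """Prefer the native-script name for display, falling back to the Latin value."""
    return record.get("Full Name Native") or record["Full Name"]


def screening_name(record: Record) -> str:
    """Always use the Latin value for matching and screening."""
    return record["Full Name"]


records = [
    {"Full Name": "Mohammed Ali", "Full Name Native": "محمد علي"},
    {"Full Name": "John Smith", "Full Name Native": ""},  # Latin source: native field empty
]

for record in records:
    print(display_name(record), "|", screening_name(record))
```

The fallback in `display_name` covers documents that were already in Latin script, where the native field is empty.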

Transliteration Models

IDWise supports two transliteration models:

Model type | Description | Supported languages
--- | --- | ---
Dictionary-based model | Uses linguistic rules and curated mappings for high precision on person names. | Arabic (ar) only
Machine learning model | Uses a machine learning model trained on multilingual name datasets for contextual transliteration. | All supported non-Latin languages

To enable the machine learning-based transliteration model, contact [email protected].

Supported Languages

Code | Language | Description
--- | --- | ---
ar | Arabic | Supported with both dictionary-based and machine learning models.
ru | Russian | Cyrillic script transliterated to Latin using ISO 9.
zh-cn | Chinese | Transliterated using pinyin-style mappings.
ja | Japanese | Uses Hepburn-style transliteration for kana and kanji.
ko | Korean | Uses standard Romanization of Hangul.
hi | Hindi | Devanagari script transliterated to Latin.
bn | Bengali | Script converted to Latin using phonetic mapping.
ur | Urdu | Arabic-based script transliterated using the machine learning model.
fa | Persian (Farsi) | Arabic-based script handled using the machine learning model.
ps | Pashto | Arabic-based script transliterated phonetically.
ku | Kurdish | Arabic-based or Latin script, depending on source.
th | Thai | Thai script transliterated to Latin.
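
For client-side checks, the language and model tables can be combined into a small lookup. The sketch below is an assumption-laden illustration: `available_models` and the model labels `"dictionary"` and `"machine_learning"` are hypothetical names, and the function only reflects what the tables on this page describe, not which model is enabled for a given account.

```python
from typing import List

# Language codes listed in the Supported Languages table.
SUPPORTED = {"ar", "ru", "zh-cn", "ja", "ko", "hi", "bn", "ur", "fa", "ps", "ku", "th"}

# The dictionary-based model covers Arabic only.
DICTIONARY_LANGUAGES = {"ar"}


def available_models(code: str) -> List[str]:
    """Return the model types documented for a language code (hypothetical helper)."""
    code = code.lower()
    if code not in SUPPORTED:
        return []
    models = []
    if code in DICTIONARY_LANGUAGES:
        models.append("dictionary")
    # The machine learning model covers all supported non-Latin languages.
    models.append("machine_learning")
    return models


print(available_models("ar"))
print(available_models("ru"))
print(available_models("de"))
```

An empty list signals a language code that is not in the supported table, which a caller could use to skip transliteration-dependent logic.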

Related Pages