Name transliteration
Get to know Sumsub’s name transliteration process.
Transliteration is the process of converting text from one script to another. In Sumsub, it is used during applicant verification to represent names written in non-Latin scripts (such as Japanese, Hebrew, or Arabic), as well as letters and symbols absent from the basic Latin alphabet, using Latin characters.
For example:
- ß (German) becomes ss
- ü (German) becomes ue
- ö (Icelandic/German) becomes oe
- þ (Icelandic) becomes th
Similar rules apply to other local characters.
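The examples above amount to a per-character lookup. The sketch below shows that idea in its simplest form; `CHAR_MAP` and `transliterate_chars` are hypothetical names, and the table covers only the four listed characters, not Sumsub's actual country-specific rules.

```python
# Hypothetical mapping that covers only the four examples above;
# real rules are broader and vary by country.
CHAR_MAP = {
    "ß": "ss",
    "ü": "ue",
    "ö": "oe",
    "þ": "th",
}


def transliterate_chars(text: str) -> str:
    """Replace mapped characters and leave everything else unchanged."""
    out = []
    for ch in text:
        repl = CHAR_MAP.get(ch.lower())
        if repl is None:
            out.append(ch)
        elif ch.isupper():
            # Keep the capital: "Ü" -> "Ue", "Þ" -> "Th".
            out.append(repl.capitalize())
        else:
            out.append(repl)
    return "".join(out)


print(transliterate_chars("Müller"))   # Mueller
print(transliterate_chars("Strauß"))   # Strauss
```

A flat lookup like this works for single letters, but it cannot handle scripts where one symbol maps to a syllable or where context changes the reading. That is one reason the process described below also chooses between transliteration and translation.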
Sumsub also uses transliteration for cross-checks and duplicate detection when comparing data, which helps normalize names before comparison. When documents are in different languages, transliteration bridges the gap and enables accurate name matching.
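Once names are in Latin script, a comparison usually also ignores case, accents, and spacing. The following is a minimal sketch of such a normalization step under that assumption; `normalize_for_matching` and `names_match` are illustrative names, not part of any Sumsub API.

```python
import unicodedata


def normalize_for_matching(name: str) -> str:
    """Fold a Latin-script name into a comparison key (assumed rules)."""
    # NFKD splits accented letters into base letter + combining mark ("é" -> "e" + U+0301).
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    # casefold() is an aggressive lower() ("ß" -> "ss"); split/join collapses whitespace.
    return " ".join(without_marks.casefold().split())


def names_match(a: str, b: str) -> bool:
    return normalize_for_matching(a) == normalize_for_matching(b)


print(names_match("José  García", "JOSE GARCIA"))  # True
```

Stripping diacritics this way turns "ü" into "u", not "ue", so "Müller" and "Mueller" would not match on this step alone. That is why a language-aware transliteration step has to run before the final comparison.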
Transliteration is applied to any document uploaded by your applicants as Proof of Identity (PoI) or Proof of Address (PoA), such as passports, ID cards, driver’s licenses, residence permits, bank statements, and other document types. You can find transliterated names on the Applicant page in the Dashboard.
How name transliteration works
The transliteration algorithm in Sumsub is a multifaceted process that involves not only the transliteration itself, but also data validation, method selection for further processing, and result refinement to ensure accuracy before final use in the system.
Step 1: Input collection
The system starts by collecting the following data from documents:
- First name
- Middle name (optional)
- Last name
- Country code (to determine applicable transliteration logic)
- Default transliteration values from the document (if available)
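The inputs collected in this step can be pictured as one record per document. The dataclass below is a hypothetical shape for that record; the field names and the assumption that the country is an ISO 3166-1 alpha-3 code are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NameInput:
    """Hypothetical container for the fields collected in Step 1."""
    first_name: str
    last_name: str
    country: str                       # assumed ISO 3166-1 alpha-3 code, e.g. "RUS"
    middle_name: Optional[str] = None  # optional, as in the list above
    # Latin-script values printed on the document itself, if any.
    default_first_name: Optional[str] = None
    default_middle_name: Optional[str] = None
    default_last_name: Optional[str] = None

    def has_default_transliteration(self) -> bool:
        """True if the document supplied at least one Latin-script value."""
        return bool(self.default_first_name or self.default_last_name)


doc = NameInput("Иван", "Петров", "RUS", default_first_name="Ivan")
print(doc.has_default_transliteration())  # True
```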
Step 2: Context validation
Before processing, the system checks:
- Whether the name field is empty.
- Whether reliable transliterations already exist (if the name is already in Latin script, no transliteration is performed).
- Whether the country requires special handling, such as:
  - Official transliteration standards — some countries have their own transliteration requirements.
  - Name order rules — in some countries, the family name comes first.
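The checks above can be sketched as a few small predicates. This sketch assumes that "already in Latin script" means plain A to Z letters, since characters such as ß and ü are still transliterated; `FAMILY_NAME_FIRST` is a hypothetical example list, not Sumsub's actual configuration.

```python
# Hypothetical examples of countries where the family name is conventionally
# written first; the actual list and per-country rules are not documented here.
FAMILY_NAME_FIRST = {"CHN", "HUN", "JPN", "KOR", "VNM"}


def is_basic_latin(text: str) -> bool:
    """True if every letter is a plain A-Z / a-z letter."""
    return all(c.isascii() for c in text if c.isalpha())


def needs_transliteration(name: str) -> bool:
    # Empty fields are skipped.
    if not name or not name.strip():
        return False
    # Names already written in plain Latin letters are left as they are.
    return not is_basic_latin(name)


def family_name_first(country: str) -> bool:
    return country.upper() in FAMILY_NAME_FIRST


print(needs_transliteration("Анна"))  # True
print(needs_transliteration("Anna"))  # False
```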
Step 3: Full name combination
If all name parts are valid for processing, the full name is formed by combining the first, middle, and last names. The combined full name is then used as input for transliteration. This improves accuracy by considering the full name context.
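A minimal sketch of the combination step, assuming empty parts are simply dropped and that a name-order flag from Step 2 swaps the family name to the front. The function name and the placement of the middle name in family-name-first order are assumptions for illustration.

```python
from typing import Optional


def combine_full_name(first: str, last: str, middle: Optional[str] = None,
                      family_name_first: bool = False) -> str:
    """Join the non-empty name parts into one string for transliteration."""
    parts = [last, first, middle] if family_name_first else [first, middle, last]
    # Skip missing or blank parts and trim stray whitespace.
    return " ".join(p.strip() for p in parts if p and p.strip())


print(combine_full_name("Иван", "Петров", "Сергеевич"))  # Иван Сергеевич Петров
print(combine_full_name("Anna", "Kovács", family_name_first=True))  # Kovács Anna
```

Passing the whole string to the transliterator lets it see all parts at once, which matches the accuracy argument above; the price is that the result later has to be split back into components.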
Step 4: Transliteration or translation decision
In this step, the system determines which of the following methods should be used for the name:
- Default transliteration — the system uses the default transliterated names provided in the document. This method is preferred for certain combinations of countries and document types where these values are considered trusted and can be used directly. In some cases, MRZ (Machine Readable Zone) data may also be used within this method.
- Transliteration — the system actively transliterates the name provided in the document. This method is preferred when the script has reliable character mappings (for example, Cyrillic to Latin). Custom transliteration rules are applied based on the document’s country.
- Translation — the system translates the name instead of transliterating it. This method is used when transliteration is unreliable or produces inconsistent results, typically for languages such as Arabic or Hebrew. In such cases, translation provides a more accurate and stable representation of the name.
Note
Translation is used instead of transliteration for the following languages:
- Hebrew (he)
- Farsi (fa)
- Arabic (ar)
- Japanese (ja)
- Chinese (zh)
- Korean (South/North) (ko)
- Sinhala (si)
- Thai (th)
- Lao (lo)
- Burmese (my)
- Amharic (am)
- Greek (el)
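The three methods and the language list above can be combined into one selection function. The sketch assumes that trusted defaults take precedence and that the translation list is checked next; `Method`, `choose_method`, and the `trusted_defaults` parameter are hypothetical, and the trusted (country, document type) pairs are supplied by the caller because the real list is not published here.

```python
from enum import Enum
from typing import AbstractSet, Tuple


class Method(Enum):
    DEFAULT_TRANSLITERATION = "default_transliteration"
    TRANSLITERATION = "transliteration"
    TRANSLATION = "translation"


# Language codes from the note above: these are translated, not transliterated.
TRANSLATION_LANGUAGES = frozenset(
    {"he", "fa", "ar", "ja", "zh", "ko", "si", "th", "lo", "my", "am", "el"}
)


def choose_method(country: str, doc_type: str, language: str,
                  has_default: bool,
                  trusted_defaults: AbstractSet[Tuple[str, str]]) -> Method:
    """Pick one of the three Step 4 methods (assumed precedence)."""
    # 1. Trusted default values printed on the document are used as provided.
    if has_default and (country, doc_type) in trusted_defaults:
        return Method.DEFAULT_TRANSLITERATION
    # 2. Languages where transliteration is unreliable switch to translation.
    if language.lower() in TRANSLATION_LANGUAGES:
        return Method.TRANSLATION
    # 3. Everything else, e.g. Cyrillic, uses rule-based transliteration.
    return Method.TRANSLITERATION


print(choose_method("RUS", "PASSPORT", "ru", False, set()))  # Method.TRANSLITERATION
```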
Decision criteria
- Country and language determination — the system first identifies the country and language associated with the document. The language is determined using the document’s country code (if available) or by analyzing the character script.
- Default transliteration check — the system checks whether the default transliteration values can be trusted and are sufficient. For certain country and document type combinations, these values are used as provided. If trusted, the system skips further transliteration or translation.
- Translation switch — if standard transliteration is unreliable for the detected language (for some languages it yields poor or unstable results), the system switches to the translation method.
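The first criterion, country code first and character script as a fallback, can be sketched with Python's `unicodedata` module, whose character names begin with the script (for example "HEBREW LETTER SHIN"). Both lookup tables below are hypothetical and partial; scripts shared by several languages, such as Cyrillic or Arabic, can only be guessed from the script alone.

```python
import unicodedata
from typing import Optional

# Hypothetical, partial lookup from country code to language code.
COUNTRY_LANGUAGE = {"ISR": "he", "JPN": "ja", "GRC": "el", "RUS": "ru", "THA": "th"}

# First word of a Unicode character name -> assumed language code.
SCRIPT_LANGUAGE = {
    "HEBREW": "he", "ARABIC": "ar", "GREEK": "el", "THAI": "th",
    "HIRAGANA": "ja", "KATAKANA": "ja", "HANGUL": "ko", "CJK": "zh",
    "CYRILLIC": "ru", "LAO": "lo", "MYANMAR": "my", "ETHIOPIC": "am",
    "SINHALA": "si",
}


def detect_language(name: str, country: Optional[str] = None) -> Optional[str]:
    # Prefer the document's country code when it is known.
    if country and country.upper() in COUNTRY_LANGUAGE:
        return COUNTRY_LANGUAGE[country.upper()]
    # Otherwise use the script of the first letter that has a known script.
    for ch in name:
        if ch.isalpha():
            script = unicodedata.name(ch, "").split(" ")[0]
            if script in SCRIPT_LANGUAGE:
                return SCRIPT_LANGUAGE[script]
    return None  # e.g. Latin-only names


print(detect_language("山田", "JPN"))  # ja
print(detect_language("山田"))         # zh
```

The last two calls show why the country code is checked first: Japanese kanji and Chinese characters share the CJK script, so script analysis alone cannot tell them apart.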
Step 5: Post-processing
After transliteration or translation, the system cleans and normalizes the result. Additionally, it structures the full transliterated name into components to ensure consistent downstream use. As the final step, the system:
- Compares default transliteration values (if available) with the computed result.
- Selects the most reliable version for each name component.
- Structures the transliterated result for integration across system components.
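The cleanup and per-component selection could look like the sketch below. It assumes a simple policy that the source does not spell out: a default value printed on the document wins whenever it looks like a plausible Latin name, otherwise the computed value is kept. `clean`, `select_components`, and the `PLAUSIBLE` pattern are hypothetical.

```python
import re
from typing import List, Optional

# Assumed plausibility rule: Latin letters, optionally joined by a single
# space, hyphen or apostrophe (e.g. "Anne-Marie", "O'Brien").
PLAUSIBLE = re.compile(r"[A-Za-z]+(?:[ '\-][A-Za-z]+)*")


def clean(value: str) -> str:
    """Trim, collapse internal whitespace and title-case the value."""
    return " ".join(value.split()).title()


def select_components(computed: List[str],
                      defaults: List[Optional[str]]) -> List[str]:
    """Choose, per component, between the default and the computed value."""
    result = []
    for i, comp in enumerate(computed):
        default = defaults[i] if i < len(defaults) else None
        if default and PLAUSIBLE.fullmatch(clean(default)):
            result.append(clean(default))  # document's own Latin value wins
        else:
            result.append(clean(comp))     # fall back to the computed result
    return result


print(select_components(["ivan ", "petrov"], ["IVAN", None]))  # ['Ivan', 'Petrov']
```

Title-casing is a simplification: it turns "McDonald" into "Mcdonald", so a production rule set would need exceptions.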
Name transliteration results validation
To ensure high-quality transliteration, Sumsub applies a combination of validation, refinement, and automation techniques.
Validation is performed considering the following criteria:
- Number of name parts (for example, 3 parts in → 3 parts out, meaning the system processes and returns the same number of name components).
- Comparison with default transliterations.
- Detection of suspicious characters, such as digits, symbols, or other irregular characters.
- Review of short or non-Latin results after transliteration.
- Additional rules refined through experience.
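Three of the criteria above (part count, suspicious characters, and short or non-Latin results) translate directly into checks. The sketch below returns a list of problems instead of a pass/fail flag, on the assumption that flagged results go to review; the comparison with default values and the experience-based rules are not modeled, and `validate` and `min_length` are hypothetical names.

```python
import re
from typing import List

# Anything other than Latin letters, spaces, hyphens and apostrophes is flagged:
# digits, symbols and leftover non-Latin characters alike (assumed rule).
SUSPICIOUS = re.compile(r"[^A-Za-z '\-]")


def validate(source_parts: List[str], result_parts: List[str],
             min_length: int = 2) -> List[str]:
    """Return a list of problems; an empty list means the result passed."""
    problems = []
    # Same number of components in and out (e.g. 3 parts in -> 3 parts out).
    if len(source_parts) != len(result_parts):
        problems.append(
            f"part count changed: {len(source_parts)} -> {len(result_parts)}")
    for part in result_parts:
        if SUSPICIOUS.search(part):
            problems.append(f"suspicious characters in {part!r}")
        # Very short results are flagged for review rather than rejected.
        if len(part.strip()) < min_length:
            problems.append(f"too short: {part!r}")
    return problems


print(validate(["Иван", "Петров"], ["Ivan", "Petrov"]))  # []
print(validate(["Анна"], ["Ann4"]))  # ["suspicious characters in 'Ann4'"]
```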