Matching techniques in AML Screening
Use fuzzy matching and other techniques to identify potential risks, and reduce false positives and false negatives.
Within the AML Screening and Monitoring Solution, Sumsub utilizes various algorithms and methodologies that help enhance the precision and accuracy of the AML match analysis.
Once the matching results are retrieved from our data partner (Comply Advantage or World-Check-One), we apply internally developed resolution technologies that consider additional identifiers to confirm or deny the match.
Here’s a breakdown of the fuzzy matching and screening techniques used within our platform:
- Levenshtein distance technique (fuzzy matching). This algorithm compares string similarity (for example, input data against source data). It detects variations or minor deviations in the spelling of search terms, names, or entity names returned in the search results. Apart from catching spelling errors, fuzzy logic is also used to detect:
  - Variations of names that have been transliterated from non-Latin scripts.
  - Transcription variations: Husain/Hussein.
  - Homophones (equivalent pronunciation): Jacqueline/Jacklyn.
  - Phonetics: Irbah/Ibra.
  - Hypocorisms (shortenings): James/Jimmy/Jim.
  - Common abbreviations (for entities), for example, "Ltd" instead of "Limited".
- Fuzziness. It represents the allowed degree of variation in spelling, often measured as a percentage. For example, a fuzziness setting of 30% might allow for one character difference in longer words, but not in shorter ones.
Note
Fuzziness is not applied to non-Latin characters.
- AI-powered contextual analysis. Rather than just relying on simple keyword-based searches, we employ a combination of machine learning models, natural language processing, and contextual analysis to classify adverse events and extract associated entities.
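As an illustration of the Levenshtein distance and fuzziness items in the list above, here is a minimal sketch. It is not Sumsub's implementation: the function names are hypothetical, and the rule that converts a fuzziness percentage into an edit budget (percentage multiplied by query length, rounded down) is an assumption chosen only to reproduce the "longer words, but not shorter ones" behavior described for a 30% setting.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions, or substitutions needed to turn `a` into `b`."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def is_fuzzy_match(query: str, candidate: str, fuzziness: float) -> bool:
    """Hypothetical rule (an assumption, not the platform's formula):
    allow at most floor(fuzziness * len(query)) edits."""
    query, candidate = query.casefold(), candidate.casefold()
    allowed_edits = int(fuzziness * len(query))
    return levenshtein(query, candidate) <= allowed_edits


print(is_fuzzy_match("Smith", "Smyth", 0.30))  # five letters: one edit allowed
print(is_fuzzy_match("Li", "Lu", 0.30))        # two letters: no edits allowed
```

Under this assumed rule, a 30% setting tolerates a single differing character in "Smith" but none in "Li", because the edit budget scales with the length of the search term.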
How Sumsub matching techniques work
Fuzzy matching and fuzziness
There are fuzzy matching criteria for different data attributes. You can configure these criteria as described in this article.
Applying the Exact Match configuration to your screening settings disables all pre-processing, algorithmic levers, and custom configurations (such as equivalent names and phonetic matching), with the exception of word order and AKA matching. It also adds a length filter (for example, "John Smith" will not match "John Williams Smith").
Exact Match also disables the year-of-birth fuzziness, which otherwise allows a +/- 1-year difference in year of birth when fuzziness is set between 10% and 100%.
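The Exact Match behavior and the year-of-birth tolerance described above can be sketched roughly as follows. This is an illustrative sketch only: the function names and parameters are hypothetical, case-insensitive comparison is an assumption, and treating the 10% and 100% bounds as inclusive is also an assumption.

```python
def exact_name_match(query: str, names: list[str]) -> bool:
    """`names` holds the primary name plus any AKAs from the source record.
    Word order is ignored, but the set of words must be identical, which
    acts as the length filter (no additional or missing words)."""
    query_tokens = sorted(query.casefold().split())  # casefold is an assumption
    return any(sorted(name.casefold().split()) == query_tokens for name in names)


def year_of_birth_matches(input_year: int, source_year: int,
                          fuzziness: float, exact_match: bool = False) -> bool:
    """Allow a +/- 1-year difference when fuzziness is between 10% and 100%
    (bounds assumed inclusive) and Exact Match is not enabled."""
    tolerance = 1 if (not exact_match and 0.10 <= fuzziness <= 1.0) else 0
    return abs(input_year - source_year) <= tolerance


print(exact_name_match("John Smith", ["Smith John"]))           # word order ignored
print(exact_name_match("John Smith", ["John Williams Smith"]))  # length filter
print(year_of_birth_matches(1980, 1981, fuzziness=0.30))
print(year_of_birth_matches(1980, 1981, fuzziness=0.30, exact_match=True))
```

Comparing sorted word lists is one simple way to keep word-order matching while rejecting candidates with extra or missing words.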
The fuzzy matching results are located in the Watchlists section of the applicant profile and are marked as a fuzzy match.
The Name column lists all potential applicant name matches found in various sanctions, PEP, watchlists, and adverse media sources.
If you disagree with the fuzzy matching results, you can change the status of the match to False positive in the corresponding column.
To learn more about managing AML cases, refer to this article.
Contextual analysis
AI models are trained on extensive ground truth data, annotated by domain experts, which allows them to recognize entire sentences and paragraphs that describe adverse events.
Key features include:
- Named Entity Recognition (NER). The models are trained to recognize entities even in complex sentence structures, improving the precision of entity identification.
- Relevance filtering. Machine learning models assess entire sentences and paragraphs to determine whether they describe adverse events.
- Sentiment analysis. NLP models identify adverse content based on contextual clues.
Managing false positives
A false positive is a screening result that the system flags as a match but that does not actually correspond to the screened person or entity.
The Sumsub screening solution offers a comprehensive range of false-positive reduction capabilities designed to enhance screening accuracy while maintaining effective risk management.
In addition to the functionality described above, these capabilities include the following:
- Search profiles — allowing account administrators to tailor their screening parameters based on their specific requirements. For example, as a UK-based client, you may create a UK-focused search profile that screens against all Sanctions lists while limiting watchlist screening to only UK-based sources.
Note
Search profiles can only be set up by Sumsub. If this is of interest, make a request to your Customer Success Manager at [email protected]
- Matching algorithm and data inputs — balancing efficiency and effectiveness via system settings. By default, our platform is configured for the following:
- Name normalization — accounting for word order, case sensitivity, special characters, and common variations such as honorifics, suffixes, and name patterns. For company entities, the phrase “Trading as” is handled with enhanced logic only when it appears in the middle of the input string, between two company names.
- Exact and non-exact matching — supporting equivalent names, spelling variations, additional or missing words, initials, aliases, and company name variants.
- Search parameters — incorporating key identifiers such as country, entity type, date of birth, and deceased/delisted status.
- Language and script matching — supporting transliteration and accommodating naming conventions across multiple languages and writing systems.
- Entity resolution — consolidating related entities into unified profiles. For example, all references to Emmanuel Macron across PEP and Adverse Media data sources will be combined into a single profile.
- Whitelisting — enabling users to suppress alerts for specific entities that have been previously reviewed and confirmed as non-risk. This functionality is designed to be used alongside our False Positive match status, ensuring ongoing efficiency in alert management.
- Manual agent review — delegating potential matches to Sumsub's in-house operational moderation team, which reviews AML screening alerts on behalf of clients where the alert could not be automatically resolved (for example, due to a lack of information).
Note
Manual review is only available as a part of standard AML screening. It does not apply to the Basic AML Screening service.
- Enhanced matching algorithms — leveraging a wider range of data points than our data partners offer under the current integration, like TIN, identity document number, address, and parents' names, to further reduce noise and false positive rates.
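To make the name normalization item in the list above more concrete, here is a minimal sketch of the idea. It is not the platform's actual logic: the `normalize_name` function is hypothetical, and the honorific and suffix lists are short illustrative assumptions rather than the real reference data.

```python
import re

# Illustrative lists only; a production system would use far larger ones.
HONORIFICS = {"mr", "mrs", "ms", "dr", "sir"}
SUFFIXES = {"jr", "sr", "ii", "iii"}


def normalize_name(raw: str) -> tuple[str, ...]:
    """Reduce a name to a comparison key that ignores case, punctuation,
    honorifics, suffixes, and word order."""
    lowered = raw.casefold()
    cleaned = re.sub(r"[^\w\s]", " ", lowered)  # punctuation becomes spaces
    ignored = HONORIFICS | SUFFIXES
    tokens = [token for token in cleaned.split() if token not in ignored]
    return tuple(sorted(tokens))                # sorting removes word order


print(normalize_name("Dr. John SMITH, Jr."))
print(normalize_name("Smith, John") == normalize_name("Dr. John SMITH, Jr."))
```

Two inputs that normalize to the same key can then be compared directly, before any fuzzy matching or additional identifiers are applied.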