
Managing false positives in AML screening: a Swiss-aligned disposition framework

Auditors do not measure how few alerts you generated. They measure whether each disposition is consistent, justified, and reproducible. Here is the disposition framework that survives a Swiss AML audit.

Antoine Bedaton
2 April 2026 · 11 min read

Part of our complete guide to negative news screening for Swiss banks. This post is the deep dive on the disposition framework for false positives; the guide covers the end-to-end picture.

A compliance officer in a mid-sized Swiss private bank told me their team was closing roughly 95% of screening alerts as false positives. They asked whether this was a problem.

It is not, in itself. False-positive rates of 90% to 99% are typical for sanctions and PEP screening at Swiss FIs, especially for institutions running broad lists with conservative thresholds. The rate is a function of how the system was tuned, not a sign that the team is doing the work badly. What FINMA and external auditors examine is something else entirely: not how many alerts you closed, but whether each closure can be justified by reading a single record, years after the analyst who made the decision has left the firm.

This post is about the disposition framework that produces records auditors can read. It draws on the Wolfsberg Group's published guidance on sanctions screening, AMLO-FINMA's reconstruction requirement, and the disposition model we ended up implementing in NNSFlow.

Why dispositions are what auditors look at

AMLO-FINMA Article 22 requires that documents and supporting evidence be kept so that "individual transactions can be reconstructed". The ten-year retention period sits in AMLA Article 7. Together they create a specific obligation: the decision to close an alert must be readable, by a third party, in year ten. The Wolfsberg Group's Statement on Effective Monitoring for Suspicious Activity and the more recent Statement on Sanctions Screening make the same point in different language: the value of screening lies in the quality of the disposition process, not in the alert volume.

What this means in practice: an audit that finds inconsistency between two analysts disposing of structurally identical alerts is more damaging than one that finds a high overall false-positive rate. Inconsistency suggests the framework is not real. A high rate, by itself, just suggests conservative tuning.
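One way to look for this kind of inconsistency in your own history is to group past alerts by the features that should determine the outcome, then list every group that was disposed of in more than one way. This is a minimal sketch that assumes hypothetical alert records carrying two boolean disambiguator flags. A real signature would use whatever disambiguators your institution actually records.

```python
from collections import defaultdict

# Hypothetical alert records: structural features, the analyst, and the code used.
alerts = [
    {"id": "A1", "analyst": "kim", "dob_match": False, "country_match": False, "code": "CLOSED_FALSE_MATCH"},
    {"id": "A2", "analyst": "lee", "dob_match": False, "country_match": False, "code": "ESCALATED"},
    {"id": "A3", "analyst": "kim", "dob_match": True, "country_match": True, "code": "ESCALATED"},
]

def inconsistent_groups(alerts, features=("dob_match", "country_match")):
    """Group alerts by structural signature and return the signatures whose
    members were disposed of with more than one code."""
    groups = defaultdict(set)
    for alert in alerts:
        signature = tuple(alert[f] for f in features)
        groups[signature].add(alert["code"])
    return {sig: codes for sig, codes in groups.items() if len(codes) > 1}

print(inconsistent_groups(alerts))
```

Here A1 and A2 look structurally identical but were closed differently by two analysts. An auditor would pull exactly that pair.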

The recent FINMA enforcement record bears this out. The published matters of the last several years (see our reconstruction-drill protocol for the details) turned on dispositions that could not, on re-examination, defend the original judgement. The artefacts existed. The reasoning around them did not survive scrutiny.

The five-code disposition vocabulary

A standard disposition vocabulary is the cheapest single intervention that improves audit defensibility. Five codes cover the great majority of screening outcomes:

1. Closed: no match. The hit was a system artefact rather than a genuine candidate match. Common cases: a typo'd transliteration scoring above threshold against an unrelated list entry, a generic surname like "Abdullah" matching every Abdullah on a sanctions list, or a corporate suffix collision ("AG", "Ltd", "Holdings"). The reviewing analyst concluded that no real-world person or entity is in question. Document the disambiguator that ruled out the match (different country, different date of birth range, different industry).

2. Closed: false match. The hit was on a plausible candidate, but contextual evidence rules out identity. Example: the screened individual shares a full name with a sanctioned person, but the date of birth differs by 30 years, or the nationality is documented and inconsistent, or the occupation does not match. This is the most common disposition in real screening operations and is also where most quality issues hide. The standard requires a named disambiguator, not "looks like a different person".

3. Closed: true match. The hit was on the actual sanctioned person or entity. The transaction or relationship is now subject to the relevant sanctions regime. The disposition triggers the rest of the compliance workflow: blocking, reporting under the Embargo Act (EmbA), SECO notification, MROS suspicious-activity reporting under AMLA Article 9. "True match" is a terminal screening disposition; the case is no longer about screening.

4. Escalated. The reviewer cannot rule out a true match with the information available, and the case requires a more senior reviewer or additional context (a country-risk specialist, a sanctions team, the MLRO). Escalation is not a disposition in itself; it is a state. The escalated case eventually closes under one of the three closed codes.

5. Pending. The case is open, awaiting customer information, third-party data, or a response from a foreign correspondent. Pending alerts have maximum ages defined per institution; an alert that has been "pending" for six months is effectively closed without disposition, which is the worst possible audit outcome.

The codes are deliberately narrow. Anything that does not fit one of them is a signal that the framework needs a sixth code, or that the analyst is using "Other" as a catch-all (which is itself a finding).
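A closed vocabulary is easiest to enforce at the point where a disposition is written. The sketch below is illustrative and not taken from any particular platform. It encodes the five codes as an enum, so a free-text "Other" is rejected. It also requires a named disambiguator for the two false-positive closures, forces an escalated case to end on a closed code, and flags pending alerts that exceed a maximum age. The 30-day limit is a hypothetical value, since the text leaves it to each institution.

```python
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional

class Disposition(Enum):
    """The five-code vocabulary. Anything else, including "Other", is rejected."""
    CLOSED_NO_MATCH = "closed_no_match"
    CLOSED_FALSE_MATCH = "closed_false_match"
    CLOSED_TRUE_MATCH = "closed_true_match"
    ESCALATED = "escalated"  # a state: the case must still close
    PENDING = "pending"      # a state: bounded by a maximum age

CLOSED = {Disposition.CLOSED_NO_MATCH, Disposition.CLOSED_FALSE_MATCH, Disposition.CLOSED_TRUE_MATCH}
NEEDS_DISAMBIGUATOR = {Disposition.CLOSED_NO_MATCH, Disposition.CLOSED_FALSE_MATCH}

# Hypothetical institution-defined limit on how long an alert may stay pending.
MAX_PENDING_AGE = timedelta(days=30)

def validate(code: str, disambiguator: Optional[str] = None) -> Disposition:
    disposition = Disposition(code)  # ValueError for any code outside the vocabulary
    if disposition in NEEDS_DISAMBIGUATOR and not disambiguator:
        raise ValueError(f"{disposition.name} requires a named disambiguator")
    return disposition

def resolve_escalation(final: Disposition) -> Disposition:
    """An escalated case ends on one of the three closed codes, never on another state."""
    if final not in CLOSED:
        raise ValueError("an escalated case must close under a closed_* code")
    return final

def pending_overdue(opened_at: datetime, now: datetime) -> bool:
    """True once a pending alert has outlived the institution's maximum age."""
    return now - opened_at > MAX_PENDING_AGE
```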

What the NNSFlow data model captures

In the platform, terminal investigation decisions are recorded as one of CLEARED, FLAGGED, ESCALATED, or PENDING_REVIEW. Entity risk status flows from these via an append-only entity_decisions table that preserves prior decisions, decision notes, the deciding user's identity and email at the time, the level-of-detail (LoD) tier of the review, and a JSON risk_factors payload for structured disambiguators. The append-only constraint means a disposition can be superseded by a later decision but never edited or deleted, which is what makes the table reconstructable years out.
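The NNSFlow schema is not reproduced in this post. As a minimal sketch, assuming SQLite and hypothetical column names modelled on the fields listed above, one way to enforce the append-only property is to add triggers that reject every UPDATE and DELETE. A later decision then supersedes an earlier one by insertion, never by overwrite.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entity_decisions (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    entity_id     TEXT NOT NULL,
    decision      TEXT NOT NULL CHECK (decision IN ('CLEARED','FLAGGED','ESCALATED','PENDING_REVIEW')),
    decision_note TEXT,
    decided_by    TEXT NOT NULL,  -- user identity as of the decision
    decided_email TEXT NOT NULL,  -- email as of the decision, not as of today
    lod_tier      TEXT NOT NULL,
    risk_factors  TEXT NOT NULL,  -- JSON payload of structured disambiguators
    decided_at    TEXT NOT NULL DEFAULT (strftime('%Y-%m-%dT%H:%M:%fZ','now'))
);
-- Append-only: updates and deletes are rejected outright.
CREATE TRIGGER entity_decisions_no_update BEFORE UPDATE ON entity_decisions
BEGIN
    SELECT RAISE(ABORT, 'entity_decisions is append-only');
END;
CREATE TRIGGER entity_decisions_no_delete BEFORE DELETE ON entity_decisions
BEGIN
    SELECT RAISE(ABORT, 'entity_decisions is append-only');
END;
""")

def record_decision(entity_id, decision, note, user, email, lod, risk_factors):
    conn.execute(
        "INSERT INTO entity_decisions (entity_id, decision, decision_note, decided_by,"
        " decided_email, lod_tier, risk_factors) VALUES (?, ?, ?, ?, ?, ?, ?)",
        (entity_id, decision, note, user, email, lod, json.dumps(risk_factors)),
    )

def current_status(entity_id):
    """The latest row supersedes earlier ones; earlier rows stay readable."""
    row = conn.execute(
        "SELECT decision FROM entity_decisions WHERE entity_id = ? ORDER BY id DESC LIMIT 1",
        (entity_id,),
    ).fetchone()
    return row[0] if row else None
```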

The mapping from the five-code industry vocabulary to the four-state NNSFlow enum is straightforward: closed-no-match and closed-false-match both land on CLEARED (with the disambiguator captured in the structured risk-factors field), closed-true-match maps to FLAGGED, escalated maps to ESCALATED, and pending maps to PENDING_REVIEW. The reason the platform uses a smaller terminal-state set is that the substantive audit question, "did this entity end up cleared or not", is binary; the distinction between the two "closed" reasons belongs in the structured field that captures why, not in the state itself.
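The mapping can be written as a lookup table plus one rule: a CLEARED state must carry its reason in the structured payload. This is a sketch of the mapping described above, not platform code. The payload keys `closure_reason` and `disambiguator` are hypothetical.

```python
# Five-code industry vocabulary -> four NNSFlow states, as described above.
FIVE_TO_FOUR = {
    "closed_no_match": "CLEARED",
    "closed_false_match": "CLEARED",
    "closed_true_match": "FLAGGED",
    "escalated": "ESCALATED",
    "pending": "PENDING_REVIEW",
}

def to_platform_decision(code, disambiguator=None):
    """Return (state, risk_factors). For CLEARED, the 'why' moves into the
    structured payload instead of being encoded in the state itself."""
    state = FIVE_TO_FOUR[code]
    risk_factors = {}
    if state == "CLEARED":
        if not disambiguator:
            raise ValueError("CLEARED requires a disambiguator")
        # Hypothetical payload keys; the real risk_factors schema is not published here.
        risk_factors = {"closure_reason": code, "disambiguator": disambiguator}
    return state, risk_factors
```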

The audit trail that survives a ten-year reconstruction

A disposition is auditable when, given the disposition record alone, a reviewer can answer four questions without picking up the phone:

  1. Who decided. A named individual, resolved at the moment of the decision (employee ID and name as of the decision date, not as of today). The four-eyes principle adds the second name; the disposition framework adds the first.
  2. When. A timestamp with timezone, for the disposition itself and for the alert it disposed of. The lag between alert generation and disposition is itself a metric auditors examine.
  3. On what evidence. The specific documents, list entries, customer records, and third-party data points consulted. Captured by hash (see the SHA-256 evidence-chain post), not by URL or DOM reference.
  4. Under what configuration. The version of the screening list, the match algorithm and threshold settings, and the customer-data fields present at the time. A disposition made at a 75% fuzzy-match threshold means something different at 90%; the configuration has to be recoverable.
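The four questions translate directly into a record type that can report which of them it fails to answer. In this minimal sketch, all field names and the example values (employee ID, list version, algorithm label, threshold) are hypothetical. The `lag` method computes the alert-to-disposition delay mentioned under "When".

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import List, Optional

@dataclass
class DispositionRecord:
    # 1. Who: identity as resolved at decision time
    decided_by_id: str
    decided_by_name: str
    # 2. When: should be timezone-aware
    alert_generated_at: datetime
    disposed_at: datetime
    # 3. On what evidence: content hashes, not URLs
    evidence_sha256: List[str] = field(default_factory=list)
    # 4. Under what configuration
    list_version: Optional[str] = None
    match_algorithm: Optional[str] = None
    threshold: Optional[float] = None

    def gaps(self) -> List[str]:
        """Return the reconstruction questions this record cannot answer alone."""
        problems = []
        if not (self.decided_by_id and self.decided_by_name):
            problems.append("who")
        if self.alert_generated_at.tzinfo is None or self.disposed_at.tzinfo is None:
            problems.append("when")
        if not self.evidence_sha256:
            problems.append("evidence")
        if None in (self.list_version, self.match_algorithm, self.threshold):
            problems.append("configuration")
        return problems

    def lag(self) -> timedelta:
        """Delay from alert generation to disposition."""
        return self.disposed_at - self.alert_generated_at

record = DispositionRecord(
    decided_by_id="E042",
    decided_by_name="A. Analyst",
    alert_generated_at=datetime(2026, 4, 1, 9, 0, tzinfo=timezone.utc),
    disposed_at=datetime(2026, 4, 1, 14, 0, tzinfo=timezone.utc),
    evidence_sha256=["ab" * 32],
    list_version="list-2026-03-31",
    match_algorithm="fuzzy-v2",
    threshold=0.75,
)
print(record.gaps(), record.lag())
```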

The fourth point is the one most institutions get wrong. Lists change. Thresholds get retuned. Customer data is enriched and corrected over the years. A disposition record that does not pin its configuration is unreconstructable, because the same alert run today against today's configuration would not produce the same hit.

The NNSFlow audit log records each disposition as an immutable row with the actor, timestamp, decision, reason code, and a hash of the frozen investigation snapshot. The configuration tied to that snapshot is preserved alongside it. None of this is novel; it is the minimum bar that emerges once you actually try to walk an auditor through a two-year-old case.
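The post does not specify how the snapshot hash is computed. A minimal sketch, assuming the frozen snapshot is a JSON-serialisable dict, is to take SHA-256 over a canonical JSON encoding with sorted keys and fixed separators. Key order then cannot change the digest, but any change to the decision, its evidence references, or its pinned configuration does. All snapshot field names here are hypothetical.

```python
import hashlib
import json

def snapshot_hash(snapshot: dict) -> str:
    """SHA-256 over canonical JSON, so the same snapshot always yields the same digest."""
    canonical = json.dumps(snapshot, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

snapshot = {
    "alert_id": "ALR-1",
    "decision": "CLEARED",
    "reason_code": "closed_false_match",
    "evidence_sha256": ["ab" * 32],
    # Configuration pinned alongside the decision (hypothetical field names).
    "config": {"list_version": "2026-03-31", "algorithm": "fuzzy-v2", "threshold": 0.75},
}
digest = snapshot_hash(snapshot)

# The same alert under a retuned threshold is a different snapshot with a different digest.
retuned = {**snapshot, "config": {**snapshot["config"], "threshold": 0.90}}
retuned_digest = snapshot_hash(retuned)
```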

Where false positives come from

Disposition is the downstream problem. The upstream problem is the alert itself. False positives originate from one of five sources, each of which has a different fix.

Name similarity at the string level. "Mohammed Hassan" produces hundreds of hits against any global PEP list. Without secondary identifiers, every name-only hit on a common name is essentially noise. This is a list-design problem, not an algorithm problem.

Transliteration. Arabic, Cyrillic, Chinese, and Thai names lose information when transliterated to Latin script, and the loss is not deterministic across sources. "Mohamed", "Mohammad", "Muhammad", "Mohamad", and "Mohammed" are the same name. Algorithms that score these as separate names produce false negatives; algorithms that treat them as identical produce noise on every Mohammad on the list. The ICAO 9303 machine-readable-zone transliteration standard is the closest thing to common ground.

List quality. A sanctions or PEP list with stale entries, inconsistent date formats, missing date-of-birth fields, or duplicated records produces avoidable hits. The OpenSanctions vs World-Check comparison covers this in more depth; the relevant point here is that the provider's curation directly determines the institution's false-positive rate, and audit findings sometimes attribute the noise back to vendor-selection diligence under AMLO-FINMA Article 25.

Lack of contextual disambiguators. A name match without a date of birth, a country of residence, or an industry code is unreviewable. The screening engine had nothing to filter on. Acceptance of accounts without these data fields, or onboarding processes that record them inconsistently, push false-positive rates up at the source. This is a KYC question masquerading as a screening question.

Threshold tuning. Fuzzy-match thresholds set conservatively (say, 65% similarity) generate more hits than thresholds set aggressively (90%). Tightening the threshold reduces noise but at the cost of a higher miss rate. The trade-off is real, and the answer is almost never "tune the threshold". The answer is usually "add a disambiguator the threshold can rely on".
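A toy run makes the trade-off concrete. It uses Python's `difflib.SequenceMatcher` ratio as a stand-in similarity measure, which is an assumption, since production engines use their own scoring. The watchlist entries are invented. Raising the threshold can only remove hits, so it drops variant spellings of the same name along with the noise. That is the miss-rate side of the trade-off described above.

```python
from difflib import SequenceMatcher

# Invented watchlist entries and screened name, for illustration only.
watchlist = ["Mohammed Hassan", "Mohamed Hasan", "Muhammad Hussain", "Mahmoud Hassan", "Ahmed Hassan"]
screened = "Mohammad Hassan"

def hits(name, entries, threshold):
    """Entries whose case-insensitive similarity to `name` meets the threshold."""
    return [e for e in entries
            if SequenceMatcher(None, name.lower(), e.lower()).ratio() >= threshold]

for threshold in (0.65, 0.90):
    print(threshold, hits(screened, watchlist, threshold))
```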

Reducing false positives at source

Threshold tuning is the first lever every team reaches for. It is the weakest. A more durable approach works the data layer first.

Curate the lists. Take an inventory of the sources feeding screening, remove sources that overlap or duplicate, and prefer providers that publish their data quality and refresh schedule. Track per-source false-positive rates as a vendor-management metric.
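The per-source metric needs only the list source that produced each hit and the closing code. This is a minimal sketch over invented data that treats closed-no-match and closed-false-match as false positives, in line with the five-code vocabulary.

```python
from collections import Counter

# Invented closed alerts, each tagged with the list source that produced the hit.
closed_alerts = [
    ("provider_a", "closed_no_match"),
    ("provider_a", "closed_false_match"),
    ("provider_a", "closed_true_match"),
    ("provider_b", "closed_false_match"),
    ("provider_b", "closed_false_match"),
]

FALSE_POSITIVE_CODES = {"closed_no_match", "closed_false_match"}

def false_positive_rate_by_source(alerts):
    """Share of each source's closed alerts that were false positives."""
    totals, false_positives = Counter(), Counter()
    for source, code in alerts:
        totals[source] += 1
        if code in FALSE_POSITIVE_CODES:
            false_positives[source] += 1
    return {source: false_positives[source] / totals[source] for source in totals}

print(false_positive_rate_by_source(closed_alerts))
```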

Capture secondary identifiers at onboarding. Date of birth, country of birth, country of residence, and primary occupation are the four fields that, in our experience, eliminate the bulk of name-collision noise. Enforcing them at onboarding is harder than tuning thresholds and produces more lasting reduction.

Use contextual filters. A 75% name match with a matching DOB year is qualitatively different from a 75% name match with a 30-year DOB delta. The match score should not be a single number; it should be a weighted combination of name, date, country, and (where applicable) identification document. Most screening engines support this; not all institutions configure it.
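As a sketch of a weighted score, here is one possible combination of name similarity, birth-year proximity, and country agreement. The weights, the linear 10-year DOB decay, and the use of `difflib` for the name component are all hypothetical choices, not any engine's defaults. An identification-document component is omitted for brevity. Components with missing data are left out and the weights renormalised, so a missing DOB neither helps nor hurts the match.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical weights; an institution would calibrate its own.
WEIGHTS = {"name": 0.5, "dob": 0.3, "country": 0.2}

@dataclass
class Party:
    name: str
    birth_year: Optional[int] = None
    country: Optional[str] = None  # e.g. ISO 3166-1 alpha-2

def dob_score(a: Optional[int], b: Optional[int]) -> Optional[float]:
    """1.0 for the same birth year, falling linearly to 0.0 at a 10-year gap."""
    if a is None or b is None:
        return None
    return max(0.0, 1.0 - abs(a - b) / 10)

def composite_score(customer: Party, entry: Party) -> float:
    components = {
        "name": SequenceMatcher(None, customer.name.lower(), entry.name.lower()).ratio(),
        "dob": dob_score(customer.birth_year, entry.birth_year),
        "country": (float(customer.country == entry.country)
                    if customer.country and entry.country else None),
    }
    # Renormalise over the components that are actually available.
    present = {k: v for k, v in components.items() if v is not None}
    total_weight = sum(WEIGHTS[k] for k in present)
    return sum(WEIGHTS[k] * v for k, v in present.items()) / total_weight

entry = Party("Mohammed Hassan", 1956, "SY")
print(composite_score(Party("Mohammed Hassan", 1956, "SY"), entry))
print(composite_score(Party("Mohammed Hassan", 1986, "CH"), entry))
```

With an identical name, the 30-year DOB delta and the country mismatch pull the second score well below the first. A name-only score cannot make that distinction.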

Apply transliteration normalisation consistently. ICAO 9303 or an equivalent canonicalisation step, applied at write-time to both the customer record and the list, removes most transliteration variance before scoring. Doing it at read-time, per-query, is slower and more fragile.
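The sketch below shows only the write-time shape of this step. It strips diacritics via Unicode NFKD decomposition, upper-cases, collapses whitespace, and maps known Latin-script variants through a small hand-written table. The table is hypothetical and deliberately incomplete (it does not cover "Hasan"/"Hassan", for example). The sketch also does not transliterate from Arabic or Cyrillic script; that is what ICAO 9303 tables or a vendor canonicalisation service would supply.

```python
import unicodedata

# Hypothetical variant table; a real deployment would rely on ICAO 9303 tables
# or a maintained canonicalisation service rather than a hand-written dict.
GIVEN_NAME_VARIANTS = {
    "MOHAMED": "MUHAMMAD", "MOHAMMAD": "MUHAMMAD", "MOHAMAD": "MUHAMMAD",
    "MOHAMMED": "MUHAMMAD", "MUHAMMAD": "MUHAMMAD",
}

def canonicalise(name: str) -> str:
    """Strip diacritics, upper-case, collapse whitespace, and map known variants."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    tokens = stripped.upper().split()
    return " ".join(GIVEN_NAME_VARIANTS.get(t, t) for t in tokens)

# Applied once at write time, to both the customer record and the list entry.
customer_record = {"name": "Mohamed  Hassan", "name_canonical": canonicalise("Mohamed  Hassan")}
```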

Watch for list noise specific to entities. Corporate name collisions ("Acme Trading", "ABC Holdings") are a different problem from individual-name collisions and benefit from different filters (jurisdiction of incorporation, registry number, industry).
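For entities, this is a minimal sketch of the two filters named above: strip legal-form suffixes before comparing names, then let a documented mismatch on a hard identifier rule the candidate out. The suffix set and the dict keys are hypothetical, and exact core-name equality stands in for whatever similarity measure a real engine applies.

```python
import re

# Hypothetical suffix set; real lists are longer and jurisdiction-specific.
CORPORATE_SUFFIXES = {"AG", "SA", "GMBH", "LTD", "LIMITED", "LLC", "INC", "HOLDING", "HOLDINGS"}

def core_name(name: str) -> str:
    """Drop punctuation and legal-form suffixes before comparing names."""
    tokens = re.sub(r"[^\w\s]", " ", name).upper().split()
    return " ".join(t for t in tokens if t not in CORPORATE_SUFFIXES)

def entity_candidate(customer: dict, entry: dict) -> bool:
    """Keep a hit only if core names agree and no hard identifier contradicts it."""
    if core_name(customer["name"]) != core_name(entry["name"]):
        return False
    for key in ("registry_number", "jurisdiction"):
        if customer.get(key) and entry.get(key) and customer[key] != entry[key]:
            return False  # a documented mismatch rules the entity out
    return True
```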

The four-eyes role in disposition decisions

A disposition is not necessarily a four-eyes decision. Most closed-no-match and closed-false-match dispositions, in routine operations, are made by a single analyst because the disambiguator is clear (different DOB, different country, different industry). The four-eyes principle attaches to a narrower class of cases.

In NNSFlow, the rule that maps to the regulatory expectation is: single-analyst disposition is sufficient for CLEARED outcomes at LoD1 and LoD2 levels of detail; four-eyes review is required for any FLAGGED or ESCALATED decision, and for CLEARED decisions at LoD3 (enhanced due diligence) where the disposition closes a relationship that had been previously flagged. The institution can configure the threshold; the platform enforces whichever rule the institution has set.
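The default rule reduces to a small predicate. This sketch restates the rule described above and is not NNSFlow's code. The parameter names are hypothetical, and `always_four_eyes` stands in for the institution-configurable part.

```python
def requires_four_eyes(decision: str, lod: str, previously_flagged: bool,
                       always_four_eyes=frozenset({"FLAGGED", "ESCALATED"})) -> bool:
    """Default rule as described above; `always_four_eyes` models institution configuration."""
    if decision in always_four_eyes:
        return True
    # CLEARED at LoD3 needs a second reviewer when it closes a previously flagged relationship.
    return decision == "CLEARED" and lod == "LoD3" and previously_flagged
```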

The escalation queue itself is a four-eyes structure: the escalating analyst submits, a senior reviewer (typically the MLRO or a sanctions specialist) makes the final call, and the two cannot be the same person. The eligibility predicate runs server-side at the moment of review, against the directory and the org graph as they existed at the time the review is happening.
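Evaluating eligibility "as the directory existed at the time" implies role grants with validity periods rather than a single current role. This minimal sketch assumes hypothetical `RoleGrant` records and role names. It checks that the reviewer is not the submitter and holds a senior role valid at the moment of review.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

@dataclass
class RoleGrant:
    user_id: str
    role: str
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None means the grant is still active

# Hypothetical roles allowed to make the final call on an escalation.
SENIOR_ROLES = {"MLRO", "SANCTIONS_SPECIALIST"}

def may_review(reviewer: str, submitter: str, grants: List[RoleGrant], at: datetime) -> bool:
    """Evaluated server-side at review time: the reviewer must differ from the
    submitter and hold a senior role that is valid at that moment."""
    if reviewer == submitter:
        return False
    return any(
        g.user_id == reviewer and g.role in SENIOR_ROLES
        and g.valid_from <= at and (g.valid_to is None or at < g.valid_to)
        for g in grants
    )
```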

What this looks like to auditors: the cases that go through four-eyes review are exactly the cases where the institution was making a substantive judgement (not closing noise). The audit trail distinguishes "closed as noise by one analyst" from "substantively judged by two", and the regulator expects to see the latter on every FLAGGED and ESCALATED outcome.

Bottom line

False-positive rates are a tuning artefact. Disposition quality is a control. Auditors examine the second, not the first.

A workable framework has three components: a small, well-defined set of disposition codes; a structured record per disposition that captures who, when, on what evidence, under what configuration; and upstream data work to reduce alert volume at the source rather than through threshold compression. The reconstruction obligation in AMLO-FINMA Article 22, and the broader Wolfsberg Statement on Sanctions Screening expectations, are not satisfied by any of these in isolation. They are satisfied by all three together.

If you would like to compare your current disposition codes against the framework above, or run an audit drill against your own historical alerts, get in touch. The exercise is useful regardless of which screening platform produced the alerts in the first place.

#false-positives #disposition #alerts #FINMA #swiss