Many applications that accept untrusted input strings employ input filtering and validation mechanisms based on the strings' character data, such as mechanisms that blacklist particular characters. For example, an application's strategy for avoiding cross-site scripting (XSS) vulnerabilities may include forbidding <script> tags in inputs. Such blacklisting mechanisms are a useful part of a security strategy, even though they are insufficient for complete input validation and sanitization. When implemented, this form of validation must be performed only after normalizing the input; otherwise an attacker can submit a string whose raw characters pass the check but which normalizes into a forbidden sequence.
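The ordering rule above (normalize first, then validate, then use only the normalized string) can be sketched as follows. This is a minimal illustration, not a complete XSS defense: the class name, the validate method, and the regular-expression blacklist are hypothetical, and the disguised input uses U+FE64/U+FE65 (SMALL LESS-THAN/GREATER-THAN SIGN) purely as an example of characters that NFKC maps to '<' and '>'.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

public class NormalizeThenValidate {
    // Hypothetical blacklist: reject anything that opens a <script> tag (case-insensitive).
    private static final Pattern SCRIPT_TAG =
            Pattern.compile("<\\s*script\\b", Pattern.CASE_INSENSITIVE);

    /** Normalizes to NFKC, validates the normalized form, and returns it if it passes. */
    public static String validate(String input) {
        // Step 1: normalize BEFORE checking, so compatibility variants of '<'
        // become the plain character the blacklist looks for.
        String normalized = Normalizer.normalize(input, Normalizer.Form.NFKC);

        // Step 2: validate the normalized string, never the raw input.
        if (SCRIPT_TAG.matcher(normalized).find()) {
            throw new IllegalArgumentException("Input contains a <script> tag");
        }

        // Step 3: callers continue with the normalized string only.
        return normalized;
    }

    public static void main(String[] args) {
        String disguised = "\uFE64script\uFE65"; // normalizes to "<script>"

        // A check on the raw string misses the tag, because U+FE64 is not '<'.
        System.out.println("raw check finds tag: " + SCRIPT_TAG.matcher(disguised).find());

        try {
            validate(disguised);
            System.out.println("accepted");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Returning the normalized string, rather than a boolean, is one way to keep later code from accidentally operating on the unvalidated original.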
...
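Normalization Form KC (NFKC) is the most suitable form for performing input validation on arbitrarily encoded strings, because normalizing to KC replaces compatibility-equivalent characters (for example, fullwidth or small variants of '<') with their standard equivalents, transforming the input into a single equivalent form that can be safely compared with the required input form.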
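To see why KC is preferred over C for this purpose, the sketch below compares both forms on two disguised variants of <script>. NFC applies only canonical equivalences, so it leaves compatibility characters such as U+FE64 (SMALL LESS-THAN SIGN) and U+FF1C (FULLWIDTH LESS-THAN SIGN) untouched; NFKC also applies compatibility mappings and turns them into plain '<'. The class and method names and the chosen code points are illustrative assumptions.

```java
import java.text.Normalizer;

public class NfcVersusNfkc {
    /** Returns true if normalizing s to the given form yields exactly expected. */
    static boolean normalizesTo(String s, Normalizer.Form form, String expected) {
        return Normalizer.normalize(s, form).equals(expected);
    }

    public static void main(String[] args) {
        // Example inputs: small-form and fullwidth variants of "<script>".
        String[] inputs = { "\uFE64script\uFE65", "\uFF1Cscript\uFF1E" };
        for (String s : inputs) {
            // NFC keeps compatibility characters, so the comparison fails.
            System.out.println("NFC  -> <script>: " + normalizesTo(s, Normalizer.Form.NFC, "<script>"));
            // NFKC maps them to '<' and '>', so the comparison succeeds.
            System.out.println("NFKC -> <script>: " + normalizesTo(s, Normalizer.Form.NFKC, "<script>"));
        }
    }
}
```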
Another domain where normalization is required before validation is sanitizing untrusted path names in a file system. This is addressed by guideline FIO04-J, "Canonicalize path names before validating them."
Noncompliant Code Example
...