...
Ignoring the possibility of supplementary characters, multibyte characters, or combining characters (characters that modify other characters) may allow an attacker to bypass input validation checks. Consequently, characters must not be split between two data structures.
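One way to avoid splitting a character across two buffers is to adjust a proposed split position before cutting the string. The following is a minimal sketch, assuming Java; the class `SafeSplit` and its methods are hypothetical names introduced here for illustration only. It moves the split index left whenever it would fall inside a surrogate pair or directly before a combining mark. It does not cover every grapheme cluster rule (for example, emoji joined with zero-width joiners).

```java
final class SafeSplit {
    // True when the code point is a combining mark (general categories Mn, Mc, Me).
    static boolean isCombiningMark(int codePoint) {
        int type = Character.getType(codePoint);
        return type == Character.NON_SPACING_MARK
                || type == Character.COMBINING_SPACING_MARK
                || type == Character.ENCLOSING_MARK;
    }

    // Moves a proposed split index left until it no longer falls inside a
    // surrogate pair or between a base character and its combining marks.
    static int safeSplitIndex(String s, int index) {
        int i = Math.max(0, Math.min(index, s.length()));
        while (i > 0 && i < s.length()) {
            if (Character.isLowSurrogate(s.charAt(i))
                    && Character.isHighSurrogate(s.charAt(i - 1))) {
                i--; // index pointed at the second half of a surrogate pair
            } else if (isCombiningMark(s.codePointAt(i))) {
                i = s.offsetByCodePoints(i, -1); // keep the mark with its base
            } else {
                break;
            }
        }
        return i;
    }
}
```

A caller would compute `int cut = SafeSplit.safeSplitIndex(text, limit)` and then use `text.substring(0, cut)` and `text.substring(cut)`, so that validation applied to either piece sees whole characters.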
Combining Characters
A combining character sequence is a base character followed by any number of combining characters. The combining character sequence forms a grapheme, which is a minimally distinctive unit of writing in the context of a particular writing system. For example, the grapheme ü can be composed by combining the base character \u0075 (u) with the combining diaeresis \u0308 (the combining form of the diacritical mark ¨). It may also be represented by the single precomposed Unicode character \u00fc.
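The two representations of ü can be compared in code. The sketch below assumes Java and uses U+0308 COMBINING DIAERESIS as the combining mark; the class `CombiningDemo` and the helper `canonicallyEqual` are hypothetical names chosen for this illustration. It shows that the two forms differ as raw strings but compare equal once both are normalized to NFC with `java.text.Normalizer`.

```java
import java.text.Normalizer;

final class CombiningDemo {
    // Compares two strings after normalizing both to NFC (canonical composition).
    static boolean canonicallyEqual(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFC)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFC));
    }

    public static void main(String[] args) {
        String decomposed = "u\u0308";   // base letter u followed by U+0308
        String precomposed = "\u00fc";   // single precomposed character

        System.out.println(decomposed.equals(precomposed));            // false
        System.out.println(decomposed.length());                       // 2
        System.out.println(precomposed.length());                      // 1
        System.out.println(canonicallyEqual(decomposed, precomposed)); // true
    }
}
```

Normalizing input before validating it is one way to keep a check written against the precomposed form from being bypassed by the decomposed form.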
...