...
Another domain where normalization is required before validation is sanitizing untrusted path names in a file system. This is addressed by guideline FIO04-J. Canonicalize path names before validating them.
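The same ordering can be sketched for path names. This is a minimal illustration, not the code from FIO04-J: the class name `PathCheck`, the method `resolveInsideBase`, and the base directory `/srv/app/uploads` are hypothetical. It also assumes that purely lexical normalization with `Path.normalize()` is sufficient, which is not true when symbolic links are involved. Full canonicalization would need something like `File.getCanonicalPath()` or `Path.toRealPath()`.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class PathCheck {
    // Hypothetical base directory, chosen for illustration only.
    static final Path BASE = Paths.get("/srv/app/uploads");

    // Resolves the untrusted name against BASE, normalizes it lexically
    // (removing "." and ".." elements), and only then validates that the
    // result is still located under BASE.
    // Assumption: no symbolic links; normalize() does not touch the file system.
    static Path resolveInsideBase(String untrustedName) {
        Path candidate = BASE.resolve(untrustedName).normalize();
        if (!candidate.startsWith(BASE)) {
            throw new IllegalArgumentException("path escapes base directory");
        }
        return candidate;
    }
}
```

With this sketch, `resolveInsideBase("a/../b.txt")` yields a path under the base directory, whereas `resolveInsideBase("../../etc/passwd")` is rejected because the normalized result no longer starts with the base directory.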
...
This noncompliant code example attempts to validate the String before performing normalization. Consequently, the validation logic fails to detect inputs that should be rejected, because the check for angle brackets does not match their alternative Unicode representations.
Code Block | ||
---|---|---|
| ||
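import java.text.Normalizer;
import java.text.Normalizer.Form;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// String s may be user controllable
// \uFE64 is normalized to < and \uFE65 is normalized to > using NFKC
String s = "\uFE64" + "script" + "\uFE65";

// Validate (too early: s has not been normalized yet)
Pattern pattern = Pattern.compile("[<>]"); // Check for angle brackets
Matcher matcher = pattern.matcher(s);
if (matcher.find()) {
  System.out.println("found blacklisted tag");
} else {
  // ...
}

// Normalize (too late: validation has already passed)
s = Normalizer.normalize(s, Form.NFKC);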
|
The normalize method transforms Unicode text into an equivalent composed or decomposed form, allowing for easier searching of text. The normalize method supports the standard normalization forms described in Unicode Standard Annex #15: Unicode Normalization Forms.
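To make the difference between the forms concrete, the following sketch applies each of them to the small less-than sign used in this example and to a decomposed accented letter. The class name `NormalizationForms` and the helper `apply` are hypothetical and exist only for illustration. The calls to `java.text.Normalizer.normalize` are the standard API.

```java
import java.text.Normalizer;
import java.text.Normalizer.Form;

class NormalizationForms {
    // Thin wrapper around the standard API, for illustration only.
    static String apply(String input, Form form) {
        return Normalizer.normalize(input, form);
    }

    public static void main(String[] args) {
        String small = "\uFE64";     // SMALL LESS-THAN SIGN
        String decomposed = "e\u0301"; // 'e' followed by COMBINING ACUTE ACCENT

        // Canonical forms (NFC, NFD) leave compatibility characters alone,
        // so the small less-than sign is unchanged.
        System.out.println(apply(small, Form.NFC).equals(small));   // true
        System.out.println(apply(small, Form.NFD).equals(small));   // true

        // Compatibility forms (NFKC, NFKD) map it to the ASCII '<'.
        System.out.println(apply(small, Form.NFKC).equals("<"));    // true
        System.out.println(apply(small, Form.NFKD).equals("<"));    // true

        // Canonical composition merges the base letter and the accent.
        System.out.println(apply(decomposed, Form.NFC).equals("\u00E9")); // true
    }
}
```

Only the compatibility forms turn the alternative angle brackets into the characters that the validation pattern looks for, which is why the examples in this guideline use NFKC.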
Compliant Solution
This compliant solution normalizes the string before validating it. Alternative representations of the angle brackets in the string are normalized to the canonical angle brackets. Consequently, input validation correctly detects the malicious input and throws an IllegalStateException.
Code Block | ||
---|---|---|
| ||
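import java.text.Normalizer;
import java.text.Normalizer.Form;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// String s may be user controllable
String s = "\uFE64" + "script" + "\uFE65";

// Normalize
s = Normalizer.normalize(s, Form.NFKC);

// Validate the normalized string
Pattern pattern = Pattern.compile("[<>]"); // Check for angle brackets
Matcher matcher = pattern.matcher(s);
if (matcher.find()) {
  System.out.println("found blacklisted tag");
  throw new IllegalStateException();
} else {
  // ...
}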
|
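The normalize-then-validate order can also be packaged as a single method, so that callers cannot accidentally validate one string and then use another. The following is a minimal sketch under that assumption. The class name `InputValidator` and the method `normalizeAndValidate` are hypothetical and are not part of this guideline or of the Java API.

```java
import java.text.Normalizer;
import java.text.Normalizer.Form;
import java.util.regex.Pattern;

class InputValidator {
    // Same check as in the compliant solution: reject any angle bracket.
    private static final Pattern ANGLE_BRACKETS = Pattern.compile("[<>]");

    // Normalizes the untrusted input with NFKC, validates the normalized
    // form, and returns that normalized form for all further processing.
    static String normalizeAndValidate(String untrusted) {
        String normalized = Normalizer.normalize(untrusted, Form.NFKC);
        if (ANGLE_BRACKETS.matcher(normalized).find()) {
            throw new IllegalStateException("found blacklisted tag");
        }
        return normalized;
    }
}
```

Returning the normalized string matters: if the caller kept using the original input, the value that was validated and the value that is later processed would differ.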
...