IDS12-J. Sanitize non-character code points before performing other sanitization

In some versions prior to Unicode 5.2, conformance clause C7 allowed the deletion of noncharacter code points. For example, conformance clause C7 from Unicode 5.1 states: [[Unicode 2007]]

C7. When a process purports not to modify the interpretation of a valid coded character sequence, it shall make no change to that coded character sequence other than the possible replacement of character sequences by their canonical-equivalent sequences or the deletion of noncharacter code points.

According to Unicode Technical Report #36, Unicode Security Considerations [[Davis 2008b]], Section 3.5, "Deletion of Noncharacters":

Whenever a character is invisibly deleted (instead of replaced), such as in this older version of C7, it may cause a security problem. The issue is the following: A gateway might be checking for a sensitive sequence of characters, say "delete". If what is passed in is "deXlete", where X is a noncharacter, the gateway lets it through: the sequence "deXlete" may be in and of itself harmless. However, suppose that later on, past the gateway, an internal process invisibly deletes the X. In that case, the sensitive sequence of characters is formed, and can lead to a security breach.

Because character-level modifications of a string can nullify substring-level checks, it is important to perform the character-level modifications before substring-level checks.
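
The gateway scenario from the quotation can be reproduced in a few lines. The following is a minimal sketch rather than part of the original guideline: the class `DeletionBypassDemo` and the methods `gatewayAllows` and `stripNoncharacters` are hypothetical names, and U+FDEF is assumed as the noncharacter X.

```java
final class DeletionBypassDemo {

    // Hypothetical gateway check: rejects input that contains the sensitive word.
    static boolean gatewayAllows(String input) {
        return !input.contains("delete");
    }

    // Hypothetical internal step that silently deletes noncharacters in the
    // range U+FDD0..U+FDEF, as the older wording of C7 permitted.
    static String stripNoncharacters(String input) {
        return input.replaceAll("[\\uFDD0-\\uFDEF]", "");
    }

    public static void main(String[] args) {
        String input = "de" + "\uFDEF" + "lete";

        // The gateway sees "deXlete" and lets it through.
        System.out.println(gatewayAllows(input));                     // true

        // Past the gateway, the deletion forms the sensitive sequence.
        String processed = stripNoncharacters(input);
        System.out.println(processed);                                // delete

        // Had the deletion run first, the gateway would have rejected the input.
        System.out.println(gatewayAllows(processed));                 // false
    }
}
```

The same check gives opposite answers depending on whether it runs before or after the deletion, which is why the ordering matters.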

Noncompliant Code Example

This noncompliant code example accepts only valid ASCII characters and deletes any nonconforming characters. It also checks for the existence of a <script> tag.

However, the input validation is performed before the nonconforming characters are deleted. As such, this code also violates IDS01-J. Normalize strings before validating them. Consequently, an attacker can disguise a <script> tag by embedding a character that is deleted only after the check has run, and so fool the filter.

import java.text.Normalizer;
import java.text.Normalizer.Form;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "<scr" + "\uFDEF" + "ipt>"; // "\uFDEF" is a noncharacter code point
s = Normalizer.normalize(s, Form.NFKC);

// Input validation (runs too early: the noncharacter still splits the tag)
Pattern pattern = Pattern.compile("<script>");
Matcher matcher = pattern.matcher(s);
if (matcher.find()) {
  System.out.println("Found blacklisted tag");
} else {
  // ... 
}

s = s.replaceAll("[^\\p{ASCII}]", ""); // Deletes all non-ASCII characters
// s now contains "<script>"

Compliant Solution

This compliant solution replaces each unknown or unrepresentable character with the Unicode replacement character U+FFFD (\uFFFD), which is reserved to denote this condition. It performs this replacement before any other sanitization, in particular before checking for <script>, so that malicious input cannot bypass the filter.

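import java.text.Normalizer;
import java.text.Normalizer.Form;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "<scr" + "\uFDEF" + "ipt>"; // "\uFDEF" is a noncharacter code point

s = Normalizer.normalize(s, Form.NFKC);
s = s.replaceAll("[^\\p{ASCII}]", "\uFFFD"); // Replaces all non-ASCII characters with U+FFFD

// Input validation (runs after the replacement)
Pattern pattern = Pattern.compile("<script>");
Matcher matcher = pattern.matcher(s);
if (matcher.find()) {
  System.out.println("Found blacklisted tag");
} else {
  // ... 
}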
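
The compliant solution replaces every non-ASCII character. Where legitimate non-ASCII text has to survive, one possible variation is to replace only the noncharacter code points themselves. The sketch below is an illustration under that assumption and is not taken from the guideline; `NoncharacterSanitizer`, `isNoncharacter`, and `replaceNoncharacters` are hypothetical names.

```java
final class NoncharacterSanitizer {

    // True for the 66 Unicode noncharacters: U+FDD0..U+FDEF and the last
    // two code points of every plane (U+nFFFE and U+nFFFF).
    static boolean isNoncharacter(int codePoint) {
        return (codePoint >= 0xFDD0 && codePoint <= 0xFDEF)
            || (codePoint & 0xFFFE) == 0xFFFE;
    }

    // Replaces each noncharacter with U+FFFD instead of deleting it, so the
    // surrounding characters are never joined into a new substring.
    static String replaceNoncharacters(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);   // handles supplementary code points
            out.appendCodePoint(isNoncharacter(cp) ? 0xFFFD : cp);
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String s = replaceNoncharacters("<scr" + "\uFDEF" + "ipt>");
        System.out.println(s.contains("<script>"));   // false
        System.out.println(s.indexOf('\uFFFD'));      // 4
    }
}
```

As in the compliant solution, this step would have to run before any substring-level check such as the <script> pattern match.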

"U+FFFD is usually unproblematic, because it is designed expressly for this kind of purpose. That is, because it doesn't have syntactic meaning in programming languages or structured data, it will typically just cause a failure in parsing. Where the output character set is not Unicode, though, this character may not be available" [[Davis 2008b]].
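
The caveat about non-Unicode output character sets can be checked before output is written. The following is a minimal sketch using the standard `java.nio.charset` API; the class `ReplacementCharAvailability` and the method `canRepresentReplacementChar` are hypothetical names introduced for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ReplacementCharAvailability {

    // Reports whether the given output charset can encode U+FFFD.
    static boolean canRepresentReplacementChar(Charset charset) {
        return charset.newEncoder().canEncode('\uFFFD');
    }

    public static void main(String[] args) {
        System.out.println(canRepresentReplacementChar(StandardCharsets.UTF_8));      // true
        System.out.println(canRepresentReplacementChar(StandardCharsets.US_ASCII));   // false
        System.out.println(canRepresentReplacementChar(StandardCharsets.ISO_8859_1)); // false
    }
}
```

When the check returns false, the application has to pick some other visible substitute that its output charset supports. Silently dropping the character would reintroduce the deletion problem described above.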

Risk Assessment

Deleting non-character code points can allow malicious input to bypass validation checks.

Rule	Severity	Likelihood	Remediation Cost	Priority	Level
IDS12-J	high	probable	medium	P12	L1

Related Guidelines

Search for vulnerabilities resulting from the violation of this rule on the CERT website.

[[MITRE 2009]]

CWE ID 182 "Collapse of Data Into Unsafe Value"

Bibliography

[[API 2006]]
[[Davis 2008b]]	3.5 Deletion of Noncharacters
[[Weber 2009]]	Handling the Unexpected: Character-deletion
[[Unicode 2007]]	The Unicode Consortium. The Unicode Standard, Version 5.1.0, defined by: The Unicode Standard, Version 5.0 (Boston, MA, Addison-Wesley, 2007. ISBN 0-321-48091-0), as amended by Unicode 5.1.0 ( http://www.unicode.org/versions/Unicode5.1.0/ ).
[[Unicode 2011]]	The Unicode Consortium. The Unicode Standard, Version 6.0.0, (Mountain View, CA: The Unicode Consortium, 2011. ISBN 978-1-936213-01-6) http://www.unicode.org/versions/Unicode6.0.0/
