...

Conformance clause C7 reads:

C7. When a process purports not to modify the interpretation of a valid coded character sequence, it shall make no change to that coded character sequence other than the possible replacement of character sequences by their canonical-equivalent sequences or the deletion of noncharacter code points.

Although the last phrase permits the deletion of noncharacter code points, for security reasons they should be removed only with caution.

Whenever a character is invisibly deleted (instead of replaced), it may cause a security problem. The issue is the following: A gateway might be checking for a sensitive sequence of characters, say "delete". If what is passed in is "deXlete", where X is a noncharacter, the gateway lets it through: the sequence "deXlete" may be in and of itself harmless. But suppose that later on, past the gateway, an internal process invisibly deletes the X. In that case, the sensitive sequence of characters is formed, and can lead to a security breach.
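
The gateway scenario above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class name and both methods (`gatewayAllows`, `stripNoncharacter`) are hypothetical, and U+FFFF stands in for the noncharacter X.

```java
final class NoncharacterDeletionDemo {

    // Hypothetical gateway check: rejects any input containing the sensitive word.
    static boolean gatewayAllows(String input) {
        return !input.contains("delete");
    }

    // Hypothetical internal step that silently removes the noncharacter U+FFFF.
    static String stripNoncharacter(String input) {
        return input.replace("\uFFFF", "");
    }

    public static void main(String[] args) {
        // "deXlete", where X is the noncharacter U+FFFF
        String input = "de" + "\uFFFF" + "lete";

        // The gateway sees no sensitive sequence and lets the input through.
        System.out.println("gateway allows: " + gatewayAllows(input)); // true

        // Past the gateway, the invisible deletion forms the sensitive sequence.
        String internal = stripNoncharacter(input);
        System.out.println("after deletion: " + internal); // delete
    }
}
```

The check and the deletion are each harmless in isolation; the problem arises only because the deletion happens after the check.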

Noncompliant Code Example

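This noncompliant code example accepts only valid ASCII characters and deletes any nonconforming characters. However, input validation is performed before the deletion step. Consequently, a malicious <script> tag bypasses the filter even though it is blacklisted: the check sees "<scr", U+FEFF, "ipt>", which does not match, and the subsequent deletion of U+FEFF reconstitutes "<script>".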

Code Block
import java.text.Normalizer;
import java.text.Normalizer.Form;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "<scr" + "\uFEFF" + "ipt>";
s = Normalizer.normalize(s, Form.NFKC);

// Input validation: runs before the deletion below, so U+FEFF is still present
Pattern pattern = Pattern.compile("<script>");
Matcher matcher = pattern.matcher(s);
if (matcher.find()) {
  System.out.println("found blacklisted tag");
} else {
  // ...
}

// Deletes all non-ASCII characters, which forms "<script>" after validation has passed
s = s.replaceAll("[^\\p{ASCII}]", "");

Compliant Solution

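This compliant solution replaces each unknown or unrepresentable character with the Unicode replacement character U+FFFD (\uFFFD), which is reserved to denote this condition. Because the offending character is replaced rather than deleted, the surrounding characters can never collapse into a blacklisted sequence, so malicious input cannot bypass the filter.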

Code Block
import java.text.Normalizer;
import java.text.Normalizer.Form;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "<scr" + "\uFEFF" + "ipt>";
s = Normalizer.normalize(s, Form.NFKC);

// Input validation
Pattern pattern = Pattern.compile("<script>");
Matcher matcher = pattern.matcher(s);
if (matcher.find()) {
  System.out.println("found blacklisted tag");
} else {
  // ...
}

// Replaces all non-ASCII characters with the replacement character U+FFFD
s = s.replaceAll("[^\\p{ASCII}]", "\uFFFD");

"U+FFFD is usually unproblematic, because it is designed expressly for this kind of purpose. That is, because it doesn't have syntactic meaning in programming languages or structured data, it will typically just cause a failure in parsing. Where the output character set is not Unicode, though, this character may not be available." [Unicode 08b]

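The difference between deleting and replacing can be checked with a small self-contained program. This is a sketch rather than a definitive implementation: the class name, the `sanitize` and `containsScriptTag` helpers, and the two pattern constants are hypothetical names introduced for illustration; only `Normalizer`, `Pattern`, and `Matcher` come from the Java platform.

```java
import java.text.Normalizer;
import java.text.Normalizer.Form;
import java.util.regex.Pattern;

final class ReplacementCharacterDemo {

    // Matches every character outside the ASCII range.
    private static final Pattern NON_ASCII = Pattern.compile("[^\\p{ASCII}]");

    // The blacklisted tag used throughout this rule.
    private static final Pattern SCRIPT_TAG = Pattern.compile("<script>");

    // Normalizes the input, then replaces (never deletes) non-ASCII characters.
    static String sanitize(String input) {
        String normalized = Normalizer.normalize(input, Form.NFKC);
        return NON_ASCII.matcher(normalized).replaceAll("\uFFFD");
    }

    static boolean containsScriptTag(String s) {
        return SCRIPT_TAG.matcher(s).find();
    }

    public static void main(String[] args) {
        String input = "<scr" + "\uFEFF" + "ipt>";

        // Deleting the character forms the blacklisted tag.
        String deleted = NON_ASCII.matcher(input).replaceAll("");
        System.out.println(containsScriptTag(deleted));   // true

        // Replacing it with U+FFFD keeps the tag broken.
        String replaced = sanitize(input);
        System.out.println(containsScriptTag(replaced));  // false
    }
}
```

With replacement, the result is the same whether the blacklist check runs before or after sanitization, because the two halves of the tag are never joined.
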
Risk Assessment

Deleting noncharacter code points can allow malicious input to bypass validation checks.

|| Rule || Severity || Likelihood || Remediation Cost || Priority || Level ||
| MSC42-J | high | probable | medium | P18 | L1 |

Automated Detection

TODO

Related Vulnerabilities

Search for vulnerabilities resulting from the violation of this rule on the CERT website.

References

[API 06]
[Weber 09] Handling the Unexpected: Character-deletion
[Unicode 08b] 3.5 Deletion of Noncharacters
[MITRE 09] CWE ID 182 (http://cwe.mitre.org/data/definitions/182.html), "Collapse of Data Into Unsafe Value"

...