...
Consequently, any string modifications, including the removal or replacement of noncharacter code points, must be performed before the string is validated.
Noncompliant Code Example
The filterString() method in this noncompliant code example normalizes the input string, validates that the input does not contain a <script> tag, and then removes any non-ASCII characters from the input string. Because input validation is performed before the removal of non-ASCII characters, an attacker can insert noncharacter code points into the <script> tag to bypass the validation checks.
```java
import java.text.Normalizer;
import java.text.Normalizer.Form;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagFilter {
  public static String filterString(String str) {
    String s = Normalizer.normalize(str, Form.NFKC);

    // Validate input
    Pattern pattern = Pattern.compile("<script>");
    Matcher matcher = pattern.matcher(s);
    if (matcher.find()) {
      throw new IllegalArgumentException("Invalid input");
    }

    // Deletes all non-ASCII characters (too late: validation already ran)
    s = s.replaceAll("[^\\p{ASCII}]", "");
    return s;
  }

  public static void main(String[] args) {
    // U+FEFF (zero-width no-break space) is non-ASCII, so it passes
    // validation and is then deleted, joining the two halves of the tag
    String maliciousInput = "<scr" + "\uFEFF" + "ipt>";
    String sb = filterString(maliciousInput);
    // sb = "<script>"
  }
}
```
Compliant Solution
This compliant solution replaces the unknown or unrepresentable character with the Unicode sequence \uFFFD (U+FFFD REPLACEMENT CHARACTER), which is reserved to denote this condition. It also performs this replacement before any other sanitization, in particular, before checking for <script>. This ensures that malicious input cannot bypass the filter.
...
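A minimal sketch of this ordering follows. It is an illustration, not the rule's reference listing: it assumes, as the noncompliant example does, that every non-ASCII character is treated as unrepresentable, and the class name TagFilterSketch is hypothetical. The only change from the noncompliant code is that the modification (replacement with U+FFFD rather than deletion) now happens before the <script> check.

```java
import java.text.Normalizer;
import java.text.Normalizer.Form;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name; sketch of "modify first, then validate"
public class TagFilterSketch {
  public static String filterString(String str) {
    // Normalize first, as in the noncompliant example
    String s = Normalizer.normalize(str, Form.NFKC);

    // Modify BEFORE validating: replace each non-ASCII character
    // (assumed unrepresentable here) with U+FFFD instead of deleting it,
    // so fragments on either side can never be joined into a forbidden tag
    s = s.replaceAll("[^\\p{ASCII}]", "\uFFFD");

    // Validate the exact string that will be returned
    Pattern pattern = Pattern.compile("<script>");
    Matcher matcher = pattern.matcher(s);
    if (matcher.find()) {
      throw new IllegalArgumentException("Invalid input");
    }
    return s;
  }

  public static void main(String[] args) {
    String maliciousInput = "<scr" + "\uFEFF" + "ipt>";
    String result = filterString(maliciousInput);
    System.out.println(result.contains("<script>")); // prints false
  }
}
```

Because nothing modifies the string after the match, the validated string and the returned string are identical, so the attack from the noncompliant example yields "<scr\uFFFDipt>" instead of a working tag.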
According to Unicode Technical Report #36, Unicode Security Considerations [Davis 2008b], "U+FFFD is usually unproblematic, because it is designed expressly for this kind of purpose. That is, because it doesn't have syntactic meaning in programming languages or structured data, it will typically just cause a failure in parsing. Where the output character set is not Unicode, though, this character may not be available."
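The caveat about non-Unicode output can be observed with the JDK's charset APIs. This small sketch (class name hypothetical) encodes a string containing U+FFFD as UTF-8 and as US-ASCII: UTF-8 represents it in three bytes, while the US-ASCII encoder reports that it cannot encode it, and String.getBytes silently substitutes the charset's default replacement, '?'.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical class name; shows U+FFFD surviving UTF-8 but not US-ASCII
public class ReplacementCharDemo {
  public static void main(String[] args) {
    String s = "a\uFFFDb";

    // UTF-8 can represent U+FFFD (as the three bytes EF BF BD)
    byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
    System.out.println(utf8.length); // prints 5

    // US-ASCII cannot; canEncode reports this before any output is written
    boolean encodable = StandardCharsets.US_ASCII.newEncoder().canEncode('\uFFFD');
    System.out.println(encodable); // prints false

    // getBytes substitutes the charset's default replacement byte ('?')
    byte[] ascii = s.getBytes(StandardCharsets.US_ASCII);
    System.out.println(new String(ascii, StandardCharsets.US_ASCII)); // prints a?b
  }
}
```

When the output channel is not Unicode, the replacement marker is therefore lost, so checking canEncode before writing is one way to notice that condition rather than relying on silent substitution.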
Risk Assessment
Validating input before eliminating noncharacter code points can allow malicious input to bypass validation checks.
Rule | Severity | Likelihood | Remediation Cost | Priority | Level |
---|---|---|---|---|---|
IDS11-J | high | probable | medium | P12 | L1 |
Related Guidelines
Bibliography
[API 2006]
3.5, Deletion of Noncharacters
Handling the Unexpected: Character-deletion
...