...
A programmer might decide to exclude many different categories of characters. For example, the Unicode Standard [Unicode 2012] defines the following general categories of characters, all of which can be matched using an appropriate regular expression:
Abbreviation | Long Name | Description |
---|---|---|
Cc | Control | A C0 or C1 control code |
Cf | Format | A format control character |
Cs | Surrogate | A surrogate code point |
Co | Private_Use | A private-use character |
Cn | Unassigned | A reserved unassigned code point or a noncharacter |
Other programs may remove or replace characters belonging to a different, program-specific set. Whichever set is chosen, all such modifications of the string must be performed before the string is validated.
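As a rough illustration of matching the categories in the table above, the following sketch uses the `\p{...}` property syntax of `java.util.regex.Pattern` to delete every code point in the Cc, Cf, Cs, Co, and Cn categories. The class name `CategoryStripper`, the method `strip`, and the sample input are hypothetical and chosen only for this example; a real program would pick the categories that suit its own policy.

```java
import java.util.regex.Pattern;

final class CategoryStripper {
    // One character class covering the five categories listed in the table:
    // Control, Format, Surrogate, Private_Use, and Unassigned.
    private static final Pattern EXCLUDED =
        Pattern.compile("[\\p{Cc}\\p{Cf}\\p{Cs}\\p{Co}\\p{Cn}]");

    // Returns a copy of s with every code point in the excluded categories removed.
    public static String strip(String s) {
        return EXCLUDED.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        // U+0007 is a control code (Cc), U+200B is a format character (Cf),
        // U+E000 is a private-use character (Co), U+FDEF is a noncharacter (Cn).
        String input = "a\u0007b\u200Bc\uE000d\uFDEFe";
        System.out.println(strip(input)); // prints "abcde"
    }
}
```

Because the result of `strip` can differ from its input, any validation should be applied to the returned string rather than to the original.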
...
```java
import java.text.Normalizer;
import java.text.Normalizer.Form;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagFilter {
  public static String filterString(String str) {
    String s = Normalizer.normalize(str, Form.NFKC);

    // Validate input (performed too early: the string is still modified below)
    Pattern pattern = Pattern.compile("<script>");
    Matcher matcher = pattern.matcher(s);
    if (matcher.find()) {
      throw new IllegalArgumentException("Invalid input");
    }

    // Deletes noncharacter code points
    s = s.replaceAll("[\\p{Cn}]", "");
    return s;
  }

  public static void main(String[] args) {
    // "\uFDEF" is a noncharacter code point
    String maliciousInput = "<scr" + "\uFDEF" + "ipt>";
    String sb = filterString(maliciousInput);
    // sb = "<script>"
  }
}
```
...
Validating input before removing or modifying characters in the input string can allow malicious input to bypass validation checks.
Rule | Severity | Likelihood | Remediation Cost | Priority | Level |
---|---|---|---|---|---|
IDS11-J | High | Probable | Medium | P12 | L1 |
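The ordering problem described above can be avoided by modifying the string first and validating the result afterward. The following is a minimal sketch of that ordering, assuming the same `<script>` check and noncharacter deletion as the `TagFilter` example; the class name `SafeTagFilter` is hypothetical, and the sketch is not meant as a complete input filter.

```java
import java.text.Normalizer;
import java.text.Normalizer.Form;
import java.util.regex.Pattern;

final class SafeTagFilter {
    private static final Pattern UNASSIGNED = Pattern.compile("\\p{Cn}");
    private static final Pattern SCRIPT_TAG = Pattern.compile("<script>");

    public static String filterString(String str) {
        String s = Normalizer.normalize(str, Form.NFKC);

        // Step 1: perform every modification (here, delete Cn code points).
        s = UNASSIGNED.matcher(s).replaceAll("");

        // Step 2: validate the final form of the string.
        if (SCRIPT_TAG.matcher(s).find()) {
            throw new IllegalArgumentException("Invalid input");
        }
        return s;
    }

    public static void main(String[] args) {
        // "\uFDEF" is a noncharacter code point hidden inside the tag
        String maliciousInput = "<scr" + "\uFDEF" + "ipt>";
        try {
            filterString(maliciousInput);
            System.out.println("accepted");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With this ordering, the deletion turns the input into `<script>` before the check runs, so the check sees the same string that the rest of the program would receive.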
Automated Detection
Tool | Version | Checker | Description |
---|---|---|---|
The Checker Framework | | Tainting Checker | Trust and security errors (see Chapter 8) |
Parasoft Jtest | | CERT.IDS11.VPPD | Validate all dangerous data |
Related Guidelines
Bibliography
[API 2006] |
Section 3.5, "Deletion of Noncharacters" | |
[Seacord 2015] | |
"Handling the Unexpected: Character-deletion |
...
" (slides 72–74) |
...