...
This code validates the input before it deletes non-ASCII characters. Consequently, an attacker can disguise a <script> tag by embedding a non-ASCII character in it: the validation check finds no match, and the subsequent deletion reassembles the tag.
    // Requires java.text.Normalizer, java.text.Normalizer.Form,
    // java.util.regex.Matcher, and java.util.regex.Pattern

    // U+FEFF (zero width no-break space) is not an ASCII character
    String s = "<scr" + "\uFEFF" + "ipt>";
    s = Normalizer.normalize(s, Form.NFKC);

    // Input validation
    Pattern pattern = Pattern.compile("<script>");
    Matcher matcher = pattern.matcher(s);
    if (matcher.find()) {
      System.out.println("Found blacklisted tag");
    } else {
      // ...
    }

    // Deletes all non-ASCII characters
    s = s.replaceAll("[^\\p{ASCII}]", "");
    // s now contains "<script>"
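The ordering problem described above can be reproduced with a small self-contained program. This is only a sketch: the class name `ValidateThenDeleteDemo` and the helper methods `passesValidation` and `deleteNonAscii` are hypothetical names introduced for illustration and are not part of the rule.

```java
import java.text.Normalizer;
import java.text.Normalizer.Form;
import java.util.regex.Pattern;

// Hypothetical demo class; all names here are illustrative assumptions.
final class ValidateThenDeleteDemo {
    private static final Pattern SCRIPT_TAG = Pattern.compile("<script>");

    // Step 1 of the flawed order: normalize, then look for the tag.
    static boolean passesValidation(String input) {
        String s = Normalizer.normalize(input, Form.NFKC);
        return !SCRIPT_TAG.matcher(s).find();
    }

    // Step 2 of the flawed order: delete every non-ASCII character.
    static String deleteNonAscii(String input) {
        String s = Normalizer.normalize(input, Form.NFKC);
        return s.replaceAll("[^\\p{ASCII}]", "");
    }

    public static void main(String[] args) {
        // U+FEFF splits the tag so that the pattern cannot match it.
        String attack = "<scr" + "\uFEFF" + "ipt>";
        System.out.println("passes validation: " + passesValidation(attack));
        System.out.println("after deletion:    " + deleteNonAscii(attack));
    }
}
```

Running `main` should report that the disguised input passes validation and that the deletion step then yields the literal `<script>` tag, which is the bypass the text describes.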
...
This compliant solution replaces each unknown or unrepresentable character with the Unicode replacement character \uFFFD, which is reserved to denote this condition. The replacement is performed before any other sanitization, in particular before the check for <script>. Because the replacement character keeps the fragments of a disguised tag apart, malicious input cannot bypass the filter.
    // Requires java.text.Normalizer, java.text.Normalizer.Form,
    // java.util.regex.Matcher, and java.util.regex.Pattern

    String s = "<scr" + "\uFEFF" + "ipt>";
    s = Normalizer.normalize(s, Form.NFKC);

    // Replaces all non-ASCII characters with Unicode U+FFFD
    s = s.replaceAll("[^\\p{ASCII}]", "\uFFFD");

    Pattern pattern = Pattern.compile("<script>");
    Matcher matcher = pattern.matcher(s);
    if (matcher.find()) {
      System.out.println("Found blacklisted tag");
    } else {
      // ...
    }
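A runnable variant of the replace-then-validate order might look like the following. It is a minimal sketch: `ReplaceThenValidateDemo`, `sanitize`, and `containsScriptTag` are hypothetical names chosen for this example, and treating every non-ASCII character as invalid simply follows the pattern used in the rule's own code.

```java
import java.text.Normalizer;
import java.text.Normalizer.Form;
import java.util.regex.Pattern;

// Hypothetical demo class; all names here are illustrative assumptions.
final class ReplaceThenValidateDemo {
    private static final Pattern SCRIPT_TAG = Pattern.compile("<script>");

    // Normalize, then replace (never delete) each non-ASCII character.
    static String sanitize(String input) {
        String s = Normalizer.normalize(input, Form.NFKC);
        return s.replaceAll("[^\\p{ASCII}]", "\uFFFD");
    }

    // Validation runs only on text that has already been sanitized.
    static boolean containsScriptTag(String sanitized) {
        return SCRIPT_TAG.matcher(sanitized).find();
    }

    public static void main(String[] args) {
        String disguised = sanitize("<scr" + "\uFEFF" + "ipt>");
        String plain = sanitize("<script>");
        System.out.println(containsScriptTag(disguised)); // false
        System.out.println(containsScriptTag(plain));     // true
        System.out.println(disguised.indexOf('\uFFFD'));  // 4
    }
}
```

The disguised input is not flagged, but it is also harmless: U+FFFD stays at index 4 and no later step removes it, so the string can never collapse into `<script>`. An undisguised tag is still caught by the check.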
...
[API 2006]
3.5, Deletion of Noncharacters
Handling the Unexpected: Character-deletion