...
```java
import java.text.Normalizer;
import java.text.Normalizer.Form;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "<scr" + "\uFEFF" + "ipt>";
s = Normalizer.normalize(s, Form.NFKC);

// Input validation
Pattern pattern = Pattern.compile("<script>");
Matcher matcher = pattern.matcher(s);
if (matcher.find()) {
  System.out.println("found blacklisted tag");
} else {
  // ...
}

// Replace every non-ASCII character with U+FFFD instead of deleting it
s = s.replaceAll("[^\\p{ASCII}]", "\uFFFD");
```
"U+FFFD is usually unproblematic, because it is designed expressly for this kind of purpose. That is, because it doesn't have syntactic meaning in programming languages or structured data, it will typically just cause a failure in parsing. Where the output character set is not Unicode, though, this character may not be available." [Unicode 08b]
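The caveat about non-Unicode output character sets can be checked before relying on U+FFFD. The following is a minimal sketch, not part of the original rule; the class name `ReplacementCharSupport` and the method `canEncodeReplacementChar` are hypothetical names introduced for illustration. It uses the standard `CharsetEncoder.canEncode(char)` call to ask whether a given charset can represent the replacement character.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not from the original rule): reports whether an
// output charset can represent the replacement character U+FFFD.
final class ReplacementCharSupport {

    static boolean canEncodeReplacementChar(Charset charset) {
        // A fresh encoder is created per call because encoders are not thread-safe.
        return charset.newEncoder().canEncode('\uFFFD');
    }

    public static void main(String[] args) {
        System.out.println(canEncodeReplacementChar(StandardCharsets.UTF_8));      // true
        System.out.println(canEncodeReplacementChar(StandardCharsets.ISO_8859_1)); // false
        System.out.println(canEncodeReplacementChar(StandardCharsets.US_ASCII));   // false
    }
}
```

When the check returns false, the application would need a different placeholder that the output charset can represent; which placeholder is appropriate depends on the consumer of the output and is outside the scope of this sketch.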
Risk Assessment
Deleting non-character code points can allow malicious input to bypass validation checks.
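The bypass can be illustrated with a short sketch that contrasts deletion with replacement on the same input used in this rule. The class `NoncharacterHandlingDemo` and its method names are hypothetical, introduced only for this illustration. The raw input passes a check for `<script>`, but deleting the non-ASCII code point afterwards lets the two fragments rejoin into the tag, whereas replacing it with U+FFFD keeps them apart.

```java
import java.text.Normalizer;
import java.text.Normalizer.Form;

// Hypothetical demo class (names are illustrative assumptions).
final class NoncharacterHandlingDemo {

    private static final String NON_ASCII = "[^\\p{ASCII}]";

    // Stand-in for the validation step: looks for the disallowed tag.
    static boolean containsScriptTag(String s) {
        return s.contains("<script>");
    }

    // Risky: deletion lets the text on either side of the removed code point rejoin.
    static String deleteNonAscii(String s) {
        return Normalizer.normalize(s, Form.NFKC).replaceAll(NON_ASCII, "");
    }

    // Safer: the replacement character U+FFFD keeps the fragments separated.
    static String replaceNonAscii(String s) {
        return Normalizer.normalize(s, Form.NFKC).replaceAll(NON_ASCII, "\uFFFD");
    }

    public static void main(String[] args) {
        String input = "<scr" + "\uFEFF" + "ipt>";
        System.out.println(containsScriptTag(input));                  // false: validation passes
        System.out.println(containsScriptTag(deleteNonAscii(input)));  // true: tag re-forms after deletion
        System.out.println(containsScriptTag(replaceNonAscii(input))); // false: tag stays broken
    }
}
```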
...