...

Code Block
import java.text.Normalizer;
import java.text.Normalizer.Form;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "<scr" + "\uFEFF" + "ipt>";
s = Normalizer.normalize(s, Form.NFKC);
Pattern pattern = Pattern.compile("<script>");
Matcher matcher = pattern.matcher(s);
if (matcher.find()) {
  System.out.println("Found blacklisted tag");
} else {
  // ...
}
// Replace every non-ASCII character with U+FFFD instead of deleting it
s = s.replaceAll("[^\\p{ASCII}]", "\uFFFD");

"U+FFFD is usually unproblematic, because it is designed expressly for this kind of purpose. That is, because it doesn't have syntactic meaning in programming languages or structured data, it will typically just cause a failure in parsing. Where the output character set is not Unicode, though, this character may not be available." [Unicode 08b]

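The caveat about non-Unicode output character sets can be checked before emitting the string. The following is a minimal sketch, not part of the original guideline: the class name `ReplacementCharDemo` and the helper `canEncodeReplacement` are hypothetical names chosen for illustration. It asks a charset's encoder whether U+FFFD is representable.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, introduced only for this illustration.
class ReplacementCharDemo {

  // Returns true if the given charset has an encoding for U+FFFD.
  static boolean canEncodeReplacement(Charset cs) {
    return cs.newEncoder().canEncode('\uFFFD');
  }

  public static void main(String[] args) {
    // UTF-8 is a Unicode encoding, so U+FFFD is representable.
    System.out.println("UTF-8:      " + canEncodeReplacement(StandardCharsets.UTF_8));
    // ISO-8859-1 and US-ASCII have no mapping for U+FFFD.
    System.out.println("ISO-8859-1: " + canEncodeReplacement(StandardCharsets.ISO_8859_1));
    System.out.println("US-ASCII:   " + canEncodeReplacement(StandardCharsets.US_ASCII));
  }
}
```

When the check fails, the program has to choose some other substitute that the output charset does support; which substitute is appropriate depends on the consumer of the output and is outside what this sketch covers.
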
Risk Assessment

Deleting non-character code points can allow malicious input to bypass validation checks.
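The bypass can be illustrated with a short sketch. It is an illustration under stated assumptions rather than the guideline's own code: the class `DeleteVersusReplace` and its helper methods are hypothetical names, and the check for `<script>` stands in for whatever validation the application performs.

```java
import java.text.Normalizer;
import java.text.Normalizer.Form;

// Hypothetical class, introduced only for this illustration.
class DeleteVersusReplace {

  // Stand-in for the application's validation check.
  static boolean containsScriptTag(String s) {
    return s.contains("<script>");
  }

  // Deletes every non-ASCII character (the risky approach).
  static String deleteNonAscii(String s) {
    return s.replaceAll("[^\\p{ASCII}]", "");
  }

  // Replaces every non-ASCII character with U+FFFD.
  static String replaceNonAscii(String s) {
    return s.replaceAll("[^\\p{ASCII}]", "\uFFFD");
  }

  public static void main(String[] args) {
    String input = Normalizer.normalize("<scr" + "\uFEFF" + "ipt>", Form.NFKC);

    // The embedded U+FEFF hides the tag from the validation check.
    System.out.println("tag found before cleanup: " + containsScriptTag(input));

    // Deleting U+FEFF afterwards reassembles the forbidden tag.
    System.out.println("tag found after deletion: "
        + containsScriptTag(deleteNonAscii(input)));

    // Replacing it with U+FFFD keeps the two halves apart.
    System.out.println("tag found after replacement: "
        + containsScriptTag(replaceNonAscii(input)));
  }
}
```

The input passes validation because U+FEFF splits the tag, deletion then joins the halves into `<script>`, and replacement with U+FFFD leaves a string that still does not contain the tag.
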

...