...

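import java.text.Normalizer;
import java.text.Normalizer.Form;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "<scr" + "\uFEFF" + "ipt>";
// Normalize to NFKC; U+FEFF has no decomposition, so it survives this step
s = Normalizer.normalize(s, Form.NFKC);
Pattern pattern = Pattern.compile("<script>");
Matcher matcher = pattern.matcher(s);
if (matcher.find()) {
  System.out.println("found blacklisted tag");
} else {
  // ...
}
// Replace every non-ASCII character with U+FFFD instead of deleting it
s = s.replaceAll("[^\\p{ASCII}]", "\uFFFD");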

"U+FFFD is usually unproblematic, because it is designed expressly for this kind of purpose. That is, because it doesn't have syntactic meaning in programming languages or structured data, it will typically just cause a failure in parsing. Where the output character set is not Unicode, though, this character may not be available." [Unicode 08b]
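
The caveat about non-Unicode output character sets can be checked at run time. The following is a minimal sketch, not part of the original rule: the class name ReplacementCharSupport and the method canRepresentReplacementChar are hypothetical, and the charsets are chosen only for illustration. It asks a charset's encoder whether U+FFFD can be encoded at all.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
class ReplacementCharSupport {

    // Returns true if the given charset can encode U+FFFD (REPLACEMENT CHARACTER).
    static boolean canRepresentReplacementChar(Charset charset) {
        return charset.newEncoder().canEncode('\uFFFD');
    }

    public static void main(String[] args) {
        Charset[] candidates = {
            StandardCharsets.UTF_8,
            StandardCharsets.ISO_8859_1,
            StandardCharsets.US_ASCII
        };
        for (Charset cs : candidates) {
            System.out.println(cs.name() + " can encode U+FFFD: "
                    + canRepresentReplacementChar(cs));
        }
    }
}
```

If the target charset cannot encode U+FFFD, the encoder substitutes its own replacement bytes on output, so a different placeholder that the output charset does support may be needed.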

Risk Assessment

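Deleting non-character code points from input that has already been validated can allow malicious input to bypass the validation checks, because the deletion can join fragments such as "<scr" and "ipt>" into a string that the check would have rejected.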
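
The difference between deleting and replacing can be shown with a small sketch. This is an illustration under stated assumptions rather than the rule's reference code: the class DeletionBypassDemo and its methods are hypothetical names, and the "<script>" pattern and the non-ASCII character class are taken from the example input used in this rule.

```java
import java.text.Normalizer;
import java.text.Normalizer.Form;
import java.util.regex.Pattern;

// Hypothetical demo class, for illustration only.
class DeletionBypassDemo {
    private static final Pattern SCRIPT_TAG = Pattern.compile("<script>");
    private static final String NON_ASCII = "[^\\p{ASCII}]";

    static boolean containsScriptTag(String s) {
        return SCRIPT_TAG.matcher(s).find();
    }

    // Risky ordering: validate first, then delete non-ASCII characters.
    // The deletion can reassemble a tag that the check never saw.
    static String validateThenDelete(String input) {
        String s = Normalizer.normalize(input, Form.NFKC);
        if (containsScriptTag(s)) {
            throw new IllegalArgumentException("found blacklisted tag");
        }
        return s.replaceAll(NON_ASCII, "");
    }

    // Safer ordering: replace non-ASCII characters with U+FFFD, then validate.
    // The placeholder keeps the fragments apart, and the check sees the final string.
    static String replaceThenValidate(String input) {
        String s = Normalizer.normalize(input, Form.NFKC);
        s = s.replaceAll(NON_ASCII, "\uFFFD");
        if (containsScriptTag(s)) {
            throw new IllegalArgumentException("found blacklisted tag");
        }
        return s;
    }

    public static void main(String[] args) {
        String input = "<scr" + "\uFEFF" + "ipt>";
        System.out.println("after validate-then-delete, tag present: "
                + containsScriptTag(validateThenDelete(input)));
        System.out.println("after replace-then-validate, tag present: "
                + containsScriptTag(replaceThenValidate(input)));
    }
}
```

In the first method the input passes the check and only afterwards becomes "<script>". In the second, the string that leaves the method is exactly the string that was checked.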

...
