Comment: updated examples

...

Noncompliant Code Example

The filterString() method in this noncompliant code example normalizes the input string, validates that the input does not contain a <script> tag, and then removes any non-ASCII characters from the input string. Because input validation is performed before the removal of non-ASCII characters, an attacker can insert non-ASCII code points, such as U+FEFF, into the <script> tag and bypass the validation check. Once those code points are removed, the returned string contains an intact <script> tag.

Code Block
import java.text.Normalizer;
import java.text.Normalizer.Form;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagFilter {
  public static String filterString(String str) {
    String s = Normalizer.normalize(str, Form.NFKC);

    // Validate input
    Pattern pattern = Pattern.compile("<script>");
    Matcher matcher = pattern.matcher(s);
    if (matcher.find()) {
      throw new IllegalArgumentException("Invalid input");
    }

    // Deletes all non-ASCII characters (too late: validation has already run)
    s = s.replaceAll("[^\\p{ASCII}]", "");
    return s;
  }

  public static void main(String[] args) {
    // "\uFEFF" (ZERO WIDTH NO-BREAK SPACE) is a non-ASCII code point
    String maliciousInput = "<scr" + "\uFEFF" + "ipt>";
    String sb = filterString(maliciousInput);
    // sb = "<script>"
  }
}

Compliant Solution

This compliant solution replaces each non-ASCII character with the Unicode replacement character U+FFFD (\uFFFD), which is reserved to denote an unknown or unrepresentable character. It performs this replacement before any other sanitization, in particular before checking for <script>. This ordering ensures that malicious input cannot bypass the filter.

Code Block
import java.text.Normalizer;
import java.text.Normalizer.Form;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagFilter {

  public static String filterString(String str) {
    String s = Normalizer.normalize(str, Form.NFKC);

    // Replaces all non-ASCII characters with Unicode U+FFFD
    s = s.replaceAll("[^\\p{ASCII}]", "\uFFFD");

    // Validate input
    Pattern pattern = Pattern.compile("<script>");
    Matcher matcher = pattern.matcher(s);
    if (matcher.find()) {
      throw new IllegalArgumentException("Invalid input");
    }
    return s;
  }

  public static void main(String[] args) {
    // "\uFEFF" (ZERO WIDTH NO-BREAK SPACE) is a non-ASCII code point
    String maliciousInput = "<scr" + "\uFEFF" + "ipt>";
    String s = filterString(maliciousInput);
    // s = "<scr\uFFFDipt>", which may be displayed as <scr?ipt>
  }
}
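
As a rough illustration of how the normalize, replace, then validate ordering behaves on a few inputs, the following sketch may help. The class FilterOrderingDemo and the helpers replaceThenValidate() and isRejected() are hypothetical names introduced here, and String.contains() stands in for the Pattern/Matcher check purely for brevity.

```java
import java.text.Normalizer;
import java.text.Normalizer.Form;

class FilterOrderingDemo {

  // Hypothetical helper: normalize, replace non-ASCII with U+FFFD, then validate
  static String replaceThenValidate(String str) {
    String s = Normalizer.normalize(str, Form.NFKC);
    s = s.replaceAll("[^\\p{ASCII}]", "\uFFFD");
    if (s.contains("<script>")) {
      throw new IllegalArgumentException("Invalid input");
    }
    return s;
  }

  // Hypothetical helper: true if the filter rejects the input
  static boolean isRejected(String input) {
    try {
      replaceThenValidate(input);
      return false;
    } catch (IllegalArgumentException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    // A plain tag is rejected
    System.out.println(isRejected("<script>"));

    // Fullwidth brackets U+FF1C and U+FF1E become '<' and '>' under NFKC,
    // so the disguised tag is rejected as well
    System.out.println(isRejected("\uFF1Cscript\uFF1E"));

    // An embedded U+FEFF is replaced rather than deleted, so the
    // returned string never contains an intact tag
    String out = replaceThenValidate("<scr\uFEFFipt>");
    System.out.println(out.equals("<scr\uFFFDipt>"));
  }
}
```

Each of the three checks should print true. The third input is accepted rather than rejected, but because the inserted code point is replaced instead of removed, the two halves of the tag are never rejoined.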

According to Unicode Technical Report #36, Unicode Security Considerations [Davis 2008b], "U+FFFD is usually unproblematic, because it is designed expressly for this kind of purpose. That is, because it doesn't have syntactic meaning in programming languages or structured data, it will typically just cause a failure in parsing. Where the output character set is not Unicode, though, this character may not be available."
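
The caveat about non-Unicode output character sets can be observed with a small experiment. This is only a sketch under stated assumptions: US-ASCII is chosen as an example of a non-Unicode output encoding, and the class ReplacementCharDemo and its helpers are hypothetical names, not part of the original example.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

class ReplacementCharDemo {

  // Hypothetical helper: reports whether US-ASCII can represent the character
  static boolean asciiCanEncode(char c) {
    CharsetEncoder encoder = StandardCharsets.US_ASCII.newEncoder();
    return encoder.canEncode(c);
  }

  // Hypothetical helper: encodes to US-ASCII and decodes again to show
  // what a non-Unicode consumer would receive
  static String roundTripAscii(String s) {
    byte[] bytes = s.getBytes(StandardCharsets.US_ASCII);
    return new String(bytes, StandardCharsets.US_ASCII);
  }

  public static void main(String[] args) {
    String filtered = "<scr\uFFFDipt>";

    // U+FFFD has no US-ASCII representation
    System.out.println(asciiCanEncode('\uFFFD'));  // false

    // String.getBytes(Charset) substitutes the charset's default
    // replacement byte, which is '?' for US-ASCII
    System.out.println(roundTripAscii(filtered));  // <scr?ipt>
  }
}
```

In this case the substituted '?' still keeps the tag from being reassembled, but other encoders or error-handling policies may behave differently, so the output path deserves the same scrutiny as the input path.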

...