Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Many applications that accept untrusted input strings employ input filtering and validation mechanisms based on the strings' character data. For example, an application's strategy for avoiding cross-site scripting (XSS) vulnerabilities may include forbidding <script> tags in inputs. Such blacklisting mechanisms are a useful part of a security strategy, even though they are insufficient for complete input validation and sanitization.

Applications that accept untrusted input, especially Unicode-based input, should normalize the input before validating it.  Character information in Java is based on the Unicode Standard.   The following table shows the version of Unicode supported by the previous three release of Java SE.

Java VersionUnicode Version
Java SE 6Unicode Standard, version 4.0 [Unicode 2003]
Java SE 7Unicode Standard, version 6.0.0 [Unicode 2011]
Java SE 8Unicode Standard, version 6.2.0 [Unicode 2012]

Applications that accept untrusted input should normalize the input before validating it.  Normalization is important because in Unicode, the same string can have many different representations.  According to the Unicode Standard [Davis 2008], annex #15, Unicode Normalization Forms:

...