Many applications that accept untrusted input strings employ input filtering and validation mechanisms based on the strings' character data. For example, an application's strategy for avoiding cross-site scripting (XSS) vulnerabilities may include forbidding <script> tags in inputs. Such blacklisting mechanisms are a useful part of a security strategy, even though they are insufficient for complete input validation and sanitization.

Character information in Java is based on the Unicode Standard. The following table shows the version of Unicode supported by the latest three releases of Java SE.

Java Version    Unicode Version
Java SE 6       Unicode Standard, version 4.0 [Unicode 2003]
Java SE 7       Unicode Standard, version 6.0.0 [Unicode 2011]
Java SE 8       Unicode Standard, version 6.2.0 [Unicode 2012]

Applications that accept untrusted input should normalize the input before validating it.  Normalization is important because in Unicode, the same string can have many different representations.  According to the Unicode Standard [Davis 2008], annex #15, Unicode Normalization Forms:

When implementations keep strings in a normalized form, they can be assured that equivalent strings have a unique binary representation.
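To illustrate the point about multiple representations, here is a minimal sketch. The class and method names (EquivalentStrings, canonicallyEqual) are hypothetical and chosen only for this illustration; the only library call used is java.text.Normalizer.normalize().

```java
import java.text.Normalizer;

public class EquivalentStrings {
    // Compares two strings after bringing both into NFC, so that
    // canonically equivalent inputs end up with the same code points.
    static boolean canonicallyEqual(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFC)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFC));
    }

    public static void main(String[] args) {
        String precomposed = "\u00E9";  // e with acute accent as a single code point
        String decomposed = "e\u0301";  // plain e followed by a combining acute accent

        // The two strings render identically but differ code point by code point.
        System.out.println(precomposed.equals(decomposed));            // false
        System.out.println(canonicallyEqual(precomposed, decomposed)); // true
    }
}
```

A raw equals() comparison treats the two spellings as different strings, which is why validation performed on unnormalized input can miss an equivalent form of a forbidden value.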

...

The Normalizer.normalize() method transforms Unicode text into an equivalent composed or decomposed form, allowing for easier searching of text. This method supports the standard normalization forms described in Unicode Standard Annex #15, Unicode Normalization Forms. Frequently, the most suitable normalization form for performing input validation on arbitrarily encoded strings is KC (NFKC) because normalizing to KC transforms the input into an equivalent canonical form that can be safely compared with the required input form.
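The difference between canonical and compatibility normalization can be sketched as follows. The class name NormalizationForms and the helper nfkc() are hypothetical; the sample input uses U+FE64 and U+FE65 (SMALL LESS-THAN SIGN and SMALL GREATER-THAN SIGN), which have compatibility mappings to the ASCII angle brackets.

```java
import java.text.Normalizer;
import java.text.Normalizer.Form;

public class NormalizationForms {
    // Convenience wrapper: normalize to compatibility-composed form (NFKC).
    static String nfkc(String s) {
        return Normalizer.normalize(s, Form.NFKC);
    }

    public static void main(String[] args) {
        String input = "\uFE64script\uFE65";

        // NFC applies only canonical mappings, so the small angle brackets survive.
        System.out.println(Normalizer.normalize(input, Form.NFC).equals(input)); // true

        // NFKC also applies compatibility mappings, producing ASCII < and >.
        System.out.println(nfkc(input)); // <script>

        // isNormalized() reports whether a string is already in the given form.
        System.out.println(Normalizer.isNormalized(input, Form.NFKC)); // false
    }
}
```

Because NFKC folds compatibility variants into their plain counterparts, a single pattern written against the ASCII characters can cover the variant spellings as well.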

This noncompliant code example attempts to validate the String before performing normalization.

...

The validation logic fails to detect the <script> tag because the input has not yet been normalized when the check runs. Therefore, the system accepts the invalid input.
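The failure mode can be reproduced with a short sketch. This is an illustration written for this discussion, not the guideline's own listing: the class name ValidateThenNormalize, the method process(), and the angle-bracket blacklist pattern are all assumptions.

```java
import java.text.Normalizer;
import java.text.Normalizer.Form;
import java.util.regex.Pattern;

public class ValidateThenNormalize {
    // Hypothetical blacklist: reject any input containing an ASCII angle bracket.
    private static final Pattern ANGLE_BRACKETS = Pattern.compile("[<>]");

    // Flawed order: the check runs on the raw input and normalization happens afterward.
    static String process(String s) {
        if (ANGLE_BRACKETS.matcher(s).find()) {
            throw new IllegalStateException("angle bracket found");
        }
        return Normalizer.normalize(s, Form.NFKC);
    }

    public static void main(String[] args) {
        // U+FE64 and U+FE65 are small variants of the angle brackets;
        // NFKC maps them to the ASCII characters.
        String s = "\uFE64script\uFE65";

        // The raw input contains no ASCII angle brackets, so the check passes,
        // and normalization then produces the forbidden tag.
        System.out.println(process(s)); // <script>
    }
}
```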

Compliant Solution

This compliant solution normalizes the string before validating it. Alternative representations of the string are normalized to the canonical angle brackets. Consequently, input validation correctly detects the malicious input and throws an IllegalStateException.

...
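A minimal sketch of the normalize-then-validate order described above follows. It is an illustration under stated assumptions rather than the guideline's own listing: the class name NormalizeThenValidate, the method validate(), and the angle-bracket blacklist pattern are hypothetical.

```java
import java.text.Normalizer;
import java.text.Normalizer.Form;
import java.util.regex.Pattern;

public class NormalizeThenValidate {
    // Hypothetical blacklist: reject any input containing an ASCII angle bracket.
    private static final Pattern ANGLE_BRACKETS = Pattern.compile("[<>]");

    // Normalizes to NFKC first, then checks the normalized text.
    // Returns the normalized string if it passes the check.
    static String validate(String s) {
        String normalized = Normalizer.normalize(s, Form.NFKC);
        if (ANGLE_BRACKETS.matcher(normalized).find()) {
            throw new IllegalStateException("angle bracket found");
        }
        return normalized;
    }

    public static void main(String[] args) {
        // U+FE64 and U+FE65 normalize to the ASCII angle brackets under NFKC.
        String s = "\uFE64script\uFE65";
        try {
            validate(s);
            System.out.println("accepted");
        } catch (IllegalStateException e) {
            System.out.println("rejected"); // rejected
        }
    }
}
```

Returning the normalized string, rather than the original, means later code operates on the same text that was checked.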

Rule      Severity   Likelihood   Remediation Cost   Priority   Level
IDS01-J   High       Probable     Medium             P12        L1

Automated Detection

Tool                    Version   Checker            Description
The Checker Framework             Tainting Checker   Trust and security errors (see Chapter 8)
Fortify                 1.0       Process_Control    Implemented

Related Guidelines

ISO/IEC TR 24772:2013   Cross-site Scripting [XYT]
MITRE CWE               CWE-289, Authentication bypass by alternate name
                        CWE-180, Incorrect behavior order: Validate before canonicalize

...

Android apps can receive string data from the outside and normalize it.

Bibliography

 

...

IDS00-J. Sanitize untrusted data passed across a trust boundary