Many applications that accept untrusted input strings employ input filtering and validation mechanisms based on the strings' character data. For example, an application's strategy for avoiding cross-site scripting (XSS) vulnerabilities may include forbidding <script> tags in inputs. Such blacklisting mechanisms are a useful part of a security strategy, even though they are insufficient for complete input validation and sanitization. When implemented, this form of validation must be performed only after normalizing the input.

Character information in Java is based on the Unicode Standard. The following table shows the version of Unicode supported by the latest three releases of Java SE.

Java Version	Unicode Version
Java SE 6	Unicode Standard, version 4.0 [Unicode 2003]
Java SE 7	Unicode Standard, version 6.0.0 [Unicode 2011]
Java SE 8	Unicode Standard, version 6.2.0 [Unicode 2012]

Applications that accept untrusted input should normalize the input before validating it. Normalization is important because in Unicode, the same string can have many different representations. According to the Unicode Standard [Davis 2008], annex #15, Unicode Normalization Forms:

When implementations keep strings in a normalized form, they can be assured that equivalent strings have a unique binary representation.
Normalization Forms KC and KD must not be blindly applied to arbitrary text. Because they erase many formatting distinctions, they will prevent round-trip conversion to and from many legacy character sets, and unless supplanted by formatting markup, they may remove distinctions that are important to the semantics of the text. It is best to think of these Normalization Forms as being like uppercase or lowercase mappings: useful in certain contexts for identifying core meanings, but also performing modifications to the text that may not always be appropriate. They can be applied more freely to domains with restricted character sets ...

The Normalizer.normalize() method transforms Unicode text into the standard normalization forms described in Unicode Standard Annex #15, Unicode Normalization Forms. Frequently, the most suitable normalization form for performing input validation on arbitrarily encoded strings is KC (NFKC) because normalizing to KC transforms the input into an equivalent canonical form that can be safely compared with the required input form.
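The difference between canonical and compatibility normalization can be illustrated with a small sketch. The class name NormalizationFormsDemo and the helper equalAfter are hypothetical names introduced only for this illustration; the only library call used is java.text.Normalizer.normalize().

```java
import java.text.Normalizer;
import java.text.Normalizer.Form;

final class NormalizationFormsDemo {

  // Hypothetical helper: true when both strings are identical
  // after being normalized to the given form.
  static boolean equalAfter(String a, String b, Form form) {
    return Normalizer.normalize(a, form).equals(Normalizer.normalize(b, form));
  }

  public static void main(String[] args) {
    String composed = "\u00E9";    // precomposed e with acute accent
    String decomposed = "e\u0301"; // e followed by a combining acute accent

    // Canonically equivalent strings differ as raw char sequences...
    System.out.println(composed.equals(decomposed));                 // false
    // ...but have a single representation once normalized.
    System.out.println(equalAfter(composed, decomposed, Form.NFC));  // true

    // The small less-than sign is only a compatibility variant of '<',
    // so NFC leaves it alone while NFKC maps it to the ASCII character.
    String smallLessThan = "\uFE64";
    System.out.println(Normalizer.normalize(smallLessThan, Form.NFC).equals("<"));  // false
    System.out.println(Normalizer.normalize(smallLessThan, Form.NFKC).equals("<")); // true
  }
}
```

The last two lines suggest why a filter that looks for ASCII angle brackets generally needs a compatibility form such as NFKC rather than NFC.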

Noncompliant Code Example

This noncompliant code example attempts to validate the String before performing normalization. Consequently, the validation logic fails to detect inputs that should be rejected because the check for angle brackets fails to detect alternative Unicode representations.


import java.text.Normalizer;
import java.text.Normalizer.Form;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// String s may be user controllable
// NFKC normalization maps \uFE64 to < and \uFE65 to >
String s = "\uFE64" + "script" + "\uFE65";

// Validate (too early: s has not been normalized yet)
Pattern pattern = Pattern.compile("[<>]"); // Check for angle brackets
Matcher matcher = pattern.matcher(s);
if (matcher.find()) {
  // Found blacklisted tag
  throw new IllegalStateException();
} else {
  // ...
}

// Normalize
s = Normalizer.normalize(s, Form.NFKC);

The validation logic fails to detect the <script> tag because the input has not yet been normalized when it is checked. Therefore, the system accepts the invalid input.

Compliant Solution

This compliant solution normalizes the string before validating it. Alternative representations of the string are normalized to the canonical angle brackets. Consequently, input validation correctly detects the malicious input and throws an IllegalStateException.


import java.text.Normalizer;
import java.text.Normalizer.Form;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "\uFE64" + "script" + "\uFE65";

// Normalize
s = Normalizer.normalize(s, Form.NFKC);

// Validate
Pattern pattern = Pattern.compile("[<>]"); // Check for angle brackets
Matcher matcher = pattern.matcher(s);
if (matcher.find()) {
  // Found blacklisted tag
  throw new IllegalStateException();
} else {
  // ...
}
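As a minimal sketch, the normalize-then-validate order can be packaged in a single method so that callers cannot run the check on unnormalized data. The class NormalizeThenValidate and the methods requireNoAngleBrackets and isRejected are hypothetical names chosen for this illustration, and the angle-bracket check stands in for whatever validation an application actually needs.

```java
import java.text.Normalizer;
import java.text.Normalizer.Form;
import java.util.regex.Pattern;

final class NormalizeThenValidate {

  private static final Pattern ANGLE_BRACKETS = Pattern.compile("[<>]");

  // Hypothetical helper: normalizes to NFKC first, then rejects any input
  // that contains an angle bracket. Returns the normalized string.
  static String requireNoAngleBrackets(String untrusted) {
    String normalized = Normalizer.normalize(untrusted, Form.NFKC);
    if (ANGLE_BRACKETS.matcher(normalized).find()) {
      throw new IllegalStateException("angle bracket found after normalization");
    }
    return normalized;
  }

  // Convenience wrapper that reports rejection as a boolean.
  static boolean isRejected(String untrusted) {
    try {
      requireNoAngleBrackets(untrusted);
      return false;
    } catch (IllegalStateException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(isRejected("\uFE64script\uFE65")); // small-form brackets
    System.out.println(isRejected("<script>"));           // ASCII brackets
    System.out.println(isRejected("hello"));              // harmless input
  }
}
```

Returning the normalized string, rather than the original, keeps later processing working on the same data that was validated.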

...

Validating input before normalization affords attackers the opportunity to bypass filters and other security mechanisms. It can result in the execution of arbitrary code.

Rule	Severity	Likelihood	Remediation Cost	Priority	Level
IDS01-J	High	Probable	Medium	P12	L1

Automated Detection

Tool	Version	Checker	Description
The Checker Framework		Tainting Checker	Trust and security errors (see Chapter 8)
Fortify	1.0	Process_Control	Implemented

Related Guidelines

ISO/IEC TR 24772:2013	Cross-site Scripting [XYT]
MITRE CWE	CWE-289, Authentication Bypass by Alternate Name
	CWE-180, Incorrect Behavior Order: Validate before Canonicalize

Android Implementation Details

Android apps can receive string data from the outside and normalize it.

Bibliography

[API 2006]	Java Platform, Standard Edition 6 API Specification
[Davis 2008]	Unicode Standard Annex #15, Unicode Normalization Forms
[Weber 2009]	Exploiting Unicode-enabled Software

...
