Many applications employ input filtering and validation mechanisms that blacklist characters. For example, an application's strategy for avoiding Cross-Site Scripting (XSS) vulnerabilities may include forbidding <script> tags in inputs. Such blacklisting mechanisms are a useful part of a security strategy, even though they are insufficient by themselves for complete input validation and sanitization. When implemented, this form of validation must be performed only after normalizing the input.

...

When implementations keep strings in a normalized form, they can be assured that equivalent strings have a unique binary representation.
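The unique-representation property can be seen with two canonically equivalent spellings of "e with acute accent": one precomposed code point versus a base letter plus a combining mark. A minimal sketch using the standard `java.text.Normalizer` API follows. The class and helper names are illustrative only.

```java
import java.text.Normalizer;

public class CanonicalEquivalence {
    // Returns true if a and b have the same binary representation after NFC.
    static boolean equalAfterNfc(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFC)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFC));
    }

    public static void main(String[] args) {
        String precomposed = "\u00E9"; // LATIN SMALL LETTER E WITH ACUTE (one code point)
        String decomposed = "e\u0301"; // 'e' followed by COMBINING ACUTE ACCENT

        // Raw comparison sees two different code point sequences.
        System.out.println(precomposed.equals(decomposed));         // false
        // After normalization both strings share one representation.
        System.out.println(equalAfterNfc(precomposed, decomposed)); // true
    }
}
```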

Normalization Forms KC and KD must not be blindly applied to arbitrary text. Because they erase many formatting distinctions, they will prevent round-trip conversion to and from many legacy character sets, and unless supplanted by formatting markup, they may remove distinctions that are important to the semantics of the text. It is best to think of these Normalization Forms as being like uppercase or lowercase mappings: useful in certain contexts for identifying core meanings, but also performing modifications to the text that may not always be appropriate. They can be applied more freely to domains with restricted character sets ...
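As an illustration of the formatting distinctions that the compatibility forms erase, the sketch below compares NFC with NFKC on a ligature and a superscript. The sample strings and class name are chosen for illustration; only the `java.text.Normalizer` calls are real API.

```java
import java.text.Normalizer;

public class CompatibilityLoss {
    static String nfc(String s)  { return Normalizer.normalize(s, Normalizer.Form.NFC); }
    static String nfkc(String s) { return Normalizer.normalize(s, Normalizer.Form.NFKC); }

    public static void main(String[] args) {
        String ligature = "\uFB01le"; // "file" spelled with LATIN SMALL LIGATURE FI
        String squared = "x\u00B2";   // "x squared" spelled with SUPERSCRIPT TWO

        // Canonical normalization (NFC) leaves both strings unchanged.
        System.out.println(nfc(ligature).equals(ligature)); // true
        System.out.println(nfc(squared).equals(squared));   // true

        // Compatibility normalization (NFKC) folds them to plain characters,
        // so "x squared" becomes indistinguishable from the text "x2".
        System.out.println(nfkc(ligature)); // file
        System.out.println(nfkc(squared));  // x2
    }
}
```

This loss is harmless when the goal is to detect forbidden characters, but it is why NFKC output should not silently replace text whose presentation matters.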

Normalization form KC (NFKC) is the most suitable for performing input validation, because normalizing to KC transforms the input into an equivalent canonical form that can be safely compared with the required input form.
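To see why the compatibility form matters here, the following sketch runs the same angle-bracket check after NFC and after NFKC. It uses the small-form characters from the examples below plus their fullwidth counterparts (U+FF1C and U+FF1E), which are additional sample inputs chosen for illustration. NFC leaves these characters untouched, so only NFKC exposes them to the filter. The class and method names are hypothetical.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

public class FormComparison {
    private static final Pattern ANGLE_BRACKETS = Pattern.compile("[<>]");

    // True if s contains an ASCII '<' or '>' once normalized to the given form.
    static boolean hasAngleBracketsAfter(String s, Normalizer.Form form) {
        return ANGLE_BRACKETS.matcher(Normalizer.normalize(s, form)).find();
    }

    public static void main(String[] args) {
        String small = "\uFE64script\uFE65";     // SMALL LESS-THAN / GREATER-THAN SIGN
        String fullwidth = "\uFF1Cscript\uFF1E"; // FULLWIDTH LESS-THAN / GREATER-THAN SIGN
        for (String s : new String[] {small, fullwidth}) {
            System.out.println("NFC:  " + hasAngleBracketsAfter(s, Normalizer.Form.NFC));  // false
            System.out.println("NFKC: " + hasAngleBracketsAfter(s, Normalizer.Form.NFKC)); // true
        }
    }
}
```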

Noncompliant Code Example

This noncompliant code example attempts to validate the String before performing normalization. Consequently, the validation logic fails to detect inputs that should be rejected, because the check for angle brackets does not recognize alternative Unicode representations that must be normalized before any validation can be performed.

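```java
import java.text.Normalizer;
import java.text.Normalizer.Form;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NoncompliantExample {
  public static void main(String[] args) {
    // String s may be user controllable
    // \uFE64 is normalized to < and \uFE65 is normalized to > using NFKC
    String s = "\uFE64" + "script" + "\uFE65";

    // Validate (too early: s still holds the compatibility characters)
    Pattern pattern = Pattern.compile("[<>]"); // Check for angle brackets
    Matcher matcher = pattern.matcher(s);
    if (matcher.find()) {
      System.out.println("found black listed tag");
    } else {
      // ...
    }

    // Normalize
    s = Normalizer.normalize(s, Form.NFKC);
  }
}
```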
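The consequence of this ordering can be made concrete: the filter accepts the input, yet the value that the program goes on to use is exactly the forbidden `<script>` tag. A minimal sketch, with hypothetical class and method names:

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

public class ValidateThenNormalizeDemo {
    private static final Pattern ANGLE_BRACKETS = Pattern.compile("[<>]");

    // Blacklist check: true if s contains an ASCII '<' or '>'.
    static boolean isBlacklisted(String s) {
        return ANGLE_BRACKETS.matcher(s).find();
    }

    public static void main(String[] args) {
        String s = "\uFE64" + "script" + "\uFE65";
        boolean rejected = isBlacklisted(s);               // checked before normalization
        s = Normalizer.normalize(s, Normalizer.Form.NFKC); // now the ASCII text <script>
        System.out.println("rejected by filter: " + rejected); // false
        System.out.println("value used later:   " + s);        // <script>
    }
}
```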

...

This compliant solution normalizes the string before validating it. Alternative representations of the string are normalized to the canonical angle brackets. Consequently, input validation correctly detects the malicious input and throws an IllegalStateException.

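```java
import java.text.Normalizer;
import java.text.Normalizer.Form;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CompliantExample {
  public static void main(String[] args) {
    String s = "\uFE64" + "script" + "\uFE65";

    // Normalize first, so compatibility characters become ASCII < and >
    s = Normalizer.normalize(s, Form.NFKC);

    // Validate the normalized string
    Pattern pattern = Pattern.compile("[<>]");
    Matcher matcher = pattern.matcher(s);
    if (matcher.find()) {
      System.out.println("found black listed tag");
      throw new IllegalStateException();
    } else {
      // ...
    }
  }
}
```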
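One way to keep the correct ordering in a larger program is to wrap it in a single helper that normalizes, validates the normalized value, and returns that same normalized value, so later code never works with the unvalidated original. The sketch below assumes a hypothetical helper named `normalizeAndValidate`. It reuses IllegalStateException to match the compliant solution, though a real API might prefer a more specific exception.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

public final class InputFilter {
    private static final Pattern ANGLE_BRACKETS = Pattern.compile("[<>]");

    private InputFilter() {}

    // Hypothetical helper: normalize first, validate the normalized value,
    // and return that same normalized value for all further processing.
    static String normalizeAndValidate(String input) {
        String normalized = Normalizer.normalize(input, Normalizer.Form.NFKC);
        if (ANGLE_BRACKETS.matcher(normalized).find()) {
            throw new IllegalStateException("found black listed tag");
        }
        return normalized;
    }

    public static void main(String[] args) {
        // FULLWIDTH LATIN CAPITAL LETTER A folds to plain "A".
        System.out.println(normalizeAndValidate("\uFF21BC")); // ABC
        try {
            normalizeAndValidate("\uFE64script\uFE65");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage()); // rejected: found black listed tag
        }
    }
}
```

Returning the normalized string matters: if a caller validated `normalizeAndValidate(input)` but then stored or rendered the original `input`, the ordering problem would reappear.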

...

Validating input before normalization affords attackers the opportunity to bypass filters and other security mechanisms. Such a bypass can result in the execution of arbitrary code.

| Guideline | Severity | Likelihood | Remediation Cost | Priority | Level |
|---|---|---|---|---|---|
| IDS02-J | high | probable | medium | P12 | L1 |
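The priority value can be reproduced arithmetically, assuming the CERT risk-assessment convention in which severity (low = 1, medium = 2, high = 3), likelihood (unlikely = 1, probable = 2, likely = 3), and remediation cost (high = 1, medium = 2, low = 3) are multiplied, and priorities from P12 to P27 fall in level L1:

```latex
\text{Priority} = \text{Severity} \times \text{Likelihood} \times \text{Remediation Cost}
               = 3 \times 2 \times 2 = 12
\quad\Longrightarrow\quad \text{P12},\qquad 12 \le 12 \le 27 \;\Rightarrow\; \text{L1}
```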

Automated Detection

TODO

Related Vulnerabilities

Search for vulnerabilities resulting from the violation of this guideline on the CERT website.

...

[API 2006]
[Davis 2008]
[Weber 2009]
[MITRE 2009] CWE ID 289, "Authentication Bypass by Alternate Name" (http://cwe.mitre.org/data/definitions/289.html), and CWE ID 180, "Incorrect Behavior Order: Validate Before Canonicalize" (http://cwe.mitre.org/data/definitions/180.html)

...
