...

Input sanitization is the elimination of unwanted characters from input by removing, replacing, encoding, or escaping them. Input must be sanitized both because an application may be unprepared to handle malformed input and because unsanitized input may conceal an attack vector.

Blacklisting

Blacklisting is the process of examining input data, looking for components that are known to be invalid. One advantage of this approach is that detection of known invalid input is often straightforward. A disadvantage is that the set of all possible invalid inputs may be unknown, or too large to enumerate fully. In such cases, consider the alternative of whitelisting known, valid inputs.

...

A blacklist of invalid inputs would forbid the appearance of any of these characters in their raw form. Determining what constitutes invalid input can be difficult, however. For example, validating textual data with a blacklisting approach requires enumerating not only the invalid characters shown above but also the alternate Unicode representations of these characters in differing locales.
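The point about alternate Unicode representations can be illustrated with a minimal sketch. It assumes a blacklist of raw angle brackets only; the class and method names are hypothetical. The fullwidth characters U+FF1C and U+FF1E slip past the blacklist, yet NFKC compatibility normalization folds them to the ASCII angle brackets.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

final class BlacklistUnicodeDemo {
    // Hypothetical blacklist: raw ASCII angle brackets only.
    private static final Pattern ANGLE_BRACKETS = Pattern.compile("[<>]");

    static boolean containsAngleBrackets(String s) {
        return ANGLE_BRACKETS.matcher(s).find();
    }

    static String normalize(String s) {
        // NFKC maps compatibility characters (such as fullwidth forms) to their ASCII equivalents.
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // Fullwidth less-than and greater-than signs around "script".
        String input = "\uFF1Cscript\uFF1E";

        System.out.println(containsAngleBrackets(input));            // false: blacklist is bypassed
        System.out.println(containsAngleBrackets(normalize(input))); // true: brackets appear after normalization
    }
}
```

If a later processing stage normalizes the string, the characters the blacklist was meant to stop reappear after the check has already passed.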

Whitelisting

The whitelisting approach to input validation consists of building a list of valid input components (such as characters) and ensuring that untrusted input conforms to that list. Whitelisting is preferable to blacklisting when it is easier to enumerate the set of valid inputs than to detect and reject every instance of invalid input. This advantage disappears when the set of valid inputs is difficult or impossible to enumerate.
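As a rough sketch of whitelisting in a case where the valid set is easy to enumerate, consider a hypothetical identifier field limited to 1 to 32 ASCII letters, digits, hyphens, and underscores. The policy, class name, and method name are assumptions made for illustration, not part of this guideline.

```java
import java.util.regex.Pattern;

final class WhitelistDemo {
    // Hypothetical policy: 1 to 32 ASCII letters, digits, hyphens, or underscores.
    private static final Pattern ALLOWED = Pattern.compile("[A-Za-z0-9_-]{1,32}");

    static boolean isValid(String input) {
        // matches() requires the entire input to conform, not just a substring.
        return input != null && ALLOWED.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("report_2024"));  // true
        System.out.println(isValid("<script>"));     // false
        System.out.println(isValid("%3Cscript%3E")); // false
    }
}
```

Because the check names what is allowed, encoded or otherwise unexpected forms are rejected without having to be anticipated individually.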

Noncompliant Code Example (Blacklisting)

This noncompliant code example builds a URI from untrusted input. It sanitizes the input by checking for angle brackets. However, the URI may contain percent-encoded (URL-encoded) character sequences. Unless the filter also forbids the % character that introduces each encoded byte, it cannot achieve its purpose. For example, an attacker can bypass the filter by specifying the percent-encoded form of the sequence <script>, which is %3C%73%63%72%69%70%74%3E.

String tainted = "%3C%73%63%72%69%70%74%3E"; // Hex encoded equivalent form of <script>

Pattern pattern = Pattern.compile("[<>]");
if (pattern.matcher(tainted).find()) {
  throw new ValidationException("Invalid Input");
}
URI uri = new URI("http://vulnerable.com/" + tainted);
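The bypass can be observed directly. The following sketch applies the same angle-bracket blacklist to the percent-encoded payload and then decodes the payload the way a downstream consumer might. The class and helper names are hypothetical, and the `URLDecoder.decode(String, Charset)` overload assumes Java 10 or later.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

final class BlacklistBypassDemo {
    // Same blacklist as the noncompliant example: raw angle brackets only.
    private static final Pattern ANGLE_BRACKETS = Pattern.compile("[<>]");

    static boolean passesBlacklist(String s) {
        return !ANGLE_BRACKETS.matcher(s).find();
    }

    static String decode(String s) {
        // Percent-decoding as a later stage (for example, a servlet container) might perform it.
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String tainted = "%3C%73%63%72%69%70%74%3E";

        System.out.println(passesBlacklist(tainted));         // true: no raw '<' or '>' present
        System.out.println(decode(tainted));                  // <script>
        System.out.println(passesBlacklist(decode(tainted))); // false: brackets visible only after decoding
    }
}
```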

Noncompliant Code Example

This noncompliant code example attempts to check for the hex-encoded form in addition to the canonical representation of the angle brackets. Note, however, that the program remains vulnerable when an alternative encoding, such as a modified Base64 URL encoding, is used farther along the chain.

...

This approach also fails to prevent other forms of injection attacks that do not rely on angle brackets. Further, the infeasibility of exhaustive enumeration of all forms of blacklisted characters renders the use of methods such as String.replaceAll() ineffective for sanitizing untrusted user input.
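To make the alternative-encoding problem concrete, here is a sketch under stated assumptions: the blacklist shown is a hypothetical one that rejects raw angle brackets and their percent-encoded forms, and a later component is assumed to Base64 URL-decode the value. Neither is taken from the elided example above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.regex.Pattern;

final class AlternateEncodingDemo {
    // Hypothetical extended blacklist: raw angle brackets plus %3C and %3E in either case.
    private static final Pattern BLACKLIST = Pattern.compile("[<>]|%3[CcEe]");

    static boolean passesBlacklist(String s) {
        return !BLACKLIST.matcher(s).find();
    }

    static String encode(String s) {
        return Base64.getUrlEncoder().encodeToString(s.getBytes(StandardCharsets.UTF_8));
    }

    static String decode(String s) {
        // Assumed downstream step that undoes the Base64 URL encoding.
        return new String(Base64.getUrlDecoder().decode(s), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String payload = encode("<script>");

        System.out.println(payload);                  // PHNjcmlwdD4=
        System.out.println(passesBlacklist(payload)); // true: nothing on the blacklist appears
        System.out.println(decode(payload));          // <script>
    }
}
```

Each additional encoding would need its own blacklist entries, which is why enumeration does not scale.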

Compliant Solution (Whitelisting)

This compliant solution validates the input against a whitelist. It permits the input to contain only word characters (alphanumerics and the underscore), whitespace, and the period ("."); all other characters, including the % that introduces a percent-encoded sequence, are treated as invalid and rejected.

String tainted = "%3C%73%63%72%69%70%74%3E"; // Hex encoded equivalent form of <script>

Pattern pattern = Pattern.compile("[\\W&&[^\\s\\.]]");
if (pattern.matcher(tainted).find()) {
  throw new ValidationException("Invalid Input");
}
}
URI uri = new URI("http://vulnerable.com/" + tainted);
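The effect of the whitelist can be checked against a few sample inputs. This sketch assumes the pattern is `[\W&&[^\s\.]]`, that is, any non-word character other than whitespace or a period is rejected. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class WhitelistUriCheck {
    // Assumed whitelist pattern: matches any character that is neither a word
    // character, nor whitespace, nor a period.
    private static final Pattern DISALLOWED = Pattern.compile("[\\W&&[^\\s\\.]]");

    static boolean isAcceptable(String s) {
        // Input is acceptable only if no disallowed character is found anywhere in it.
        return !DISALLOWED.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(isAcceptable("%3C%73%63%72%69%70%74%3E")); // false: '%' is rejected
        System.out.println(isAcceptable("<script>"));                 // false: '<' and '>' are rejected
        System.out.println(isAcceptable("index.html"));               // true
        System.out.println(isAcceptable("a/b"));                      // false: '/' is rejected
    }
}
```

Note that raw whitespace passes this pattern, while the single-argument `java.net.URI` constructor rejects an unencoded space with a `URISyntaxException`, so such input would still fail at URI construction.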

Risk Assessment

Failure to sanitize user input before processing or storing it can lead to injection of arbitrary executable content.

Guideline   Severity   Likelihood   Remediation Cost   Priority   Level
IDS01-J     high       probable     medium             P12        L1

Related Vulnerabilities

CVE-2008-2370 describes a vulnerability in Apache Tomcat 4.1.0 through 4.1.37, 5.5.0 through 5.5.26, and 6.0.0 through 6.0.16. When a RequestDispatcher is used, Tomcat performs path normalization before removing the query string from the URI, which allows remote attackers to conduct directory traversal attacks and read arbitrary files via a .. (dot dot) in a request parameter.

Search for other vulnerabilities resulting from the violation of this guideline on the CERT website.

Bibliography

[OWASP 2008] Testing for XML Injection (OWASP-DV-008), http://www.owasp.org/index.php/Testing_for_XML_Injection_%28OWASP-DV-008%29

...