Many programs accept data from untrusted sources and then pass the data (perhaps with minor modifications) to some subsystem. Often the subsystem accepts the data as a character string that follows some syntax, and both the program and subsystem must respect the syntax when managing the data. In doing so, the program passes (modified or unmodified) data across a trust boundary to a component in a separate trust domain.
Input sanitization refers to the elimination of unwanted characters from the input by removing, replacing, encoding, or escaping the characters. Input must be sanitized both because an application may be unprepared to handle malformed input and because unsanitized input may conceal an attack vector.
...
injection attack.
As a result, it is necessary to sanitize all string data passed to complex subsystems so that the resulting string is innocuous in the context in which it will be interpreted.
Blacklisting
Blacklisting is the process of examining input data, looking for components that are known to be invalid. One advantage of this approach is that detection of known invalid input is often straightforward. A disadvantage is that the set of all possible invalid inputs may be unknown, or too large to enumerate fully. In such cases, consider the alternative of whitelisting known, valid inputs.
Depending on the language and subsystem in question, certain characters and character sequences are frequently considered to be invalid input when encountered in strings. A common set of such characters includes:
...
A blacklist of invalid inputs would forbid the appearance of any of these characters in their raw form. Note that determining what constitutes invalid input can be difficult. For example, input validation of textual data using a blacklisting approach requires enumerating not only the invalid characters shown above but also the alternate Unicode representations of these characters in differing locales.
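The point about alternate Unicode representations can be illustrated with a small sketch. It assumes a blacklist that looks only for the ASCII angle brackets and uses the fullwidth forms U+FF1C and U+FF1E as the alternate representation; the class and method names are hypothetical. Normalizing to NFKC before matching is shown as one possible mitigation, not as a complete defense.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
class BlacklistNormalizationDemo {
    // Blacklist that looks only for the raw ASCII angle brackets.
    private static final Pattern ANGLE_BRACKETS = Pattern.compile("[<>]");

    // Applies the blacklist to the input exactly as received.
    static boolean rejectedRaw(String input) {
        return ANGLE_BRACKETS.matcher(input).find();
    }

    // Normalizes to NFKC first, so compatibility forms such as fullwidth
    // characters are folded to their ASCII equivalents before the check.
    static boolean rejectedAfterNormalization(String input) {
        String normalized = Normalizer.normalize(input, Normalizer.Form.NFKC);
        return ANGLE_BRACKETS.matcher(normalized).find();
    }

    public static void main(String[] args) {
        // U+FF1C and U+FF1E are the fullwidth forms of '<' and '>'.
        String tainted = "\uFF1Cscript\uFF1E";
        System.out.println("Rejected raw: " + rejectedRaw(tainted));
        System.out.println("Rejected after NFKC: " + rejectedAfterNormalization(tainted));
    }
}
```

In this sketch the raw check misses the fullwidth brackets, while the check after normalization catches them. Other alternate representations may still exist, which is the enumeration problem the text describes.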
Whitelisting
The whitelisting approach to input validation consists of building a list of valid input components (such as characters) and ensuring that untrusted input conforms to that list. Whitelisting is easier than blacklisting when it is easier to enumerate a set of valid input conditions than to detect and reject all instances of invalid input. But this advantage over blacklisting fails to apply when the set of valid inputs is difficult or impossible to enumerate.
Noncompliant Code Example (Blacklisting)
This noncompliant code example must build a URI from untrusted input. It sanitizes the input by checking for angle brackets. However, a URI may contain percent-encoded (hexadecimal) UTF-8 character sequences. Because the filter fails to forbid the % characters that introduce those encoded sequences, it cannot achieve its purpose. For example, an attacker can bypass the filter by specifying the hexadecimal-encoded form of the sequence <script> as %3C%73%63%72%69%70%74%3E.
```java
// Hex-encoded equivalent form of <script>
String tainted = "%3C%73%63%72%69%70%74%3E";

// Blacklist check: looks only for raw angle brackets
Pattern pattern = Pattern.compile("[<>]");
if (pattern.matcher(tainted).find()) {
  throw new ValidationException("Invalid Input");
}
URI uri = new URI("http://vulnerable.com/" + tainted);
```
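The bypass can be observed directly with a short sketch. It assumes that some component farther along the chain percent-decodes the string, modeled here with `java.net.URLDecoder`; the class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
class PercentEncodingBypassDemo {
    // The same angle-bracket blacklist used in the noncompliant example.
    private static final Pattern ANGLE_BRACKETS = Pattern.compile("[<>]");

    static boolean blacklistRejects(String input) {
        return ANGLE_BRACKETS.matcher(input).find();
    }

    // Models a downstream component that percent-decodes the value.
    static String decode(String input) {
        try {
            return URLDecoder.decode(input, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always supported by the Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String tainted = "%3C%73%63%72%69%70%74%3E";
        System.out.println("Blacklist rejects encoded form: " + blacklistRejects(tainted));
        System.out.println("Decoded downstream: " + decode(tainted));
    }
}
```

The encoded string contains no raw angle brackets, so the blacklist accepts it, yet the decoded value is the very sequence the filter was meant to stop.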
Noncompliant Code Example
This noncompliant code example attempts to check for the hex-encoded form in addition to the canonical representation of the angle brackets. Note, however, that the program remains vulnerable when an alternative encoding, such as a modified Base64 URL encoding, is used farther along the chain.
...
This approach also fails to prevent other forms of injection attacks that do not rely on angle brackets. Further, the infeasibility of exhaustive enumeration of all forms of blacklisted characters renders the use of methods such as String.replaceAll() ineffective for sanitizing untrusted user input.
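Beyond incomplete enumeration, removal-based filtering can also fail in a second way: a single pass of `String.replaceAll()` can reassemble the forbidden sequence from the surrounding fragments. The following sketch shows this with a hypothetical helper, `stripScriptTags`, which is not taken from the guideline.

```java
// Hypothetical demo class; the names here are illustrative only.
class ReplaceAllBypassDemo {
    // Naive sanitizer: deletes every literal occurrence of "<script>" in one pass.
    static String stripScriptTags(String input) {
        return input.replaceAll("<script>", "");
    }

    public static void main(String[] args) {
        // The inner "<script>" is removed, and the remaining halves join up.
        String tainted = "<scr<script>ipt>";
        System.out.println(stripScriptTags(tainted));
    }
}
```

After the inner occurrence is deleted, the leftover `<scr` and `ipt>` fragments form `<script>` again, so the output still contains the sequence the sanitizer tried to remove.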
Compliant Solution (Whitelisting)
The whitelisting approach to input validation consists of building a list of valid input elements (such as characters) and ensuring that untrusted input elements appear on that list. Whitelisting is easier than blacklisting when it is easier to enumerate valid input elements than to detect and reject all instances of invalid input elements. But this advantage over blacklisting fails to apply when the set of valid input elements is difficult or impossible to enumerate and creating a subset of valid input elements is not a viable solution.
This compliant solution validates the input against a whitelist. The pattern permits only word characters (letters, digits, and underscore), whitespace, and the period (".") character; all other characters, including the % that introduces a hex-encoded sequence, are treated as invalid and are rejected.
```java
// Hex-encoded equivalent form of <script>
String tainted = "%3C%73%63%72%69%70%74%3E";

// Whitelist check: flags any character that is not a word character,
// whitespace, or a period
Pattern pattern = Pattern.compile("[\\W&&[^\\s\\.]]");
if (pattern.matcher(tainted).find()) {
  throw new ValidationException("Invalid Input");
}
URI uri = new URI("http://vulnerable.com/" + tainted);
```
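To see how the whitelist behaves on different inputs, the check can be wrapped in a small helper. This is a minimal sketch that assumes the intended pattern is `[\W&&[^\s\.]]`; the class name and the `isValid` method are hypothetical, and the helper returns a boolean rather than throwing the guideline's `ValidationException`.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
class WhitelistDemo {
    // Matches any character that is NOT a word character, whitespace, or a period.
    private static final Pattern DISALLOWED = Pattern.compile("[\\W&&[^\\s\\.]]");

    // Input is valid only when no disallowed character is found.
    static boolean isValid(String input) {
        return !DISALLOWED.matcher(input).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "index.html",               // letters and a period
            "annual report.txt",        // whitespace is permitted
            "%3C%73%63%72%69%70%74%3E", // '%' is not on the whitelist
            "<script>",                 // angle brackets are not on the whitelist
            "../etc/passwd"             // '/' is not on the whitelist
        };
        for (String s : samples) {
            System.out.println(s + " -> " + (isValid(s) ? "accepted" : "rejected"));
        }
    }
}
```

Because the check names what is allowed instead of what is forbidden, the hex-encoded payload is rejected without the filter knowing anything about encodings.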
Risk Assessment
Failure to sanitize user input before processing or storing it can lead to injection attacks.
Guideline | Severity | Likelihood | Remediation Cost | Priority | Level |
---|---|---|---|---|---|
IDS01-J | high | probable | medium | P12 | L1 |
Related Vulnerabilities
CVE-2008-2370 describes a vulnerability in Apache Tomcat 4.1.0 through 4.1.37, 5.5.0 through 5.5.26, and 6.0.0 through 6.0.16. When a RequestDispatcher is used, Tomcat performs path normalization before removing the query string from the URI, which allows remote attackers to conduct directory traversal attacks and read arbitrary files via a .. (dot dot) in a request parameter.
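The ordering problem described for this CVE can be modeled with a simplified sketch. This is not Tomcat's code: the `normalize` and `stripQuery` helpers, the class name, and the sample request string are all hypothetical, chosen only to show why normalizing before removing the query string is dangerous.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, simplified model of the ordering flaw; not Tomcat's implementation.
class NormalizeOrderDemo {
    // Resolves "." and ".." segments in a slash-separated path.
    static String normalize(String path) {
        Deque<String> segments = new ArrayDeque<>();
        for (String segment : path.split("/")) {
            if (segment.isEmpty() || segment.equals(".")) {
                continue;
            }
            if (segment.equals("..")) {
                if (!segments.isEmpty()) {
                    segments.removeLast();
                }
            } else {
                segments.addLast(segment);
            }
        }
        StringBuilder result = new StringBuilder();
        for (String segment : segments) {
            result.append('/').append(segment);
        }
        return result.length() == 0 ? "/" : result.toString();
    }

    // Drops everything from the first '?' onward.
    static String stripQuery(String uri) {
        int q = uri.indexOf('?');
        return q < 0 ? uri : uri.substring(0, q);
    }

    // Flawed order: the ".." inside the query string rewrites the path.
    static String normalizeThenStrip(String uri) {
        return stripQuery(normalize(uri));
    }

    // Safer order: the query string is gone before normalization runs.
    static String stripThenNormalize(String uri) {
        return normalize(stripQuery(uri));
    }

    public static void main(String[] args) {
        String request = "/app/page.jsp?file=/../../WEB-INF/web.xml";
        System.out.println(normalizeThenStrip(request));
        System.out.println(stripThenNormalize(request));
    }
}
```

In the flawed order, the dot-dot segments carried in the request parameter consume the legitimate path segments, so the resolved target becomes a file the request should never reach. Removing the query string first keeps the parameter's contents out of path resolution.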
Search for other vulnerabilities resulting from the violation of this guideline on the CERT website.
Bibliography
[OWASP 2008] Testing for XML Injection (OWASP-DV-008), http://www.owasp.org/index.php/Testing_for_XML_Injection_%28OWASP-DV-008%29
...