
Many programs accept data from untrusted sources, and then pass the data (perhaps with minor modifications) to some subsystem. Often the subsystem accepts the data as a character string that follows some syntax, and both the program and subsystem must respect the syntax when managing the data.

Input sanitization refers to the elimination of unwanted characters from the input by means of removal, replacement, encoding or escaping the characters. Input must be sanitized, both because an application may be unprepared to handle the malformed input, and also because unsanitized input may conceal an attack vector.
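As one illustration of sanitizing by escaping, the following minimal sketch replaces the characters that are significant in XML markup with their predefined entity references. The class name EscapeSketch and the method escapeXml are hypothetical names chosen for this example; they do not come from the guideline, and a production system would more likely rely on a vetted encoding library.

```java
// Hypothetical helper for illustration only; not part of the original guideline.
final class EscapeSketch {

    // Replaces the five characters that have special meaning in XML markup
    // with the corresponding predefined entity references.
    static String escapeXml(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The markup characters in the untrusted text are emitted as entities.
        System.out.println(escapeXml("<script>alert('XSS')</script>"));
    }
}
```

Escaping keeps the original data recoverable, whereas removal or replacement discards it; which of the four techniques is appropriate depends on the subsystem that consumes the string.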

Blacklisting

Blacklisting is the process of examining input data, looking for components that are known to be invalid. One advantage of this approach is that detection of known invalid input is often straightforward. A disadvantage is that the set of all possible invalid inputs may be unknown, or too large to enumerate fully. In such cases, consider the alternative of whitelisting known, valid inputs.

Depending on the language and subsystem in question, certain characters and character sequences are frequently considered to be invalid input when encountered in strings. A common set of such characters includes:

Character    Name
LF \n        Line Feed
CR \r        Carriage Return
CRLF \r\n    Carriage Return + Line Feed
" and '      Quotes
, and ;      Comma, semicolon, white space
/ and \      Forward and back slash
< and >      Angle brackets
&            Ampersand
%00          NULL
( and )      Parentheses
%            Percent

A blacklist of invalid inputs would forbid the appearance of any of these characters in their raw form. Note that determining what constitutes invalid input can be difficult. For example, validating textual data with a blacklisting approach requires enumerating not only the invalid characters shown above but also the alternate Unicode representations of these characters in differing locales.
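The difficulty with alternate Unicode representations can be sketched as follows. The example assumes the blacklist consists only of the two angle brackets and uses java.text.Normalizer with the NFKC form to fold compatibility characters (such as the fullwidth less-than sign, U+FF1C) into their ASCII equivalents before checking. The class and method names are hypothetical, and normalization addresses only this one family of alternate representations.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

// Hypothetical sketch; the names and the two-character blacklist are assumptions.
final class NormalizeThenBlacklistSketch {

    private static final Pattern ANGLE_BRACKETS = Pattern.compile("[<>]");

    // Checks the raw input only; compatibility forms of the brackets slip through.
    static boolean containsAngleBracketRaw(String input) {
        return ANGLE_BRACKETS.matcher(input).find();
    }

    // Normalizes to NFKC first, so compatibility forms are folded to '<' and '>'.
    static boolean containsAngleBracketNormalized(String input) {
        String normalized = Normalizer.normalize(input, Normalizer.Form.NFKC);
        return ANGLE_BRACKETS.matcher(normalized).find();
    }

    public static void main(String[] args) {
        // Fullwidth less-than (U+FF1C) and greater-than (U+FF1E) around "script".
        String tainted = "\uFF1Cscript\uFF1E";
        System.out.println("raw check flags input:        " + containsAngleBracketRaw(tainted));
        System.out.println("normalized check flags input: " + containsAngleBracketNormalized(tainted));
    }
}
```

The raw check misses the fullwidth brackets, while the check on the normalized string detects them. Any later processing should then operate on the normalized string, not on the original input.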

Whitelisting

The whitelisting approach to input validation consists of building a list of valid input components (such as characters) and ensuring that untrusted input conforms to that list. Whitelisting is preferable to blacklisting when it is easier to enumerate the set of valid inputs than to detect and reject all instances of invalid input. This advantage disappears when the set of valid inputs is difficult or impossible to enumerate.
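A minimal whitelisting sketch follows. The policy it encodes (1 to 64 ASCII letters, digits, spaces, or periods) is an assumption made for this example, as are the class and method names; a real policy must be derived from what the receiving subsystem actually accepts.

```java
import java.util.regex.Pattern;

// Hypothetical sketch; the permitted character set and length bound are assumptions.
final class WhitelistSketch {

    // The entire input must consist of permitted characters, so matches() is
    // used rather than find().
    private static final Pattern ALLOWED = Pattern.compile("[A-Za-z0-9 .]{1,64}");

    static boolean isValid(String input) {
        return input != null && ALLOWED.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] samples = { "Widget 2.0", "<script>", "%3Cscript%3E", "" };
        for (String s : samples) {
            System.out.println("\"" + s + "\" valid: " + isValid(s));
        }
    }
}
```

Because the check names what is permitted, encoded or otherwise disguised forms of forbidden characters are rejected without having to be listed individually.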

Noncompliant Code Example (Blacklisting)

This noncompliant code example must build a URI from untrusted input. It sanitizes the input by checking for angle brackets. However, the URI may contain percent-encoded UTF-8 character sequences. Because the filter fails to forbid the % characters that introduce such sequences, it cannot achieve its purpose. For example, an attacker can bypass the filter by specifying the hexadecimal-encoded form of the sequence <script> as %3C%73%63%72%69%70%74%3E.

Code Block

String tainted = "%3C%73%63%72%69%70%74%3E"; // Hex-encoded equivalent form of <script>

Pattern pattern = Pattern.compile("[<>]");
if (pattern.matcher(tainted).find()) {
  throw new ValidationException("Invalid Input");
}
URI uri = new URI("http://vulnerable.com/" + tainted);

Noncompliant Code Example (XML)

This noncompliant code example uses a user-generated string xmlString, which will be parsed by an XML parser; see guideline IDS08-J. Prevent XML Injection. The description node is a String, as defined by the XML schema. Consequently, it accepts all valid characters, including CDATA tags.

Code Block

xmlString = "<item>\n" +
            "<description><![CDATA[<]]>script<![CDATA[>]]>" +
            "alert('XSS')<![CDATA[<]]>/script<![CDATA[>]]></description>\n" +
            "<price>500.0</price>\n" +
            "<quantity>1</quantity>\n" +
            "</item>";

This is insecure because an attacker may be able to inject an executable script into the XML representation, disguised using CDATA tags. The XML parser removes the CDATA tags when it processes them, yielding the executable script. This can result in a cross-site scripting (XSS) vulnerability if the text in the nodes is displayed back to the user.

Similarly, if the XML tree is constructed on the server side from client inputs, comments of the form

Code Block

<!-- -->

may be maliciously inserted in an attempt to override the server-side inputs. For instance, if the user can enter input into the description and quantity fields, it may be possible to override the price field set by the server. This can be achieved by entering the string "<!-- description" in the description field and the string "--></description> <price>100.0</price><quantity>1" in the quantity field (without the '"' characters in each case). The equivalent XML representation is:

Code Block

xmlString = "<item>\n" +
            "<description><!-- description</description>\n" +
            "<price>500.0</price>\n" +
            "<quantity>--></description> <price>100.0</price>" +
            "<quantity>1</quantity>\n" +
            "</item>";

The user can thus override the price field, changing it from 500.0 to an arbitrary value (100.0 in this case).

Compliant Solution (XML)

This compliant solution creates a whitelist of permitted string inputs. It allows only word characters (letters, digits, and underscores) in the description node, by requiring the input to match the pattern [\\w]*, and consequently eliminates the possibility of injecting the < and > characters.

Noncompliant Code Example

This noncompliant code example attempts to check for the hex-encoded form in addition to the canonical representation of the angle brackets. Note, however, that the program remains vulnerable when an alternative encoding, such as a modified Base64 URL encoding, is used farther along the chain.

Code Block

// Base64 is assumed to be java.util.Base64; its URL-safe encoder stands in
// for the modified Base64 URL encoding
String tainted = Base64.getUrlEncoder().encodeToString(
    "%3C%73%63%72%69%70%74%3E".getBytes(StandardCharsets.UTF_8)); // <script>

Pattern pattern = Pattern.compile("(%3C|<)(.*)(%3E|>)");
if (pattern.matcher(tainted).find()) {
  throw new ValidationException("Invalid Input");
}
URI uri = new URI("http://vulnerable.com/" + tainted);

This approach also fails to prevent other forms of injection attacks that do not rely on angle brackets. Further, the infeasibility of exhaustive enumeration of all forms of blacklisted characters renders the use of methods such as String.replaceAll() ineffective for sanitizing untrusted user input.
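The bypass described in the blacklisting examples can be reproduced with a short sketch. It assumes that some later component percent-decodes the string, which is modeled here with java.net.URLDecoder; the class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.regex.Pattern;

// Hypothetical sketch of why a blacklist on raw angle brackets is insufficient.
final class EncodedBypassSketch {

    private static final Pattern BLACKLIST = Pattern.compile("[<>]");

    // Returns true if the input contains neither '<' nor '>' in raw form.
    static boolean passesBlacklist(String input) {
        return !BLACKLIST.matcher(input).find();
    }

    // Models a downstream component that percent-decodes the string as UTF-8.
    static String decode(String input) {
        try {
            return URLDecoder.decode(input, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is required on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String tainted = "%3C%73%63%72%69%70%74%3E"; // hex-encoded <script>
        System.out.println("passes blacklist before decoding: " + passesBlacklist(tainted));
        String decoded = decode(tainted);
        System.out.println("decoded form: " + decoded);
        System.out.println("passes blacklist after decoding:  " + passesBlacklist(decoded));
    }
}
```

The encoded string passes the filter, yet decodes to <script>. Validation is therefore meaningful only when it is applied to the form of the data that the subsystem will ultimately interpret.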

Compliant Solution (Whitelisting)

This compliant solution validates the input against a whitelist. It permits the URL to contain only word characters (alphanumerics and the underscore), whitespace, and the period (".") character; all other characters, including the % that introduces an encoded sequence, are treated as invalid and are rejected.

Code Block

String tainted = "%3C%73%63%72%69%70%74%3E"; // Hex encoded equivalent form of <script>

Pattern pattern = Pattern.compile("[\\W&&[^\\s\\.]]");
if (pattern.matcher(tainted).find()) {
  throw new ValidationException( "Invalid Input");
}
URI uri = new URI("http://vulnerable.com/" + tainted);
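
For the XML example, the corresponding whitelist check accepts xmlString only if it consists entirely of word characters:

Code Block

if (!xmlString.matches("[\\w]*")) { // String does not match whitelisted characters
  throw new IllegalArgumentException();
}
// Use the xmlString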

Risk Assessment

Failure to sanitize user input before processing or storing it can lead to injection of arbitrary executable content.

...