Comment: Reverted from v. 41

Input sanitization is the elimination of unwanted characters from input by removing, replacing, encoding, or escaping them. Input must be sanitized both because an application may be unprepared to handle malformed input and because unsanitized input may conceal an attack vector.

Noncompliant Code Example

This noncompliant code example uses a user-generated string, xmlString, which is parsed by an XML parser; see guideline IDS08-J. Prevent XML Injection. The description node is a String, as defined by the XML schema. Consequently, it accepts all valid characters, including CDATA tags.

Code Block
String xmlString = "<item>\n" +
                   "<description><![CDATA[<]]>script<![CDATA[>]]>" +
                   "alert('XSS')<![CDATA[<]]>/script<![CDATA[>]]></description>\n" +
                   "<price>500.0</price>\n" +
                   "<quantity>1</quantity>\n" +
                   "</item>";

This is insecure because an attacker may be able to inject an executable script into the XML representation, disguised using CDATA tags. When the XML parser processes the CDATA tags, it removes them, yielding the executable script. This can result in a cross-site scripting (XSS) vulnerability if the text in the nodes is displayed back to the user.
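The effect of the CDATA sections can be checked with a small sketch. It assumes the JDK's built-in DOM parser; the class name CdataDemo and the helper descriptionText are hypothetical names introduced only for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the original guideline.
class CdataDemo {

    // Parses the XML and returns the text content of the first <description> node.
    static String descriptionText(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName("description").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("XML could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        String xmlString = "<item>\n" +
            "<description><![CDATA[<]]>script<![CDATA[>]]>" +
            "alert('XSS')<![CDATA[<]]>/script<![CDATA[>]]></description>\n" +
            "<price>500.0</price>\n" +
            "<quantity>1</quantity>\n" +
            "</item>";
        // The parser drops the CDATA delimiters and keeps their contents,
        // so the node text becomes a complete script element.
        System.out.println(descriptionText(xmlString));
    }
}
```

Running main prints the reassembled script element, which is the text a page would echo back to the user if it displayed the node without escaping.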

Similarly, if the XML tree is constructed at the server side from client inputs, comments of the form

Code Block
<!-- -->

may be maliciously inserted in an attempt to override the server side inputs. For instance, if the user can enter input into the description and quantity fields, it may be possible to override the price field set by the server. This can be achieved by entering the string "<!-- description" in the description field and the string "--></description> <price>100.0</price><quantity>1" in the quantity field (without the '"' characters in each case). The equivalent XML representation is:

Code Block

String xmlString = "<item>\n" +
                   "<description><!-- description</description>\n" +
                   "<price>500.0</price>\n" +
                   "<quantity>--></description> <price>100.0</price>" +
                   "<quantity>1</quantity>\n" +
                   "</item>";

Note that the user can thus override the price field, changing it from 500.0 to an arbitrary value such as 100.0 (in this case).
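The override can be reproduced with the following sketch. It assumes the server builds the XML by plain string concatenation and reads it back with the JDK's DOM parser; CommentInjectionDemo, buildItem, and priceOf are hypothetical names used only for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the original guideline.
class CommentInjectionDemo {

    // Naive construction: user-supplied fields are concatenated unsanitized.
    static String buildItem(String description, String quantity) {
        return "<item>\n" +
            "<description>" + description + "</description>\n" +
            "<price>500.0</price>\n" +            // price set by the server
            "<quantity>" + quantity + "</quantity>\n" +
            "</item>";
    }

    // Returns the text of the first <price> element that the parser sees.
    static String priceOf(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName("price").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("XML could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        String xml = buildItem(
            "<!-- description",
            "--></description> <price>100.0</price><quantity>1");
        // The server's price element is swallowed by the injected comment.
        System.out.println(priceOf(xml));
    }
}
```

With benign input the parser reports the server's price; with the injected strings, the only price element left outside the comment is the attacker's.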

Compliant Solution

This compliant solution creates a white list of permitted input characters. It allows only word characters (letters, digits, and underscores, as matched by \w) in the description node, consequently eliminating the possibility of injecting the < and > characters.

Code Block
if (!xmlString.matches("[\\w]*")) { // String does not match white-listed characters
  throw new IllegalArgumentException();
}
// Use the xmlString
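One way to apply the same white list is to check each user-supplied field before it is concatenated into the XML. The sketch below assumes that approach; DescriptionValidator and requireValid are hypothetical names. Note that \w in Java matches only ASCII letters, digits, and the underscore by default, so spaces and punctuation are rejected as well.

```java
import java.util.regex.Pattern;

// Hypothetical helper; not part of the original guideline.
class DescriptionValidator {

    // White list: zero or more word characters (ASCII letters, digits, underscore).
    private static final Pattern WORD_CHARS = Pattern.compile("\\w*");

    // Returns the input unchanged if it matches the white list,
    // otherwise throws IllegalArgumentException.
    static String requireValid(String description) {
        if (!WORD_CHARS.matcher(description).matches()) {
            throw new IllegalArgumentException(
                "Description contains characters outside the white list");
        }
        return description;
    }

    public static void main(String[] args) {
        System.out.println(requireValid("Widget_42"));   // accepted
        try {
            requireValid("<!-- description");            // rejected
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```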

Risk Assessment

Failure to sanitize user input before processing or storing it can lead to injection of arbitrary executable content.

Every program operates in several security domains. Any system can be divided into several subsystem components, where each component has a specific security domain. For instance, one component may have access to the filesystem but not the network, while another component can access the network but not the filesystem. When two components share data, if they have different degrees of trust, the data is said to flow across a trust boundary.

Since Java allows untrusted code to run alongside trusted code, it is perfectly possible for a Java program to maintain different security domains, and to have internal trust boundaries. Every Java program contains both locally-written code (that is, local to the developers) as well as code whose implementation is beyond the developers' control. Similarly, even within code whose implementation is controlled by the developers, there may be a distinction between trusted and untrusted code. Alternatively, the locally developed portions of a program may include both application-specific code and also locally-developed library code that is shared with other programs (either local or external). We define a trust boundary to be the points at which control or data pass from one aggregation of locally-developed code to another aggregation of code, without regard to whether the second aggregation is locally or externally developed.

The trust boundaries of any system are mandated by the system administrator. While the system components can provide support for trust boundaries, they cannot dictate what trust is given to any component. Consequently, the deployer of a system must define the trust boundaries as part of the system's security policy. Any security auditor can then take that definition and ensure that the software adequately supports the policy. The definition of security domains, and consequently of the boundaries between those domains, is necessarily application-specific.

We recommend that any library that may be exported outside its home organization should consider itself to be in its own unique security domain (in the sense that its client code cannot be known when the library is written). Consequently the library should consider its API to be a trust boundary. Similarly, we recommend that libraries acquired from outside organizations should be considered to be separate security domains from in-house code. Circumstances will differ; the important thing is to identify and respect the necessary security domains in your application.

Any program that maintains a trust boundary must deal with data coming in over that trust boundary, that is, from a process with a differing security domain. Likewise, any such program must also deal with data going out over a trust boundary. It is imperative that data that crosses a trust boundary undergo filtering. We shall examine the two cases in turn:

Data Output

Data that is output by a program may be sent to some component of the system with a different security domain. In this case, the program must ensure that the data is suitable to the remote component's trust boundary. The system must do this by examining the data and filtering out any sensitive information.

Like trust boundaries, the question of what information is /sensitive/ is resolved by the system's security policy. Therefore, a program cannot define which information is sensitive; it can only provide support for handling information that may potentially be declared sensitive by the system administrator.

For instance, if malicious code manages to infiltrate a system, many attacks will be futile if the system's output is appropriately escaped and encoded. Refer to the guideline IDS04-J. Properly encode or escape output for more details.
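As a minimal sketch of output escaping, the following helper replaces the five characters that are special in XML and HTML text with character references. OutputEscaper and escape are hypothetical names; production code would more likely rely on a vetted encoding library than on a hand-written routine like this.

```java
// Hypothetical helper; not part of the original guideline.
class OutputEscaper {

    // Replaces &, <, >, " and ' with character references so that the
    // result is treated as text, not markup, when written into XML or HTML.
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("<script>alert('XSS')</script>"));
    }
}
```

Escaped this way, a script element that slipped into stored data is displayed as literal text instead of being executed by the recipient.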

Java programs have many opportunities to output sensitive information. Several rules address the mitigation of sensitive information disclosure, including EXC06-J. Do not allow exceptions to expose sensitive information and FIO08-J. Do not log sensitive information.

Data Input

Data that is received by a program from a source outside the program's trust boundary may, in fact, be malicious. The program must therefore take steps to ensure the data is valid and not malicious.


These steps can include the following:

/Validation/, in the broadest sense, is the process of ensuring that input data falls within the expected range of valid program input. For example, method arguments must conform not only to the type and numeric range requirements of a method or subsystem, but also contain data that conforms to the required input invariants for that method.

/Sanitization/: In many cases, the data may be fed directly to some subsystem. Data sanitization is the process of ensuring that data conforms to the requirements of the subsystem to which it is passed. Sanitization also involves ensuring that data conforms to any security-related requirements regarding leaking or exposure of sensitive data when it is output across a trust boundary. Refer to the related guideline IDS01-J. Carefully filter any data that passes through a trust boundary for more details on data sanitization. Data sanitization and input validation may coexist and complement each other.

/Canonicalization/ and /Normalization/: Canonicalization is the process of lossless reduction of the input to its equivalent simplest known form. Normalization is the process of lossy conversion of the data to its simplest known (and anticipated) form. Canonicalization and normalization must occur before validation to prevent attackers from exploiting the validation routine to strip away illegal characters and thus construct a forbidden character sequence. Refer to the guideline IDS02-J. Validate strings after performing normalization for more details. In addition, ensure that normalization is performed only on fully assembled user input. Never normalize partial input or combine normalized input with non-normalized input.

For instance, POSIX filesystems provide a syntax for expressing file names on the system using paths. A path is a string that indicates how to find a file by starting at a particular directory (usually the current working directory) and traversing down directories until the file is found. A path is canonical if it contains no symbolic links and no special entries such as '.' or '..', which are handled specially on POSIX systems. Every file accessible from a directory has exactly one canonical path, along with many non-canonical paths.
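The following sketch shows how canonical paths can support validation: both the base directory and the requested file are canonicalized before they are compared, so '.' and '..' entries and symbolic links cannot disguise a path that leaves the base directory. CanonicalPathDemo and isInside are hypothetical names, and the policy of confining files to one base directory is an assumption made for the example.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical helper; not part of the original guideline.
class CanonicalPathDemo {

    // Returns true only if userPath, resolved against baseDir and then
    // canonicalized, still lies within the canonical base directory.
    static boolean isInside(File baseDir, String userPath) {
        try {
            File base = baseDir.getCanonicalFile();
            File target = new File(base, userPath).getCanonicalFile();
            // Path.startsWith compares whole name elements, not raw characters.
            return target.toPath().startsWith(base.toPath());
        } catch (IOException e) {
            return false; // treat an unresolvable path as invalid
        }
    }

    public static void main(String[] args) {
        File base = new File(System.getProperty("java.io.tmpdir"));
        System.out.println(isInside(base, "sub/../report.txt"));
        System.out.println(isInside(base, "../outside.txt"));
    }
}
```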

Many rules address proper filtering of untrusted input, especially when such input is passed to a complex subsystem. For example, see IDS08-J. Prevent XML Injection.
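To illustrate why normalization must precede validation, the sketch below normalizes a string to Unicode form NFKC with java.text.Normalizer and only then rejects angle brackets. The class and method names are hypothetical, and the rule "no < or > characters" is an assumed validation policy. The test input uses U+FE64 and U+FE65 (small less-than and greater-than signs), which contain no ASCII angle brackets until NFKC normalization maps them to < and >.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

// Hypothetical helper; not part of the original guideline.
class NormalizeThenValidate {

    private static final Pattern ANGLE_BRACKETS = Pattern.compile("[<>]");

    // Normalizes the fully assembled input first, then validates the result.
    static String normalizeAndValidate(String input) {
        String normalized = Normalizer.normalize(input, Normalizer.Form.NFKC);
        if (ANGLE_BRACKETS.matcher(normalized).find()) {
            throw new IllegalArgumentException("Input contains angle brackets");
        }
        return normalized;
    }

    public static void main(String[] args) {
        String disguised = "\uFE64script\uFE65";
        // Validating the raw string would find no ASCII angle brackets.
        System.out.println(ANGLE_BRACKETS.matcher(disguised).find());
        try {
            normalizeAndValidate(disguised);
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected after normalization");
        }
    }
}
```

Reversing the two steps would let the disguised input pass validation and turn into a script tag afterward.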

Risk Assessment

Failure to properly filter data that crosses a trust boundary can cause information leakage and execution of arbitrary code.

|| Guideline || Severity || Likelihood || Remediation Cost || Priority || Level ||
| IDS00 IDS01-J | high | probable | medium | P12 | L1 |

Related Vulnerabilities

CVE-2008-2370

Search for vulnerabilities resulting from the violation of this guideline on the CERT website.

