Input sanitization is the elimination of unwanted characters from input by removing, replacing, encoding, or escaping them. Sanitizing input is critical because an application may not be prepared to handle malformed input, and unsanitized input may conceal an attack vector.
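As an illustration of the escaping approach, the following minimal sketch replaces the characters that are significant in HTML with entity references. The class name HtmlEscaper and the method escapeHtml are hypothetical names introduced for this example, and the set of escaped characters is an assumption suited to HTML element content; other output contexts need different rules.

```java
// Hypothetical helper for illustration only; not part of any library.
final class HtmlEscaper {

    // Returns a copy of input with HTML-significant characters
    // replaced by entity references.
    static String escapeHtml(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints: &lt;script&gt;alert(&#39;XSS&#39;)&lt;/script&gt;
        System.out.println(escapeHtml("<script>alert('XSS')</script>"));
    }
}
```

Escaping at the point of output, rather than stripping characters at the point of input, keeps the original data intact while preventing a browser from interpreting it as markup.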
Noncompliant Code Example
This noncompliant code example uses a user-generated string, xmlString. The string is designed to be parsed by an XML parser (see MSC34-J. Prevent XML Injection). The description node is a String, as defined by the XML schema. Consequently, it accepts all valid characters, including CDATA tags. This is dangerous because the XML parser removes the CDATA tags during processing, so an attacker can use them to hide a script inside the XML representation; the parsed text then contains the intact script. This vulnerability is called cross-site scripting (XSS).
String xmlString = "<item>\n"
    + "<description><![CDATA[<]]>script<![CDATA[>]]> alert('XSS')<![CDATA[<]]>/script<![CDATA[>]]></description>\n"
    + "<price>500.0</price>\n"
    + "<quantity>1</quantity>\n"
    + "</item>";
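To see why this input is dangerous, the following sketch parses the same string with the standard JAXP DOM parser and reads the text of the description node. The class name CdataDemo and the method descriptionText are hypothetical names introduced for this illustration; the example assumes the JDK's default DocumentBuilderFactory implementation and does not validate against the schema.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demonstration class; not part of the original example.
final class CdataDemo {

    // Same payload as the noncompliant example.
    static final String XML_STRING = "<item>\n"
        + "<description><![CDATA[<]]>script<![CDATA[>]]> alert('XSS')"
        + "<![CDATA[<]]>/script<![CDATA[>]]></description>\n"
        + "<price>500.0</price>\n"
        + "<quantity>1</quantity>\n"
        + "</item>";

    // Parses xml and returns the text content of the first description element.
    static String descriptionText(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Enable the parser's secure-processing limits.
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            // getTextContent concatenates plain text and CDATA sections,
            // so the CDATA delimiters no longer appear in the result.
            return doc.getElementsByTagName("description").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Prints: <script> alert('XSS')</script>
        System.out.println(descriptionText(XML_STRING));
    }
}
```

The CDATA sections split the angle brackets away from the word script, so a filter that searches the raw XML for a script tag finds nothing, yet the parsed value is a complete script element. If that value is later written into a web page without escaping, the browser executes it.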
Compliant Solution
TODO