As a result, all string data passed to parsers or command interpreters must be sanitized so that the resulting string is innocuous in the context in which it is parsed or interpreted.

Sanitization Techniques

Blacklisting

Blacklisting is the process of examining input data, looking for components that are known to be invalid. One advantage of this approach is that detection of known invalid input is often straightforward. A disadvantage is that the set of all possible invalid inputs may be unknown, or too large to enumerate fully.

Depending on the language and subsystem in question, certain characters and character sequences are frequently considered to be invalid input when encountered in strings. A common set of such characters includes:

Character	Name
LF \n	Line Feed
CR \r	Carriage Return
CRLF \r\n	Carriage Return + Line Feed
" and '	Quotes
, and ;	Comma, semicolon, white space
/ and \	Forward and back slash
< and >	Angle brackets
&	Ampersand
%00	NUL (null byte, URL-encoded)
( and )	Parentheses
%	Percent

A blacklist of invalid inputs would forbid the appearance of any of these characters in their raw form. Note that determining what constitutes invalid input can be difficult. For example, validating textual data with a blacklisting approach requires enumerating not only the invalid characters shown above, but also the alternate Unicode representations of these characters in differing locales.
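The following is a minimal sketch of a blacklist check, not a definitive implementation. The `BLACKLIST` set and the function name `contains_blacklisted` are hypothetical, and the set covers only a subset of the characters in the table above; a real blacklist depends on the target parser. The sketch applies Unicode NFKC normalization before checking, which folds some alternate representations (such as fullwidth angle brackets) to their ASCII forms, though it should not be assumed to catch every alternate encoding.

```python
import unicodedata

# Hypothetical blacklist based on the table above; adjust for the target parser.
BLACKLIST = frozenset('\n\r"\',;/\\<>&\x00()%')


def contains_blacklisted(text: str) -> bool:
    """Return True if text contains a blacklisted character after normalization."""
    # NFKC maps compatibility forms (e.g. fullwidth '<', U+FF1C) to ASCII equivalents,
    # so they are compared against the blacklist in their canonical form.
    normalized = unicodedata.normalize("NFKC", text)
    return any(ch in BLACKLIST for ch in normalized)
```

Without the normalization step, an input such as `"\uff1cscript\uff1e"` would pass the check even though a downstream component might later treat it as `<script>`.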

Whitelisting

The whitelisting approach to input validation consists of building a list of valid input elements (such as characters) and ensuring that all untrusted input elements appear on that list. Whitelisting is preferable to blacklisting when valid input elements are easier to enumerate than invalid input elements are to detect and reject. This advantage disappears when the set of valid input elements is difficult or impossible to enumerate and restricting input to a subset of valid elements is not a viable solution.
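As an illustration, a whitelist check might be sketched as follows. The allowed character set, the length limit, and the name `is_valid_username` are assumptions chosen for the example, not requirements stated by this guideline.

```python
import re

# Hypothetical whitelist: ASCII letters, digits, and underscore, 1 to 32 characters.
_ALLOWED = re.compile(r"[A-Za-z0-9_]{1,32}")


def is_valid_username(value: str) -> bool:
    """Return True only if every character of value is on the whitelist."""
    # fullmatch requires the whole string to match, so a trailing newline
    # or any other unlisted character causes rejection.
    return _ALLOWED.fullmatch(value) is not None
```

Because anything not explicitly listed is rejected, new or unanticipated dangerous characters do not require a change to the check.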

Component-based Sanitization

Many parsers and command interpreters provide their own sanitization and validation APIs. When available, their use is preferred over homegrown sanitization techniques, which often neglect special cases or hidden complexities in the parser.
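A brief sketch of this approach, using two escaping functions from the Python standard library: `html.escape` for text placed in HTML and `shlex.quote` for arguments placed in a POSIX shell command line. The wrapper names `render_comment` and `build_ls_command` are hypothetical and exist only for illustration.

```python
import html
import shlex


def render_comment(text: str) -> str:
    """Embed untrusted text in an HTML paragraph using the library's escaper."""
    # html.escape replaces &, <, >, and (by default) both quote characters.
    return "<p>" + html.escape(text) + "</p>"


def build_ls_command(path: str) -> str:
    """Build a shell command line with the untrusted path quoted as one argument."""
    # shlex.quote wraps the value so a POSIX shell treats it as a single token;
    # "--" stops ls from interpreting a leading "-" as an option.
    return "ls -- " + shlex.quote(path)
```

Where the subsystem accepts arguments as a list (for example, passing a list to `subprocess.run` without a shell), avoiding the command interpreter altogether is usually simpler than quoting for it.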

...
