...

  • command processor via a call to system() or similar function
  • relational databases
  • third-party COTS components (e.g., an enterprise resource planning subsystem)

Data sanitization requires an understanding of the data being passed and the capabilities of the subsystem. John Viega and Matt Messier provide an example of an application that inputs an email address into a buffer and then uses this string as an argument in a call to system() [Viega 03]:

Code Block
sprintf(buffer, "/bin/mail %s < /tmp/email", addr);
system(buffer);

...

It is necessary to ensure that all valid data is accepted while potentially dangerous data is rejected or sanitized. This can be difficult when valid characters or sequences of characters also have special meaning to the subsystem; in such cases, it may be necessary to validate the data against a grammar. In cases where there is no overlap, white listing can be used to eliminate dangerous characters from the data.

The white listing approach to data sanitization is to define a list of acceptable characters and remove (or replace) any character that is not acceptable. The list of valid input values is typically a predictable, well-defined set of manageable size. This example, based on the tcp_wrappers package written by Wietse Venema, illustrates the white listing approach.

Code Block
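/* Adjacent string literals are concatenated, so no stray whitespace
 * from line continuations ends up in the list of acceptable characters. */
static char ok_chars[] = "abcdefghijklmnopqrstuvwxyz"
                         "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                         "1234567890_-.@";
char user_data[] = "Bad char 1:} Bad char 2:{";
char *cp; /* cursor into string */
/* strspn() (from <string.h>) skips the run of acceptable characters;
 * each unacceptable character found is replaced with '_'. */
for (cp = user_data; *(cp += strspn(cp, ok_chars)); )
  *cp = '_';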
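
The same loop can be packaged as a reusable function. The following is a minimal sketch, not part of tcp_wrappers: the name sanitize_in_place, its return value, and the sample input in the usage note are assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Characters considered safe; everything else is replaced. */
static const char ok_chars[] = "abcdefghijklmnopqrstuvwxyz"
                               "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                               "1234567890_-.@";

/* Hypothetical helper: overwrites, in place, every character of s that is
 * not in ok_chars with '_' and returns how many characters were replaced. */
static size_t sanitize_in_place(char *s) {
  size_t replaced = 0;
  if (s == NULL) {
    return 0;
  }
  /* strspn() skips the leading run of acceptable characters, so cp always
   * lands on the next unacceptable character or on the terminator. */
  for (char *cp = s; *(cp += strspn(cp, ok_chars)) != '\0'; ) {
    *cp = '_';
    ++replaced;
  }
  return replaced;
}
```

Under these assumptions, an input such as "bogus@addr.com; cat /etc/passwd" becomes "bogus@addr.com__cat__etc_passwd": the semicolon, spaces, and slashes that carry special meaning to a command processor are all replaced.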

The benefit of white listing is that a programmer can be certain that a string contains only characters that the programmer considers safe. White listing is recommended over black listing, which traps all unacceptable characters, because the programmer only needs to ensure that acceptable characters are identified. As a result, the programmer can be less concerned about which characters an attacker may try in an attempt to bypass security checks.
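
A white list can also be used to reject input outright rather than repair it. The following is a minimal sketch of that option; the name is_acceptable, the allowed_chars set, and the choice to treat an empty string as invalid are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed white list for this sketch. */
static const char allowed_chars[] = "abcdefghijklmnopqrstuvwxyz"
                                    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                                    "1234567890_-.@";

/* Hypothetical helper: returns true only if every character of input
 * appears in allowed. Empty or null input is rejected, which is a policy
 * assumption a caller may want to change. */
static bool is_acceptable(const char *input, const char *allowed) {
  if (input == NULL || allowed == NULL || input[0] == '\0') {
    return false;
  }
  /* strspn() returns the length of the leading run of allowed characters;
   * the input is clean only if that run reaches the terminator. */
  return input[strspn(input, allowed)] == '\0';
}
```

Rejecting avoids silently altering data the user supplied, at the cost of requiring the caller to handle the failure, for example by asking for the input again.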

...