...

There are many different categories of characters that a programmer might decide to exclude. For example, The Unicode Standard [Unicode 2012] defines the following categories of characters, all of which can be matched using an appropriate regular expression:

...
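As a rough illustration of matching character categories with a regular expression, the following Java sketch tests whether a string contains a code point from a small set of Unicode general categories. The set used here (Cc control, Cf format, Co private use, Cn unassigned) is an assumption chosen for illustration and is not necessarily the set listed above; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class CategoryMatcher {
    // Assumed category set, for illustration only:
    // Cc (control), Cf (format), Co (private use), Cn (unassigned).
    private static final Pattern EXCLUDED =
        Pattern.compile("[\\p{Cc}\\p{Cf}\\p{Co}\\p{Cn}]");

    private CategoryMatcher() { }

    /** Returns true if the input contains a code point in any assumed category. */
    static boolean containsExcluded(String input) {
        return EXCLUDED.matcher(input).find();
    }
}
```

A caller might reject or sanitize any input for which `CategoryMatcher.containsExcluded(input)` returns true; which categories to exclude depends on the application.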

In some versions of The Unicode Standard prior to version 5.2, conformance clause C7 allows the deletion of noncharacter code points. For example, conformance clause C7 from Unicode 5.1 states [Unicode 2007]:

C7. When a process purports not to modify the interpretation of a valid coded character sequence, it shall make no change to that coded character sequence other than the possible replacement of character sequences by their canonical-equivalent sequences or the deletion of noncharacter code points.

According to Unicode Technical Report #36, Unicode Security Considerations [Davis 2008b], Section 3.5, "Deletion of Code Points":

...

According to Unicode Technical Report #36, Unicode Security Considerations [Davis 2008b], "U+FFFD is usually unproblematic, because it is designed expressly for this kind of purpose. That is, because it doesn't have syntactic meaning in programming languages or structured data, it will typically just cause a failure in parsing. Where the output character set is not Unicode, though, this character may not be available."
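The replacement approach described in this passage can be sketched in Java as follows. This is a minimal sketch, not a definitive implementation: `NoncharacterReplacer` and its methods are hypothetical names, and the code assumes that only noncharacter code points (U+FDD0 through U+FDEF, plus the last two code points of every plane) need to be handled.

```java
final class NoncharacterReplacer {
    // U+FFFD REPLACEMENT CHARACTER
    private static final int REPLACEMENT = 0xFFFD;

    private NoncharacterReplacer() { }

    /** Returns true for U+FDD0..U+FDEF and for U+nFFFE / U+nFFFF in any plane. */
    static boolean isNoncharacter(int codePoint) {
        return (codePoint >= 0xFDD0 && codePoint <= 0xFDEF)
            || (codePoint & 0xFFFE) == 0xFFFE;
    }

    /** Replaces each noncharacter code point with U+FFFD instead of deleting it. */
    static String replaceNoncharacters(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        input.codePoints().forEach(cp ->
            sb.appendCodePoint(isNoncharacter(cp) ? REPLACEMENT : cp));
        return sb.toString();
    }
}
```

For an input such as `"<scr\uFDEFipt>"`, deleting the noncharacter would yield `<script>`, whereas this sketch leaves U+FFFD between `scr` and `ipt`, so the result does not form a script tag. Any validation would still need to run after this replacement step.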

...

MITRE CWE

CWE-182. Collapse of Data into Unsafe Value

Bibliography

[API 2006]

[Davis 2008b] 3.5, Deletion of Noncharacters

[Weber 2009] Handling the Unexpected: Character-deletion

[Unicode 2007]

[Unicode 2011]

...
