Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

When implementations keep strings in a normalized form, they can be assured that equivalent strings have a unique binary representation.

Normalization Forms KC and KD must not be blindly applied to arbitrary text. Because they erase many formatting distinctions, they will prevent round-trip conversion to and from many legacy character sets, and unless supplanted by formatting markup, they may remove distinctions that are important to the semantics of the text. It is best to think of these Normalization Forms as being like uppercase or lowercase mappings: useful in certain contexts for identifying core meanings, but also performing modifications to the text that may not always be appropriate. They can be applied more freely to domains with restricted character sets...

Frequently, the most suitable normalization form for performing input validation on arbitrarily encoded strings is KC (NFKC) because normalizing to KC transforms the input into an equivalent canonical form that can be safely compared with the required input form.

...

Code Block
bgColor#FFcccc
// String s may be user controllable
// \uFE64 is normalized to < and \uFE65 is normalized to > using NFKC
String s = "\uFE64" + "script" + "\uFE65";

// Validate
Pattern pattern = Pattern.compile("[<>]"); // Check for angle brackets
Matcher matcher = pattern.matcher(s);
if (matcher.find()) {
  // Found black listed tag
  throw new IllegalStateException();
} else {
  // ...
}

// Normalize
s = Normalizer.normalize(s, Form.NFKC);

The {{normalize () }} method transforms Unicode text into an equivalent composed or decomposed form, allowing for easier searching of text. The normalize method supports the standard normalization forms described in Unicode Standard Annex #15 Unicode Normalization Forms.

...

<ac:structured-macro ac:name="unmigrated-wiki-markup" ac:schema-version="1" ac:macro-id="6fa1be93b56e9785-b08304e7-4ed14bd9-a7aca1a1-b69d190b68bcb7bad47b0676"><ac:plain-text-body><![CDATA[

[ISO/IEC TR 24772:2010

http://www.aitcnet.org/isai/]

" Cross-site Scripting [XYT] "

]]></ac:plain-text-body></ac:structured-macro>

MITRE CWE

CWE-289, ". Authentication Bypass by Alternate Name "

 

CWE-180, ". Incorrect Behavior Order: Validate Before Canonicalize "

Bibliography

<ac:structured-macro ac:name="unmigrated-wiki-markup" ac:schema-version="1" ac:macro-id="11ccd9d6b0d500e5-626484c1-49eb4bae-870b8348-0a9bcaa44e53caa9c5a04809"><ac:plain-text-body><![CDATA[

[[API 2006

AA. Bibliography#API 06]]

]]></ac:plain-text-body></ac:structured-macro>

<ac:structured-macro ac:name="unmigrated-wiki-markup" ac:schema-version="1" ac:macro-id="13a3ccd6b81ff0c5-c03519d6-47be48dc-94b7afca-105cb9faf28314cc90b7d9c7"><ac:plain-text-body><![CDATA[

[[Davis 2008

AA. Bibliography#Davis 08]]

]]></ac:plain-text-body></ac:structured-macro>

<ac:structured-macro ac:name="unmigrated-wiki-markup" ac:schema-version="1" ac:macro-id="f09d00bcb0fd5eb2-a9800782-46e14e1c-9ada8e7f-a47061213af4b01757303bf6"><ac:plain-text-body><![CDATA[

[[Weber 2009

AA. Bibliography#Weber 09]]

]]></ac:plain-text-body></ac:structured-macro>

...