Legacy software frequently assumes that every character in a string occupies 8 bits, that is, a Java byte. The Java language assumes that every character in a string occupies 16 bits, that is, a Java char. Unfortunately, the Java byte is not sufficient to hold all possible characters, and neither is the Java char. Many strings are stored on disk and in memory using an encoding such as UTF-8 that allows characters to have varying sizes.
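The size differences described above can be observed directly. The following is a minimal sketch, not part of the original guideline; the class name CharSizes and its helper methods are hypothetical names chosen for illustration. It reports how many UTF-8 bytes, how many char values, and how many Unicode code points a one-character string occupies.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, introduced only for this illustration.
public class CharSizes {
    // Number of bytes the string occupies when encoded as UTF-8.
    public static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of 16-bit char values (UTF-16 code units) in the string.
    public static int charLength(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    public static int codePointLength(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String[] samples = {
            "A",             // U+0041, Latin capital letter A
            "\u00E9",        // U+00E9, e with acute accent
            "\u20AC",        // U+20AC, euro sign
            "\uD83D\uDE00"   // U+1F600, a supplementary character
        };
        for (String s : samples) {
            System.out.printf("U+%04X: %d UTF-8 byte(s), %d char(s), %d code point(s)%n",
                s.codePointAt(0), utf8Length(s), charLength(s), codePointLength(s));
        }
    }
}
```

Each sample is a single code point, yet the UTF-8 byte count ranges from 1 to 4, and the supplementary character needs two char values.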
Consequently, while Java strings are exposed as sequences of type char and can be represented as an array of bytes, a single character in the string might be represented by two or more consecutive bytes or chars. Splitting a char or byte array runs the risk of separating the chars or bytes that together make up a single multibyte character. Security vulnerabilities may arise when an application expects input in a form that an attacker is capable of bypassing. This can happen when an application disregards supplementary characters or multibyte characters, or when it fails to use combining characters appropriately. Combining characters are characters that modify other characters. Refer to the Combining Diacritical Marks chart for more details on combining characters.
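The splitting risk can be sketched for the char case as follows. This is an illustrative example under stated assumptions, not the guideline's own code: the class SafeSplit and its methods are hypothetical names. It only checks whether a proposed split index falls between the two halves of a surrogate pair and, if so, moves the index back by one char.

```java
// Hypothetical helper class, introduced only for this illustration.
public class SafeSplit {
    // Returns true if cutting s at index would separate a surrogate pair.
    public static boolean splitsSurrogatePair(String s, int index) {
        return index > 0 && index < s.length()
            && Character.isHighSurrogate(s.charAt(index - 1))
            && Character.isLowSurrogate(s.charAt(index));
    }

    // Moves the index back by one char when needed so that the cut
    // lands on a code point boundary; otherwise returns it unchanged.
    public static int safeSplitIndex(String s, int index) {
        return splitsSurrogatePair(s, index) ? index - 1 : index;
    }

    public static void main(String[] args) {
        // 'a', 'b', U+1F600 (two chars), 'c', 'd': six chars in total.
        String s = "ab\uD83D\uDE00cd";
        int naive = 3; // falls between the high and low surrogate
        int safe = safeSplitIndex(s, naive);

        String head = s.substring(0, safe);
        String tail = s.substring(safe);
        System.out.println("naive index splits a pair: " + splitsSurrogatePair(s, naive));
        System.out.println("adjusted index: " + safe);
        System.out.println("head chars: " + head.length() + ", tail chars: " + tail.length());
    }
}
```

This check covers supplementary characters only. It does not keep a base character together with a following combining character; for that, one option is java.text.BreakIterator.getCharacterInstance(), which reports boundaries between user-perceived characters.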
...
| [API 2006] | Classes |
| [Hornig 2007] | Problem areas: Characters |
...