Many old programs assumed that every character in a string occupied 8 bits, e.g., a Java byte. The Java language assumed that every character in a string occupies 16 bits, e.g., a Java char. Unfortunately, the Java byte was not sufficient to hold all possible characters, and neither is a Java char. Many strings are stored on disk and in memory using an encoding such as UTF-8 that allows characters to have varying sizes.
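The following sketch illustrates the varying sizes described above. The class name CharacterSizes, the helper utf8Length, and the sample characters are chosen for illustration only; they are not part of the original guideline.

```java
import java.nio.charset.StandardCharsets;

public class CharacterSizes {
    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of Unicode code points (characters) in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String[] samples = {
            "A",            // U+0041, Latin capital letter A
            "\u00E9",       // U+00E9, e with acute accent
            "\u20AC",       // U+20AC, euro sign
            "\uD83D\uDE00"  // U+1F600, a supplementary character (surrogate pair)
        };
        for (String s : samples) {
            System.out.printf("U+%04X: %d UTF-8 byte(s), %d char(s), %d code point(s)%n",
                    s.codePointAt(0), utf8Length(s), s.length(), codePoints(s));
        }
    }
}
```

Each sample is a single character, yet the UTF-8 byte counts range from one to four, and the last sample needs two Java char values.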

...

The trailing byte ranges overlap the range of both the single byte and lead byte characters. When a multibyte character is separated across a buffer boundary, it can be interpreted differently than if it were not separated across the buffer boundary; this difference arises due to the ambiguity of its composing bytes [Phillips 2005].
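As a minimal sketch of this effect, the example below uses UTF-8 rather than a legacy multibyte encoding, so it shows a split character being decoded as malformed input, not the lead-byte ambiguity itself. The class name SplitCharacterDemo, its methods, and the split position are hypothetical and stand in for a buffer boundary that happens to fall inside a character.

```java
import java.nio.charset.StandardCharsets;

public class SplitCharacterDemo {
    // Decodes each chunk independently, as a reader might when it builds a
    // String from every buffer it fills. A character that straddles the
    // boundary is seen as two malformed fragments.
    static String decodePerChunk(byte[] data, int split) {
        String first = new String(data, 0, split, StandardCharsets.UTF_8);
        String second = new String(data, split, data.length - split, StandardCharsets.UTF_8);
        return first + second;
    }

    // Decodes only once all of the bytes are available.
    static String decodeWhole(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "na\u00EFve"; // U+00EF occupies two bytes in UTF-8
        byte[] data = original.getBytes(StandardCharsets.UTF_8);
        int split = 3; // falls between the two bytes that encode U+00EF

        System.out.println(original.equals(decodePerChunk(data, split))); // prints false
        System.out.println(original.equals(decodeWhole(data)));           // prints true
    }
}
```

The bytes are identical in both calls; only the position of the boundary changes the decoded result.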

...

Supplementary Characters

According to the Java API [API 2006], class Character documentation (Unicode Character Representations)

...

The size of the data byte buffer depends on the maximum number of bytes required to write an encoded character. For example, UTF-8 encoded data requires four bytes to represent any character above U+FFFF. Because Java uses the UTF-16 character encoding to represent char data, such sequences are split into two separate char values of two bytes each. Consequently, the buffer size should be four times the size of a typical byte sequence.
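One way to arrive at a conservative buffer size is to ask the charset's encoder for its worst case rather than hard-coding a multiplier. The sketch below is an assumption about how this might be done, not the guideline's own solution; the class name BufferSizing and the method worstCaseBytes are hypothetical. It relies on CharsetEncoder.maxBytesPerChar(), which reports an upper bound per Java char, not per code point.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BufferSizing {
    // Upper bound on the number of bytes needed to encode charCount Java
    // char values in the given charset, as reported by its encoder.
    static int worstCaseBytes(int charCount, Charset cs) {
        double perChar = cs.newEncoder().maxBytesPerChar();
        return (int) Math.ceil(charCount * perChar);
    }

    public static void main(String[] args) {
        // U+1F600 lies above U+FFFF, so Java stores it as two char values.
        String s = new String(Character.toChars(0x1F600));
        int actual = s.getBytes(StandardCharsets.UTF_8).length;
        int bound = worstCaseBytes(s.length(), StandardCharsets.UTF_8);

        System.out.println("char values:  " + s.length());
        System.out.println("UTF-8 bytes:  " + actual);
        System.out.println("buffer bound: " + bound);
    }
}
```

Because the bound is expressed per char, a supplementary character is already counted twice, so the computed size is never smaller than the encoded length.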

...