Comment: removed redundant instruction not to split characters

...

A user character may be composed of more than one Unicode character. For example, the user character ü can be composed by combining the Unicode characters \u0075 (u) and \u0308 (the combining diaeresis). ... The character ü may also be represented by the single precomposed Unicode character \u00fc.

Do not split a string between a base character and the combining characters that follow it, or between two combining characters.
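
As a rough illustration of the rule above, the following Python sketch moves a requested split point to the left until it no longer falls immediately before a combining character. The function name split_at_boundary is hypothetical, and the check relies only on the canonical combining class reported by the standard unicodedata module; it is a simplification and does not implement full grapheme cluster segmentation.

```python
import unicodedata
from typing import Tuple


def split_at_boundary(text: str, index: int) -> Tuple[str, str]:
    """Split text at or before index, never directly before a combining character.

    Hypothetical helper for illustration only; it inspects the canonical
    combining class and ignores other grapheme cluster rules.
    """
    # Clamp the requested index to the valid range.
    index = max(0, min(index, len(text)))
    # While the character right after the split point is a combining
    # character, move the split point one position to the left.
    while 0 < index < len(text) and unicodedata.combining(text[index]) != 0:
        index -= 1
    return text[:index], text[index:]


# "für" in decomposed form: f, u, COMBINING DIAERESIS, r
decomposed = "fu\u0308r"

# A naive split at index 2 would separate u from its diaeresis.
head, tail = split_at_boundary(decomposed, 2)

# The decomposed and precomposed forms of ü are canonically equivalent.
same_after_nfc = unicodedata.normalize("NFC", "u\u0308") == "\u00fc"
```

Here the split lands after f rather than between u and the diaeresis, so tail still begins with the complete user character.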

Multibyte Characters

Multibyte encodings are used for character sets that require more than one byte to uniquely identify each constituent character. For example, the Japanese encoding Shift-JIS (shown below) is a multibyte encoding in which a character occupies at most two bytes (one leading byte and one trailing byte).
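
To make the variable character length concrete, here is a minimal Python sketch that uses the standard shift_jis codec. The helper name byte_lengths and the sample string are assumptions chosen for illustration. It also shows that cutting the encoded bytes between a leading and a trailing byte leaves data that cannot be decoded.

```python
from typing import List


def byte_lengths(text: str, encoding: str = "shift_jis") -> List[int]:
    """Return the encoded length in bytes of each character in text."""
    return [len(ch.encode(encoding)) for ch in text]


# One ASCII letter followed by one hiragana character.
sample = "A\u3042"
lengths = byte_lengths(sample)

encoded = sample.encode("shift_jis")

# Truncating after two bytes keeps the leading byte of the second
# character but drops its trailing byte.
try:
    encoded[:2].decode("shift_jis")
    truncated_ok = True
except UnicodeDecodeError:
    truncated_ok = False
```

The ASCII letter encodes to a single byte, the hiragana character to two, and decoding the truncated byte string raises UnicodeDecodeError.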

...