...
As with UTF-8 and other variable-width encodings, programmers reading UTF-16 data as a series of bytes must take care not to form strings containing partial Unicode code points (that is, a high surrogate value without a corresponding low surrogate). Because the UTF-16 representation is also used in char arrays and in the String and StringBuffer classes, the same care is needed when manipulating string data in Java. This typically means using methods that accept a Unicode code point as an int value and avoiding methods that accept a UTF-16 code unit as a char value, since the latter methods cannot support supplementary characters.
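
As a minimal sketch of the int-based approach described above, the following Java example counts and truncates by code point rather than by char. The class name CodePointDemo and its helper methods are hypothetical names chosen for illustration; the String and Character methods they call are part of the standard Java API.

```java
// Hypothetical demo class; the helper names are illustrative, not part of any library.
final class CodePointDemo {

    // Counts Unicode code points; String.length() would count UTF-16 code units instead.
    static int countCodePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Keeps at most maxCodePoints code points, never cutting a surrogate pair in half.
    static String truncate(String s, int maxCodePoints) {
        int n = Math.min(maxCodePoints, countCodePoints(s));
        // offsetByCodePoints converts a code point count into a char index.
        int end = s.offsetByCodePoints(0, n);
        return s.substring(0, end);
    }

    // Tests every code point with the int overload of Character.isLetter.
    static boolean allLetters(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);      // reads a full surrogate pair if present
            if (!Character.isLetter(cp)) {
                return false;
            }
            i += Character.charCount(cp);   // advance by 1 or 2 chars
        }
        return true;
    }

    public static void main(String[] args) {
        // 'a', U+10400 (a supplementary-plane letter stored as a surrogate pair), 'b'
        String s = "a\uD801\uDC00b";
        System.out.println(s.length());            // code units
        System.out.println(countCodePoints(s));    // code points
        System.out.println(allLetters(s));
        System.out.println(truncate(s, 2).length());
    }
}
```

A char-based loop calling Character.isLetter(s.charAt(i)) would see the two surrogate halves of U+10400 separately and classify neither as a letter, which is the failure mode the paragraph above warns about.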
...