String objects in Java are encoded in UTF-16. The Java Platform is required to support other character encodings or charsets such as US-ASCII, ISO-8859-1, and UTF-8. Errors may occur when converting between differently coded character data. There are two general types of encoding errors. If the byte sequence is not valid for the specified charset, then the input is considered malformed. If the byte sequence cannot be mapped to an equivalent character sequence, then an unmappable character has been encountered.
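The two error types can be told apart in code. The following is a minimal sketch, not part of the original rule: the class name EncodingErrorDemo and its two helper methods are hypothetical, and it relies on the fact that decoders and encoders obtained from Charset.newDecoder() and Charset.newEncoder() report errors by default rather than replacing them.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnmappableCharacterException;

// Hypothetical helper class used only for illustration.
class EncodingErrorDemo {

  // Decodes the bytes as UTF-8 and returns a label naming the outcome.
  static String classifyDecode(byte[] bytes) {
    try {
      StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(bytes));
      return "ok";
    } catch (MalformedInputException e) {
      return "malformed";   // bytes are not valid in the charset
    } catch (UnmappableCharacterException e) {
      return "unmappable";  // valid input with no equivalent in the target
    } catch (CharacterCodingException e) {
      return "other";
    }
  }

  // Encodes the text as US-ASCII and returns a label naming the outcome.
  static String classifyEncode(String text) {
    try {
      StandardCharsets.US_ASCII.newEncoder().encode(CharBuffer.wrap(text));
      return "ok";
    } catch (MalformedInputException e) {
      return "malformed";
    } catch (UnmappableCharacterException e) {
      return "unmappable";
    } catch (CharacterCodingException e) {
      return "other";
    }
  }

  public static void main(String[] args) {
    // 0xFF can never appear in well-formed UTF-8.
    System.out.println(classifyDecode(new byte[] {(byte) 0xFF}));
    // U+00E9 (e with acute accent) has no US-ASCII representation.
    System.out.println(classifyEncode("\u00e9"));
  }
}
```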

According to the Java API [API 2014] for the String class constructors:

The behavior of this constructor when the given bytes are not valid in the given charset is unspecified.

Similarly, the description of the String.getBytes(Charset) method states:

This method always replaces malformed-input and unmappable-character sequences with this charset's default replacement byte array.
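The effect of this silent replacement can be seen in a short round trip. This is an illustrative sketch only: the class name SilentReplacementDemo and the method roundTrip are hypothetical, and the sample text is an assumption chosen because it contains one character that US-ASCII cannot represent.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class used only for illustration.
class SilentReplacementDemo {

  // Encodes the text as US-ASCII and decodes it again.
  // getBytes(Charset) never throws on unmappable characters; it
  // substitutes the charset's replacement bytes instead.
  static String roundTrip(String text) {
    byte[] encoded = text.getBytes(StandardCharsets.US_ASCII);
    return new String(encoded, StandardCharsets.US_ASCII);
  }

  public static void main(String[] args) {
    String original = "caf\u00e9";  // ends with U+00E9, not in US-ASCII
    String result = roundTrip(original);
    System.out.println(result);
    System.out.println(original.equals(result));  // data was altered
  }
}
```

No exception is raised, so the caller has no indication that the data changed unless it compares the result with the original.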

...

Noncompliant Code Example

This noncompliant code example is similar to the one used in "STR03-J. Do not represent numeric data as strings" in that it attempts to convert a byte array containing the two's-complement representation of a BigInteger value to a String. Because the byte array contains malformed-input sequences, the behavior of the String constructor is unspecified.

Code Block
import java.math.BigInteger;

public class CharsetConversion {
  public static void main(String[] args) {
    BigInteger x = new BigInteger("530500452766");
    // Two's-complement bytes of x; not intended to be character data
    byte[] byteArray = x.toByteArray();
    // Decodes with the platform default charset; the result is
    // unspecified when the bytes are not valid in that charset
    String s = new String(byteArray);
    System.out.println(s);
  }
}

Compliant Solution

The java.nio.charset.CharsetEncoder class can transform a sequence of 16-bit Unicode characters into a sequence of bytes in a specific charset, while the java.nio.charset.CharsetDecoder class can reverse the procedure [API 2006]. These classes provide greater control over the process. In this compliant solution, the CharsetDecoder.decode() method is used to convert the byte array containing the two's-complement representation of a BigInteger value to a CharBuffer. Because the bytes do not represent valid UTF-16 data, the input is considered malformed, and a MalformedInputException is thrown.

Code Block
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnmappableCharacterException;

public class CharsetConversion {

  public static void main(String[] args) {
    CharBuffer charBuffer;
    CharsetDecoder decoder = StandardCharsets.UTF_16.newDecoder();
    BigInteger x = new BigInteger("530500452766");
    byte[] byteArray = x.toByteArray();
    ByteBuffer byteBuffer = ByteBuffer.wrap(byteArray);
    try {
      charBuffer = decoder.decode(byteBuffer);
      String s = charBuffer.toString();
      System.out.println(s);
    } catch (IllegalStateException e) {
      e.printStackTrace();
    } catch (MalformedInputException e) {
      e.printStackTrace();
    } catch (UnmappableCharacterException e) {
      e.printStackTrace();
    } catch (CharacterCodingException e) {
      e.printStackTrace();
    }
  }
}
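The compliant solution exercises only the decoding direction. A comparable sketch for the encoding direction, using CharsetEncoder, might look like the following. The class name StrictEncoder and the method encodeStrict are hypothetical, and the explicit CodingErrorAction.REPORT settings are shown for clarity even though they match the default for a newly created encoder.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class used only for illustration.
class StrictEncoder {

  // Encodes text in the given charset, throwing instead of silently
  // substituting replacement bytes when a character cannot be encoded.
  static byte[] encodeStrict(String text, Charset charset)
      throws CharacterCodingException {
    CharsetEncoder encoder = charset.newEncoder()
        .onMalformedInput(CodingErrorAction.REPORT)
        .onUnmappableCharacter(CodingErrorAction.REPORT);
    ByteBuffer out = encoder.encode(CharBuffer.wrap(text));
    byte[] bytes = new byte[out.remaining()];
    out.get(bytes);
    return bytes;
  }

  public static void main(String[] args) {
    try {
      // U+20AC (euro sign) is not representable in ISO-8859-1.
      encodeStrict("\u20ac", StandardCharsets.ISO_8859_1);
    } catch (CharacterCodingException e) {
      System.out.println("Encoding failed: " + e);
    }
  }
}
```

Unlike String.getBytes(Charset), this approach surfaces the error to the caller, who can then decide whether to reject the input or choose a different charset.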

...