
Converting String objects to different character encodings or to byte arrays may result in loss of data. String objects in Java are encoded in UTF-16. The Java platform is required to support other character encodings, or charsets, such as US-ASCII, ISO-8859-1, and UTF-8. Errors may occur when converting between differently encoded character data. There are two general types of encoding errors. If the byte sequence is not valid for the specified charset, the input is considered malformed. If the byte sequence cannot be mapped to an equivalent character sequence, an unmappable character has been encountered.
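The two error types can be told apart at run time. The following is a minimal sketch rather than part of the rule itself; the class and method names (EncodingErrorKinds, decodeInvalidUtf8, encodeEuroAsLatin1) are hypothetical. It assumes coders obtained from newDecoder() and newEncoder(), which report errors by default, and it shows the unmappable case on the encoding side, where it is most often seen.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnmappableCharacterException;

public class EncodingErrorKinds {

  // Decodes the single byte 0xFF as UTF-8; 0xFF never occurs in valid UTF-8.
  static String decodeInvalidUtf8() {
    byte[] bytes = { (byte) 0xFF };
    try {
      StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(bytes));
      return "decoded";
    } catch (MalformedInputException e) {
      return "malformed";
    } catch (CharacterCodingException e) {
      return "other";
    }
  }

  // Encodes U+20AC (euro sign), which has no mapping in ISO-8859-1.
  static String encodeEuroAsLatin1() {
    try {
      StandardCharsets.ISO_8859_1.newEncoder().encode(CharBuffer.wrap("\u20AC"));
      return "encoded";
    } catch (UnmappableCharacterException e) {
      return "unmappable";
    } catch (CharacterCodingException e) {
      return "other";
    }
  }

  public static void main(String[] args) {
    System.out.println(decodeInvalidUtf8());   // malformed
    System.out.println(encodeEuroAsLatin1());  // unmappable
  }
}
```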

According to the Java API [API 2006] documentation for the String constructors:

The behavior of this constructor when the given bytes are not valid in the given charset is unspecified.

Similarly, the documentation for the String.getBytes(Charset) method states:

This method always replaces malformed-input and unmappable-character sequences with this charset's default replacement byte array.
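The silent replacement described in the quotation can be observed directly. This is an illustrative sketch only; the class name SilentReplacement and the helper toAscii are hypothetical. It encodes a string containing U+00E9, which US-ASCII cannot represent, and the result carries the charset's replacement byte (the question mark, 63) with no error reported.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SilentReplacement {

  // String.getBytes(Charset) replaces unmappable characters instead of reporting them.
  static byte[] toAscii(String s) {
    return s.getBytes(StandardCharsets.US_ASCII);
  }

  public static void main(String[] args) {
    byte[] bytes = toAscii("caf\u00E9");  // "cafe" with an acute accent on the e
    System.out.println(Arrays.toString(bytes));  // [99, 97, 102, 63]
  }
}
```

The caller cannot distinguish this output from the encoding of the literal string "caf?", which is why the data loss goes unnoticed.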

When a String object is converted to bytes, for example, for writing to a file, and the string might contain unmappable characters, the encoding must be performed in a way that reports those characters rather than silently replacing them.

Converting a byte array to a string has similar issues if the byte array is not a valid encoded string. Attempts to read raw binary data as character-encoded data fail when bytes fall outside the default or specified encoding scheme and consequently do not denote valid characters. For example, converting a cryptographic key containing nonrepresentable bytes to character-encoded data for transmission may result in an error.
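To make the problem concrete, the sketch below pushes arbitrary bytes through a String and back. It is an assumption-laden illustration: the class name LossyRoundTrip, the helper roundTrip, and the sample bytes are hypothetical and merely stand in for binary data such as a key. UTF-8 is chosen because String(byte[], Charset) is documented to replace malformed input with the charset's replacement string.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LossyRoundTrip {

  // Decodes bytes as UTF-8 and immediately re-encodes them.
  static byte[] roundTrip(byte[] raw) {
    String s = new String(raw, StandardCharsets.UTF_8);
    return s.getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    // 0xFF and 0xFE are not valid anywhere in a UTF-8 sequence.
    byte[] key = { (byte) 0xFF, (byte) 0xFE, 0x41 };
    byte[] back = roundTrip(key);
    System.out.println(Arrays.equals(key, back));  // false
  }
}
```

The invalid bytes are replaced during decoding, so the original values cannot be recovered from the string.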

The CharsetEncoder class is used to transform character data into a sequence of bytes in a specific charset. The input character sequence is provided in a character buffer or a series of such buffers. The output byte sequence is written to a byte buffer or a series of such buffers. The CharsetDecoder class reverses this process by transforming a sequence of bytes in a specific charset into character data. The input byte sequence is provided in a byte buffer or a series of such buffers, while the output character sequence is written to a character buffer or a series of such buffers.
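The buffer-based flow described above might look like the following. This is a sketch under stated assumptions, not a definitive implementation: the class StrictCodec and its methods encodeStrict and decodeStrict are hypothetical names, and the error actions are set to REPORT explicitly even though that is already the default for a new coder.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictCodec {

  // Characters in, bytes out; throws instead of substituting a replacement.
  static byte[] encodeStrict(String s, Charset cs) throws CharacterCodingException {
    CharsetEncoder encoder = cs.newEncoder()
        .onMalformedInput(CodingErrorAction.REPORT)
        .onUnmappableCharacter(CodingErrorAction.REPORT);
    ByteBuffer out = encoder.encode(CharBuffer.wrap(s));
    byte[] bytes = new byte[out.remaining()];
    out.get(bytes);
    return bytes;
  }

  // Bytes in, characters out; throws on malformed or unmappable input.
  static String decodeStrict(byte[] bytes, Charset cs) throws CharacterCodingException {
    CharsetDecoder decoder = cs.newDecoder()
        .onMalformedInput(CodingErrorAction.REPORT)
        .onUnmappableCharacter(CodingErrorAction.REPORT);
    return decoder.decode(ByteBuffer.wrap(bytes)).toString();
  }

  public static void main(String[] args) throws CharacterCodingException {
    byte[] utf8 = encodeStrict("h\u00E9llo", StandardCharsets.UTF_8);
    System.out.println(decodeStrict(utf8, StandardCharsets.UTF_8));
    try {
      encodeStrict("\u20AC", StandardCharsets.ISO_8859_1);
    } catch (CharacterCodingException e) {
      System.out.println("rejected: " + e);
    }
  }
}
```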

Special care should be taken when decoding untrusted byte data to ensure that malformed input or unmappable character errors do not result in defects and vulnerabilities. Encoding errors can also occur; for example, encoding a cryptographic key containing malformed input for transmission will result in an error. Encoding and decoding errors typically result in data corruption.

Noncompliant Code Example

Like the example in STR03-J. Do not represent numeric data as strings, this noncompliant code example attempts to convert a byte array containing the two's-complement representation of a BigInteger value to a String. Because the byte array contains malformed-input sequences, the behavior of the String constructor is unspecified.

// Corrupts data on errors
import java.math.BigInteger;

public class CharsetConversion {
  public static void main(String[] args) {
    BigInteger x = new BigInteger("530500452766");
    byte[] byteArray = x.toByteArray();
    // Decodes with the platform default charset; invalid bytes are not reported
    String s = new String(byteArray);
    System.out.println(s);
  }
}

Compliant Solution

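The java.nio.charset.CharsetEncoder and java.nio.charset.CharsetDecoder classes provide greater control over the conversion process [API 2006]. In this compliant solution, the CharsetDecoder.decode() method is used to convert the byte array containing the two's-complement representation of the BigInteger value to a CharBuffer. Because the bytes do not represent valid UTF-16 (the array has an odd length of five bytes), the input is considered malformed, and a MalformedInputException is thrown.

import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnmappableCharacterException;

public class CharsetConversion {
  public static void main(String[] args) {
    CharBuffer charBuffer;
    // A new decoder reports malformed input and unmappable characters by default
    CharsetDecoder decoder = StandardCharsets.UTF_16.newDecoder();
    BigInteger x = new BigInteger("530500452766");
    byte[] byteArray = x.toByteArray();
    ByteBuffer byteBuffer = ByteBuffer.wrap(byteArray);
    try {
      charBuffer = decoder.decode(byteBuffer);
      String s = charBuffer.toString();
      System.out.println(s);
    } catch (IllegalStateException e) {
      e.printStackTrace();
    } catch (MalformedInputException e) {
      e.printStackTrace();
    } catch (UnmappableCharacterException e) {
      e.printStackTrace();
    } catch (CharacterCodingException e) {
      e.printStackTrace();
    }
  }
}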

Noncompliant Code Example

This noncompliant code example attempts to append a string to a text file in the specified encoding. This is erroneous because the String may contain unrepresentable characters.

// Corrupts data on errors
public static void toFile(String charset, String filename,
                          String string) throws IOException {
  // Open in append mode; try-with-resources closes both streams even on failure
  try (FileOutputStream stream = new FileOutputStream(filename, true);
       OutputStreamWriter writer = new OutputStreamWriter(stream, charset)) {
    // Unmappable characters are silently replaced, not reported
    writer.write(string, 0, string.length());
  }
}

Compliant Solution

This compliant solution passes a CharsetEncoder to the OutputStreamWriter, so that unmappable characters cause an exception instead of being silently replaced.

public static void toFile(String filename, String string,
                          String charset) throws IOException {
  Charset cs = Charset.forName(charset);
  // A new encoder reports malformed input and unmappable characters by default
  CharsetEncoder coder = cs.newEncoder();
  try (FileOutputStream stream = new FileOutputStream(filename, true);
       OutputStreamWriter writer = new OutputStreamWriter(stream, coder)) {
    // Unencodable input raises a CharacterCodingException during write or close
    writer.write(string, 0, string.length());
  }
}

Use the FileInputStream and InputStreamReader objects to read the data back from the file. InputStreamReader accepts an optional CharsetDecoder argument, which must use the same charset as the encoder previously used for writing to the file.
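Reading the file back with a strict decoder could be sketched as follows. The class StrictFileReader and the method fromFile are hypothetical names introduced for illustration, and the temporary file in main exists only to make the example self-contained.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class StrictFileReader {

  // Reads the whole file; a new decoder reports malformed input by default,
  // so invalid bytes surface as a CharacterCodingException (an IOException).
  public static String fromFile(String filename, String charset) throws IOException {
    CharsetDecoder decoder = Charset.forName(charset).newDecoder();
    StringBuilder sb = new StringBuilder();
    try (FileInputStream stream = new FileInputStream(filename);
         InputStreamReader reader = new InputStreamReader(stream, decoder)) {
      char[] buffer = new char[1024];
      int n;
      while ((n = reader.read(buffer)) != -1) {
        sb.append(buffer, 0, n);
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) throws IOException {
    Path tmp = Files.createTempFile("charset-demo", ".txt");
    try {
      Files.write(tmp, "na\u00EFve".getBytes(StandardCharsets.UTF_8));
      System.out.println(fromFile(tmp.toString(), "UTF-8").equals("na\u00EFve"));  // true
    } finally {
      Files.delete(tmp);
    }
  }
}
```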

Exceptions

STR05-EX0: Binary data that is expected to be a valid string may be read and converted to a string. How to perform this operation securely is explained in rule STR04-J. Use compatible character encodings when communicating string data between processes.

Risk Assessment

Malformed input or unmappable character errors can result in a loss of data integrity. Attempting to read a byte array containing binary data as if it were character data can produce erroneous results.

Rule	Severity	Likelihood	Remediation Cost	Priority	Level
STR05-J	low	unlikely	medium	P2	L3

Related Guidelines

MITRE CWE	CWE-838. Inappropriate Encoding for Output Context
	CWE-116. Improper Encoding or Escaping of Output

Bibliography

[API 2006]

Class String
