STR03-J. Do not convert between strings and bytes without specifying a valid character encoding

Converting String objects to different character encodings or to byte arrays may result in loss of data.

According to the Java API [API 2006], String.getBytes(Charset) method documentation:

This method always replaces malformed-input and unmappable-character sequences with this charset's default replacement byte array.

When a String object is converted to bytes, for example, for writing to a file, and the string might contain sequences of unmappable characters, proper character encoding must be performed.

Converting a byte array to a string has similar issues if the byte array is not a valid encoded string. Attempts to read raw binary data as character-encoded will not succeed if bytes fall outside the default or specified encoding scheme and consequently fail to denote valid characters. For example, converting a cryptographic key containing nonrepresentable bytes to character-encoded data for transmission may result in an error.

Noncompliant Code Example

This noncompliant code example attempts to convert the byte array representing a BigInteger into a String. Because some of the bytes do not denote valid characters, the resulting String representation loses information. Converting the String back to a BigInteger produces a different value.

BigInteger x = new BigInteger("530500452766");
// convert x to a String
byte[] byteArray = x.toByteArray();
String s = new String(byteArray);
// convert s back to a BigInteger
byteArray = s.getBytes();
x = new BigInteger(byteArray);

When this program was run on a Linux platform where the default character encoding is US-ASCII, the string s got the value {?J??, because some of the characters were unprintable. When converted back to a BigInteger, x got the value 149830058370101340468658109.

Compliant Solution

This compliant solution first produces a String representation of the BigInteger object and then converts the String object to a byte array. This process is reversed on input. Because the textual representation in the String object was generated by the BigInteger class, it contains valid characters.

BigInteger x = new BigInteger("530500452766");
String s = x.toString();  // valid character data
try {
  byte[] byteArray = s.getBytes("UTF8");
  // ns prints as "530500452766"
  String ns = new String(byteArray, "UTF8");  
  // construct the original BigInteger
  BigInteger x1 = new BigInteger(ns); 
} catch (UnsupportedEncodingException ex) {
  // handle error
}

Do not try to convert the String object to a byte array to obtain the original BigInteger. Character encoded data may yield a byte array that, when converted to a BigInteger, results in a completely different value.

Noncompliant Code Example

This noncompliant code example corrupts the data when string contains characters that are not representable in the specified charset.

// Corrupts data on errors
public static byte[] toCodePage(String charset, String string)
  throws UnsupportedEncodingException {
  return string.getBytes(charset);
}
 
// Fails to detect corrupt data
public static String fromCodePage(String charset, byte[] bytes)
  throws UnsupportedEncodingException {
  return new String(bytes, charset);
}

Compliant Solution

The java.nio.charset.CharsetEncoder class can transform a sequence of 16-bit Unicode characters into a sequence of bytes in a specific charset, while the java.nio.charset.CharacterDecoder class can reverse the procedure [API 2006].

This compliant solution uses the CharsetEncoder and CharsetDecoder classes to handle encoding conversions.

public static byte[] toCodePage(String charset, String string)
  throws IOException {
   
  Charset cs = Charset.forName(charset);
  CharsetEncoder coder = cs.newEncoder();
  ByteBuffer bytebuf = coder.encode(CharBuffer.wrap(string));
  byte[] bytes = new byte[bytebuf.limit()];
  bytebuf.get(bytes);
  return bytes;
}

Noncompliant Code Example

This noncompliant code example attempts to append a string to a text file in the specified encoding. This is erroneous because the String may contain unrepresentable characters.

// Corrupts data on errors
public static void toFile(String charset, String filename,
                        String string) throws IOException {
   
  FileOutputStream stream = new FileOutputStream(filename, true);
  OutputStreamWriter writer = new OutputStreamWriter(stream, charset);
  writer.write(string, 0, string.length());
  writer.close();
}

Compliant Solution

This compliant solution uses the CharsetEncoder class to perform the required function.

public static void toFile(String filename, String string,
                        String charset) throws IOException {
   
  Charset cs = Charset.forName(charset);
  CharsetEncoder coder = cs.newEncoder();
  FileOutputStream stream = new FileOutputStream(filename, true);
  OutputStreamWriter writer = new OutputStreamWriter(stream, coder);
  writer.write(string, 0, string.length());
  writer.close();
}

Use the FileInputStream and InputStreamReader objects to read back the data from the file. InputStreamReader accepts a optional CharsetDecoder argument, which must be the same as that previously used for writing to the file.

Exceptions

STR03-EX0: Binary data that is expected to be a valid string may be read and converted to a string. How to perform this operation securely is explained in rule STR04-J. Use compatible character encodings on both sides of file or network IO. Also see rule STR01-J. Don't form strings containing partial Unicode code points.

Risk Assessment

Attempting to read a byte array containing binary data as if it were character data can produce erroneous results.

Rule	Severity	Likelihood	Remediation Cost	Priority	Level
STR03-J	low	unlikely	medium	P2	L3

Related Guidelines

MITRE CWE	CWE-838. Inappropriate Encoding for Output Context
	CWE-116. Improper Encoding or Escaping of Output

Bibliography

[API 2006]

Class String

Space shortcuts

Page tree

Noncompliant Code Example

Compliant Solution

Noncompliant Code Example

Compliant Solution

Noncompliant Code Example

Compliant Solution

Exceptions

Risk Assessment

Related Guidelines

Bibliography