STR03-J. Do not convert between strings and bytes without specifying a valid character encoding

Performing conversions of String objects between different character encodings or to byte arrays may result in loss of data.

According to the Java API [API 2006], String.getBytes(Charset) method documentation:

This method always replaces malformed-input and unmappable-character sequences with this charset's default replacement byte array.

When a String must be converted to bytes, for example, for writing to a file, and the string might contain sequences of unmappable characters, proper character encoding must be performed.

Converting a byte array to a string has similar consequences if the byte array does not indicate a valid encoded string. Attempts to read raw binary data as if it were character-encoded data often fail because some of the bytes fall outside the default or specified encoding scheme and for that reason fail to denote valid characters. For example, converting a cryptographic key containing nonrepresentable bytes to character-encoded data for transmission may result in an error.

Noncompliant Code Example

This noncompliant code example attempts to convert the byte array representing a BigInteger into a String. Because some of the bytes do not denote valid characters, the resulting String representation loses information. Converting the String back to a BigInteger produces a different value.

BigInteger x = new BigInteger("530500452766");
// convert x to a String
byte[] byteArray = x.toByteArray();
String s = new String(byteArray);
// convert s back to a BigInteger
byteArray = s.getBytes();
x = new BigInteger(byteArray);

When this program was run on a Linux platform where the default character encoding is US-ASCII, the string s got the value {?J??, because some of the characters were unprintable. When converted back to a BigInteger, x got the value 149830058370101340468658109.

Compliant Solution

This compliant solution first produces a String representation of the BigInteger object and then converts the String object to a byte array. This process is reversed on input. Because the textual representation in the String object was generated by the BigInteger class, it contains valid characters.

Do not try to convert the String object to a byte array to obtain the original BigInteger. Character encoded data may yield a byte array that, when converted to a BigInteger, results in a completely different value.

BigInteger x = new BigInteger("530500452766");
String s = x.toString();  // valid character data
try {
  byte[] byteArray = s.getBytes("UTF8");
  // ns prints as "530500452766"
  String ns = new String(byteArray, "UTF8");  
  // construct the original BigInteger
  BigInteger x1 = new BigInteger(ns); 
} catch (UnsupportedEncodingException ex) {
  // handle error
}

Noncompliant Code Example

This noncompliant code example corrupts the data when string contains characters that are not representable in the specified charset.

// Corrupts data on errors
public static byte[] toCodePage(String charset, String string)
  throws UnsupportedEncodingException {
  return string.getBytes(charset);
}
 
// Fails to detect corrupt data
public static String fromCodePage(String charset, byte[] bytes)
  throws UnsupportedEncodingException {
  return new String(bytes, charset);
}

Compliant Solution

The java.nio.charset.CharsetEncoder class can transform a sequence of 16-bit Unicode characters into a sequence of bytes in a specific charset, while the java.nio.charset.CharacterDecoder class can reverse the procedure [API 2006].

This compliant solution uses the CharsetEncoder and CharsetDecoder classes to handle encoding conversions.

public static byte[] toCodePage(String charset, String string)
  throws IOException {
   
  Charset cs = Charset.forName(charset);
  CharsetEncoder coder = cs.newEncoder();
  ByteBuffer bytebuf = coder.encode(CharBuffer.wrap(string));
  byte[] bytes = new byte[bytebuf.limit()];
  bytebuf.get(bytes);
  return bytes;
}

Noncompliant Code Example

This noncompliant code example attempts to append a string to a text file in the specified encoding. This is erroneous because the String may contain unrepresentable characters.

// Corrupts data on errors
public static void toFile(String charset, String filename,
                        String string) throws IOException {
   
  FileOutputStream stream = new FileOutputStream(filename, true);
  OutputStreamWriter writer = new OutputStreamWriter(stream, charset);
  writer.write(string, 0, string.length());
  writer.close();
}

Compliant Solution

This compliant solution uses the CharsetEncoder class to perform the required function.

public static void toFile(String filename, String string,
                        String charset) throws IOException {
   
  Charset cs = Charset.forName(charset);
  CharsetEncoder coder = cs.newEncoder();
  FileOutputStream stream = new FileOutputStream(filename, true);
  OutputStreamWriter writer = new OutputStreamWriter(stream, coder);
  writer.write(string, 0, string.length());
  writer.close();
}        

Use the FileInputStream and InputStreamReader objects to read back the data from the file. InputStreamReader accepts a optional CharsetDecoder argument, which must be the same as that previously used for writing to the file.

Exceptions

STR03-EX0: Binary data that is expected to be a valid string may be read and converted to a string. How to perform this operation securely is explained in rule STR04-J. Use compatible character encodings on both sides of file or network IO. Also see rule STR01-J. Don't form strings containing partial characters.

Risk Assessment

Attempting to read a byte array containing binary data as if it were character data can produce erroneous results.

Rule	Severity	Likelihood	Remediation Cost	Priority	Level
STR03-J	low	unlikely	medium	P2	L3

Related Guidelines

MITRE CWE	CWE-838. Inappropriate Encoding for Output Context
	CWE-116. Improper Encoding or Escaping of Output

Bibliography

[API 2006]

Class String

Space shortcuts

Page tree

Noncompliant Code Example

Compliant Solution

Noncompliant Code Example

Compliant Solution

Noncompliant Code Example

Compliant Solution

Exceptions

Risk Assessment

Related Guidelines

Bibliography