String objects in Java are encoded in UTF-16. The Java platform is also required to support other character encodings, or charsets, such as US-ASCII, ISO-8859-1, and UTF-8. Errors may occur when converting between differently encoded character data. For example, decoding a byte sequence can throw the following exceptions:

MalformedInputException - If the byte sequence starting at the input buffer's current position is not legal for this charset and the current malformed-input action is CodingErrorAction.REPORT

UnmappableCharacterException - If the byte sequence starting at the input buffer's current position cannot be mapped to an equivalent character sequence and the current unmappable-character action is CodingErrorAction.REPORT
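The effect of the coding error action can be seen in a minimal sketch. The class and method names here (CodingErrorActionDemo, decodeUtf8, encodeAscii) are hypothetical and chosen only for illustration; the charset calls are the standard java.nio.charset API.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original rule text
class CodingErrorActionDemo {

  // Decodes UTF-8 bytes, applying the given action to invalid input
  static String decodeUtf8(byte[] bytes, CodingErrorAction action)
      throws CharacterCodingException {
    CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
        .onMalformedInput(action)
        .onUnmappableCharacter(action);
    return decoder.decode(ByteBuffer.wrap(bytes)).toString();
  }

  // Encodes text as US-ASCII, applying the given action to
  // characters that US-ASCII cannot represent
  static byte[] encodeAscii(String text, CodingErrorAction action)
      throws CharacterCodingException {
    CharsetEncoder encoder = StandardCharsets.US_ASCII.newEncoder()
        .onMalformedInput(action)
        .onUnmappableCharacter(action);
    ByteBuffer buf = encoder.encode(CharBuffer.wrap(text));
    byte[] out = new byte[buf.remaining()];
    buf.get(out);
    return out;
  }

  public static void main(String[] args) {
    byte[] invalid = {'a', (byte) 0xFF, 'b'}; // 0xFF never occurs in UTF-8
    try {
      // REPLACE substitutes the replacement character for the bad byte
      System.out.println(decodeUtf8(invalid, CodingErrorAction.REPLACE));
      // REPORT throws instead of altering the data
      System.out.println(decodeUtf8(invalid, CodingErrorAction.REPORT));
    } catch (CharacterCodingException e) {
      System.out.println("decode: " + e.getClass().getSimpleName());
    }
    try {
      encodeAscii("caf\u00e9", CodingErrorAction.REPORT); // U+00E9 is not ASCII
    } catch (CharacterCodingException e) {
      System.out.println("encode: " + e.getClass().getSimpleName());
    }
  }
}
```

Only REPORT surfaces the problem to the caller; REPLACE and IGNORE return a result that no longer matches the input.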

According to the Java API [API 2014] for the String class:

The behavior of this constructor when the given bytes are not valid in the given charset is unspecified.

...

According to the Java API [API 2014] documentation for the String.getBytes(Charset) method:

...

Converting a byte array to a string has similar issues if the byte array is not a valid encoded string. Attempts to interpret raw binary data as encoded text fail when the bytes do not form valid character sequences in the default or specified encoding. For example, converting a cryptographic key containing nonrepresentable bytes to character-encoded data for transmission may result in an error.
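The following sketch illustrates how such conversions can lose data without raising any error. It relies on the String methods that take a Charset, which are documented to substitute the charset's default replacement for invalid input. The class and method names (SilentReplacementDemo, roundTripAscii, roundTripBytes) and the sample byte values are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of the original rule text
class SilentReplacementDemo {

  // Encodes text as US-ASCII and decodes it again through the String API
  static String roundTripAscii(String text) {
    byte[] encoded = text.getBytes(StandardCharsets.US_ASCII);
    return new String(encoded, StandardCharsets.US_ASCII);
  }

  // Treats raw bytes as UTF-8 text, then converts the text back to bytes
  static byte[] roundTripBytes(byte[] raw) {
    String asText = new String(raw, StandardCharsets.UTF_8);
    return asText.getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    String original = "caf\u00e9"; // U+00E9 has no US-ASCII encoding
    System.out.println(original.equals(roundTripAscii(original)));

    // Stand-in for binary data such as key material; 0x84 cannot
    // start a UTF-8 sequence
    byte[] key = {(byte) 0x7B, (byte) 0x84, (byte) 0x4A};
    System.out.println(Arrays.equals(key, roundTripBytes(key)));
  }
}
```

Neither round trip preserves its input, and neither call throws, so the corruption goes unnoticed unless the caller compares the results explicitly.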

 

Noncompliant Code Example

This noncompliant code example corrupts the data when the string contains characters that are not representable in the specified charset.

Code Block
bgColor#FFcccc
// Corrupts data on errors
public static byte[] toCodePage(String charset, String string)
  throws UnsupportedEncodingException {
  return string.getBytes(charset);
}
 
// Fails to detect corrupt data
public static String fromCodePage(String charset, byte[] bytes)
  throws UnsupportedEncodingException {
  return new String(bytes, charset);
}

Compliant Solution

The java.nio.charset.CharsetEncoder class can transform a sequence of 16-bit Unicode characters into a sequence of bytes in a specific charset, while the java.nio.charset.CharsetDecoder class can reverse the procedure [API 2006].

This compliant solution uses the CharsetEncoder and CharsetDecoder classes to handle encoding conversions.

Code Block
bgColor#ccccff
import java.io.IOException;
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnmappableCharacterException;

public class CharsetConversion {

  // Encodes the string in the named charset; the encoder reports
  // unrepresentable characters by throwing CharacterCodingException
  public static byte[] toCodePage(String charset, String string)
      throws IOException {
    Charset cs = Charset.forName(charset);
    CharsetEncoder coder = cs.newEncoder();
    ByteBuffer bytebuf = coder.encode(CharBuffer.wrap(string));
    byte[] bytes = new byte[bytebuf.limit()];
    bytebuf.get(bytes);
    return bytes;
  }

  // Attempts to decode arbitrary binary data as UTF-8; invalid
  // byte sequences are reported instead of being silently replaced
  public static void main(String[] args) {
    CharBuffer charBuffer;
    CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
    BigInteger x = new BigInteger("530500452766");
    byte[] byteArray = x.toByteArray();
    ByteBuffer byteBuffer = ByteBuffer.wrap(byteArray);
    try {
      charBuffer = decoder.decode(byteBuffer);
      System.out.println(charBuffer);
    } catch (IllegalStateException e) {
      e.printStackTrace();
    } catch (MalformedInputException e) {
      e.printStackTrace();
    } catch (UnmappableCharacterException e) {
      e.printStackTrace();
    } catch (CharacterCodingException e) {
      e.printStackTrace();
    }
  }
}

Noncompliant Code Example

This noncompliant code example attempts to append a string to a text file in the specified encoding. This is erroneous because the String may contain unrepresentable characters.

Code Block
bgColor#FFcccc
// Corrupts data on errors
public static void toFile(String charset, String filename,
                        String string) throws IOException {
   
  FileOutputStream stream = new FileOutputStream(filename, true);
  OutputStreamWriter writer = new OutputStreamWriter(stream, charset);
  writer.write(string, 0, string.length());
  writer.close();
}

Compliant Solution

This compliant solution uses the CharsetEncoder class to perform the required function.

...

Use the FileInputStream and InputStreamReader objects to read back the data from the file. InputStreamReader accepts an optional CharsetDecoder argument, which must use the same charset as the encoder previously used for writing to the file.
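One way to pair the writer and reader is sketched below. This is an illustrative sketch, not the solution code elided above: the class name StrictFileIo and the fromFile helper are assumptions, while toFile mirrors the signature of the noncompliant example. It passes a CharsetEncoder or CharsetDecoder to the stream wrappers so that coding errors surface as exceptions.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper class, not part of the original rule text
class StrictFileIo {

  // Appends the string to the file; a new encoder reports
  // unrepresentable characters rather than replacing them
  static void toFile(String charset, String filename, String string)
      throws IOException {
    CharsetEncoder encoder = Charset.forName(charset).newEncoder();
    try (Writer writer = new OutputStreamWriter(
        new FileOutputStream(filename, true), encoder)) {
      writer.write(string, 0, string.length());
    }
  }

  // Reads the whole file back using a decoder for the same charset
  static String fromFile(String charset, String filename)
      throws IOException {
    CharsetDecoder decoder = Charset.forName(charset).newDecoder();
    StringBuilder text = new StringBuilder();
    try (Reader reader = new InputStreamReader(
        new FileInputStream(filename), decoder)) {
      char[] buffer = new char[1024];
      int count;
      while ((count = reader.read(buffer)) != -1) {
        text.append(buffer, 0, count);
      }
    }
    return text.toString();
  }
}
```

A call such as StrictFileIo.toFile("US-ASCII", "out.txt", "caf\u00e9") throws an IOException subclass instead of writing a substitute character. Characters preceding the failure may still reach the file when the writer is closed, so callers that need all-or-nothing behavior could encode to a byte array first and write only on success.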

Exceptions

STR05-EX0: Binary data that is expected to be a valid string may be read and converted to a string. How to perform this operation securely is explained in rule STR04-J. Use compatible character encodings when communicating string data between JVMs.

Risk Assessment

Attempting to read a byte array containing binary data as if it were character data can produce erroneous results.

Rule | Severity | Likelihood | Remediation Cost | Priority | Level
STR05-J | low | unlikely | medium | P2 | L3

Related Guidelines

MITRE CWE | CWE-838. Inappropriate Encoding for Output Context
MITRE CWE | CWE-116. Improper Encoding or Escaping of Output

Bibliography