String objects in Java are encoded in UTF-16. The Java platform is required to support other character encodings, or charsets, such as US-ASCII, ISO-8859-1, and UTF-8. Errors may occur when converting between differently encoded character data. There are two general types of encoding errors. If the byte sequence is not valid for the specified charset, the input is considered malformed. If the byte sequence cannot be mapped to an equivalent character sequence, an unmappable character has been encountered.
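The two error types can be observed directly with the strict coders in java.nio.charset. The following is a minimal sketch, not part of the guideline itself: the class name EncodingErrorKinds and its helper methods are hypothetical, and the unmappable case is shown on the encoding side (a character with no representation in US-ASCII) because that is the easiest way to trigger it with the standard charsets.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnmappableCharacterException;

// Hypothetical demo class; the name and helpers are illustrative only.
public class EncodingErrorKinds {

  // Classifies a coding exception by its concrete type.
  private static String classify(CharacterCodingException e) {
    if (e instanceof MalformedInputException) return "malformed";
    if (e instanceof UnmappableCharacterException) return "unmappable";
    return "other";
  }

  // Strictly decodes the bytes as UTF-8 and reports what happened.
  public static String decodeUtf8(byte[] bytes) {
    try {
      StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(bytes));
      return "ok";
    } catch (CharacterCodingException e) {
      return classify(e);
    }
  }

  // Strictly encodes the text as US-ASCII and reports what happened.
  public static String encodeAscii(String text) {
    try {
      StandardCharsets.US_ASCII.newEncoder().encode(CharBuffer.wrap(text));
      return "ok";
    } catch (CharacterCodingException e) {
      return classify(e);
    }
  }

  public static void main(String[] args) {
    // 0xC3 starts a two-byte UTF-8 sequence, but 0x28 is not a continuation byte.
    System.out.println(decodeUtf8(new byte[] {(byte) 0xC3, 0x28}));  // malformed
    // The euro sign has no representation in US-ASCII.
    System.out.println(encodeAscii("\u20AC"));                        // unmappable
    System.out.println(encodeAscii("plain"));                         // ok
  }
}
```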

According to the Java API [API 2014] documentation for the String constructors:

The behavior of this constructor when the given bytes are not valid in the given charset is unspecified.

Similarly, the description of the String.getBytes(Charset) method states:

This method always replaces malformed-input and unmappable-character sequences with this charset's default replacement byte array.
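The silent replacement described in the getBytes(Charset) documentation is easy to overlook because no exception is raised. The sketch below, using a hypothetical class name SilentReplacement, encodes a string containing a non-ASCII character as US-ASCII and shows that the character is replaced by the charset's replacement byte, '?' (63).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; only String.getBytes(Charset) comes from the text above.
public class SilentReplacement {

  // Encodes the text as US-ASCII; unmappable characters are replaced, not reported.
  public static byte[] toAscii(String text) {
    return text.getBytes(StandardCharsets.US_ASCII);
  }

  public static void main(String[] args) {
    byte[] encoded = toAscii("caf\u00E9");  // "café"
    // The final character has no US-ASCII encoding, so it becomes '?' (63).
    System.out.println(Arrays.toString(encoded));  // [99, 97, 102, 63]
    // Decoding the result cannot recover the original character.
    System.out.println(new String(encoded, StandardCharsets.US_ASCII));  // caf?
  }
}
```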

The CharsetEncoder class is used to transform character data into a sequence of bytes in a specific charset.   The input character sequence is provided in a character buffer or a series of such buffers. The output byte sequence is written to a byte buffer or a series of such buffers.  The CharsetDecoder class reverses this process by transforming a sequence of bytes in a specific charset into character data.  The input byte sequence is provided in a byte buffer or a series of such buffers, while the output character sequence is written to a character buffer or a series of such buffers.
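As a rough illustration of the buffer-based flow just described, the following sketch encodes a character buffer to a byte buffer and decodes it back with the same charset. The class name EncoderDecoderRoundTrip and the choice of UTF-8 are assumptions made for the example; any supported charset could be substituted.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; UTF-8 is chosen only for illustration.
public class EncoderDecoderRoundTrip {

  // Encodes text to bytes and decodes the bytes back to text.
  public static String roundTrip(String text) throws CharacterCodingException {
    CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
    CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();

    // Character buffer in, byte buffer out.
    ByteBuffer bytes = encoder.encode(CharBuffer.wrap(text));
    // Byte buffer in, character buffer out.
    CharBuffer chars = decoder.decode(bytes);
    return chars.toString();
  }

  public static void main(String[] args) throws CharacterCodingException {
    String original = "na\u00EFve \u20AC";
    // Valid character data survives the round trip unchanged.
    System.out.println(original.equals(roundTrip(original)));  // true
  }
}
```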

Special care should be taken when decoding untrusted byte data to ensure that malformed-input or unmappable-character errors do not result in defects and vulnerabilities. Encoding errors can also occur; for example, encoding a cryptographic key that contains malformed input for transmission results in an error. Encoding and decoding errors typically result in data corruption.

In Java, byte arrays are often used to transmit both raw binary data and character-encoded data. An attempt to read raw binary data as if it were character-encoded data fails because some of the bytes may not represent valid characters in the default or specified encoding scheme. For instance, a cryptographic key containing nonrepresentable bytes may need to be converted to character-encoded data for transmission. However, this conversion may produce erroneous results.
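The sketch below illustrates the problem with a small stand-in for key material: passing arbitrary bytes through a String changes them, whereas a binary-to-text encoding preserves them. Base64 (java.util.Base64) is used here only as one common option and is an assumption of this example, not something this guideline prescribes; the class name BinaryAsText and the sample bytes are likewise illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical demo class; Base64 is an assumed alternative, not from the guideline.
public class BinaryAsText {

  // Treats raw bytes as UTF-8 text and converts the text back to bytes.
  public static byte[] viaString(byte[] raw) {
    String text = new String(raw, StandardCharsets.UTF_8);
    return text.getBytes(StandardCharsets.UTF_8);
  }

  // Represents raw bytes as Base64 text and decodes the text back to bytes.
  public static byte[] viaBase64(byte[] raw) {
    String text = Base64.getEncoder().encodeToString(raw);
    return Base64.getDecoder().decode(text);
  }

  public static void main(String[] args) {
    // Stand-in for binary key material; 0x84, 0x81, and 0x9E are not valid UTF-8 here.
    byte[] key = {0x7B, (byte) 0x84, 0x4A, (byte) 0x81, (byte) 0x9E};

    System.out.println(Arrays.equals(key, viaString(key)));  // false: bytes were replaced
    System.out.println(Arrays.equals(key, viaBase64(key)));  // true: bytes are preserved
  }
}
```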

Also, see FIO02-J. Keep track of bytes read and account for character encoding while reading data and FIO03-J. Specify the character encoding while performing file or network IO.

Noncompliant Code Example

This noncompliant code example is similar to the one used in "STR03-J. Do not represent numeric data as strings" in that it attempts to convert a byte array containing the two's-complement representation of a BigInteger value to a String. Some of the bytes do not denote valid characters, so the resulting String loses information, and converting the String back to a BigInteger produces a different number. Because the byte array contains malformed-input sequences, the behavior of the String constructor is unspecified.

Code Block
bgColor#FFcccc
import java.math.BigInteger;

public class CharsetConversion {
  public static void main(String[] args) {
    BigInteger x = new BigInteger("530500452766");
    byte[] byteArray = x.toByteArray();  // two's-complement bytes, not character data
    String s = new String(byteArray);    // decoded using the platform's default charset
    System.out.println(s);  // prints "{„J?ž" (for example, with windows-1252 as the
                            // default charset); the fourth character is invalid

    // Convert s back to a BigInteger
    byteArray = s.getBytes();       // convert to bytes
    x = new BigInteger(byteArray);  // now x = 530500435870
  }
}

Compliant Solution

This compliant solution first obtains a String containing the decimal representation of the BigInteger by calling toString(). Because that String consists only of valid characters, it can be converted to a byte array and back without loss.

Code Block
bgColor#ccccff
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;

public class CharsetConversion {
  public static void main(String[] args) {
    BigInteger x = new BigInteger("530500452766");
    String s = x.toString();  // valid character data

    byte[] byteArray = s.getBytes(StandardCharsets.UTF_8);
    String ns = new String(byteArray, StandardCharsets.UTF_8);  // ns prints as "530500452766"

    BigInteger x1 = new BigInteger(ns);  // reconstructs the original BigInteger
  }
}

Do not pass the byte array obtained from the String object to the BigInteger(byte[]) constructor in an attempt to recover the original BigInteger. Character-encoded data yields a byte array that, when interpreted as a two's-complement number, results in a completely different value.

Compliant Solution

The java.nio.charset.CharsetEncoder and java.nio.charset.CharsetDecoder classes provide greater control over the process. In this compliant solution, the CharsetDecoder.decode() method is used to convert the byte array containing the two's-complement representation of this BigInteger value to a CharBuffer. Because the bytes do not represent valid UTF-16, the input is considered malformed, and a MalformedInputException is thrown.

Code Block
bgColor#ccccff
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnmappableCharacterException;

public class CharsetConversion {
  public static void main(String[] args) {
    CharBuffer charBuffer;
    CharsetDecoder decoder = StandardCharsets.UTF_16.newDecoder();
    BigInteger x = new BigInteger("530500452766");
    byte[] byteArray = x.toByteArray();
    ByteBuffer byteBuffer = ByteBuffer.wrap(byteArray);
    try {
      charBuffer = decoder.decode(byteBuffer);
      String s = charBuffer.toString();
      System.out.println(s);
    } catch (IllegalStateException e) {
      e.printStackTrace();
    } catch (MalformedInputException e) {
      e.printStackTrace();
    } catch (UnmappableCharacterException e) {
      e.printStackTrace();
    } catch (CharacterCodingException e) {
      e.printStackTrace();
    }
  }
}
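The same approach can be packaged as a small reusable helper. The sketch below is one possible shape, with a hypothetical class and method name (StrictDecode.decodeStrict). It sets CodingErrorAction.REPORT explicitly for both error types so that the intent is visible in the code, and it leaves the decision about how to handle the CharacterCodingException to the caller.

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the name decodeStrict is illustrative only.
public class StrictDecode {

  // Decodes bytes in the given charset, throwing instead of substituting
  // replacement characters when the input is malformed or unmappable.
  public static String decodeStrict(byte[] bytes, Charset charset)
      throws CharacterCodingException {
    CharsetDecoder decoder = charset.newDecoder()
        .onMalformedInput(CodingErrorAction.REPORT)
        .onUnmappableCharacter(CodingErrorAction.REPORT);
    return decoder.decode(ByteBuffer.wrap(bytes)).toString();
  }

  public static void main(String[] args) {
    byte[] raw = new BigInteger("530500452766").toByteArray();
    try {
      System.out.println(decodeStrict(raw, StandardCharsets.UTF_16));
    } catch (CharacterCodingException e) {
      // Five bytes cannot form a whole number of UTF-16 code units.
      System.out.println("rejected: " + e.getClass().getSimpleName());
    }
  }
}
```

Valid character data passes through unchanged; for example, decodeStrict("hi".getBytes(StandardCharsets.UTF_16), StandardCharsets.UTF_16) returns "hi".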

Risk Assessment

Malformed-input or unmappable-character errors can result in a loss of data integrity. Attempting to read a byte array containing raw binary data as if it were character data may produce erroneous results.

|| Rule || Severity || Likelihood || Remediation Cost || Priority || Level ||
| STR05-J | Low | Unlikely | Medium | P2 | L3 |

Automated Detection

TODO

Related Vulnerabilities

Search for vulnerabilities resulting from the violation of this rule on the CERT website.

References

[API 06] Class String, http://java.sun.com/javase/6/docs/api/java/lang/String.html

Related Guidelines

MITRE CWE

CWE-838. Inappropriate Encoding for Output Context

 

CWE-116. Improper Encoding or Escaping of Output

Bibliography

 
