String objects in Java are encoded in UTF-16. The Java platform is also required to support other character encodings, or charsets, such as US-ASCII, ISO-8859-1, and UTF-8. Errors may occur when converting between differently encoded character data. There are two general types of encoding errors. If a byte sequence is not valid for the specified charset, the input is considered malformed. If an input sequence is valid but cannot be mapped to an equivalent sequence in the target charset (for example, when encoding the euro sign as ISO-8859-1), an unmappable character has been encountered.
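The two error kinds can be observed directly through the low-level CharsetDecoder.decode(ByteBuffer, CharBuffer, boolean) and CharsetEncoder.encode(CharBuffer, ByteBuffer, boolean) methods. These methods return a CoderResult instead of throwing. The sketch below is illustrative only: the class and method names are hypothetical, and the inputs (a broken UTF-8 sequence and a euro sign encoded as ISO-8859-1) are assumptions chosen to trigger each error kind. It relies on the fact that encoders and decoders obtained from newEncoder() and newDecoder() use CodingErrorAction.REPORT by default.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

class CodingErrorKinds {
    // Decodes bytes as UTF-8 and returns the CoderResult of a single decode step.
    // The default REPORT action makes errors come back as results, not replacements.
    static CoderResult decodeUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        // UTF-8 never yields more chars than bytes, so this buffer cannot overflow
        CharBuffer out = CharBuffer.allocate(bytes.length);
        return decoder.decode(ByteBuffer.wrap(bytes), out, true);
    }

    // Encodes text as ISO-8859-1 and returns the CoderResult of a single encode step.
    static CoderResult encodeLatin1(String text) {
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder();
        // ISO-8859-1 uses one byte per char
        ByteBuffer out = ByteBuffer.allocate(text.length());
        return encoder.encode(CharBuffer.wrap(text), out, true);
    }

    public static void main(String[] args) {
        // 0xC3 starts a two-byte UTF-8 sequence, but 0x28 is not a continuation byte
        CoderResult malformed = decodeUtf8(new byte[] {(byte) 0xC3, (byte) 0x28});
        System.out.println("malformed: " + malformed.isMalformed());

        // U+20AC (euro sign) is valid UTF-16 but has no ISO-8859-1 equivalent
        CoderResult unmappable = encodeLatin1("\u20AC");
        System.out.println("unmappable: " + unmappable.isUnmappable());
    }
}
```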
According to the Java API [API 2014] for the String constructors:
The behavior of this constructor when the given bytes are not valid in the given charset is unspecified.
Similarly, the description of the String.getBytes(Charset) method states:
This method always replaces malformed-input and unmappable-character sequences with this charset's default replacement byte array.
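The replacement behavior makes data loss silent: no exception is raised, and the caller receives bytes that no longer represent the original text. The minimal sketch below assumes US-ASCII as the target charset, whose replacement byte is '?'. The class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class GetBytesReplacement {
    // String.getBytes(Charset) never throws on unmappable input; it substitutes
    // the charset's replacement bytes instead ('?' for US-ASCII).
    static byte[] asciiBytes(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] bytes = asciiBytes("caf\u00E9");
        System.out.println(Arrays.toString(bytes)); // prints [99, 97, 102, 63]
        // Decoding the bytes back shows the loss: the accented character is gone
        System.out.println(new String(bytes, StandardCharsets.US_ASCII)); // prints caf?
    }
}
```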
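When a String object is converted to bytes (for example, for writing to a file) and the string might contain unmappable characters, proper character encoding must be performed. That is, the conversion must detect and report encoding errors instead of silently substituting replacement bytes.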
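One way to perform such a conversion is to call CharsetEncoder.encode(CharBuffer) directly. With the default REPORT action, it throws UnmappableCharacterException or MalformedInputException (both subclasses of CharacterCodingException) instead of replacing characters. The helper below is a hypothetical sketch, not a standard API.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class StrictEncoding {
    // Hypothetical helper: encodes a String, failing instead of substituting bytes.
    // newEncoder() reports errors by default, so encode(CharBuffer) throws on bad input.
    static byte[] toBytesStrict(String s, Charset cs) throws CharacterCodingException {
        ByteBuffer buffer = cs.newEncoder().encode(CharBuffer.wrap(s));
        // The returned buffer has position 0 and limit at the last byte written
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        try {
            System.out.println(toBytesStrict("plain", StandardCharsets.US_ASCII).length); // prints 5
            toBytesStrict("caf\u00E9", StandardCharsets.US_ASCII);
        } catch (CharacterCodingException e) {
            // Reached for "caf\u00E9": the accented e has no US-ASCII mapping
            System.out.println("cannot encode: " + e);
        }
    }
}
```

Callers can then decide explicitly how to handle text that cannot be represented, rather than discovering corrupted output later.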
Noncompliant Code Example
This noncompliant code example is similar to the one used in rule STR03-J. Do not represent numeric data as strings. It attempts to convert a byte array containing the two's-complement representation of a BigInteger value to a String, using the one-argument String constructor, which decodes with the platform's default charset. Because the byte array contains malformed-input sequences, the behavior of the String constructor is unspecified.
```java
import java.math.BigInteger;

public class CharsetConversion {
    public static void main(String[] args) {
        BigInteger x = new BigInteger("530500452766");
        byte[] byteArray = x.toByteArray();
        String s = new String(byteArray);
        System.out.println(s);
    }
}
```
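Dumping the raw bytes shows why the input is malformed. The array is 0x7B 0x84 0x4A 0x81 0x9E. The byte 0x84 is a UTF-8 continuation byte with no preceding lead byte, and five bytes cannot form whole 16-bit UTF-16 code units. The sketch below assumes UTF-8 as the charset of interest, which has been the platform default since JDK 18. The class and helper names are hypothetical.

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

class InspectBigIntegerBytes {
    // Formats bytes as lowercase hex so the raw encoding is visible.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    // Returns true if the bytes are well-formed UTF-8 (strict decoder, REPORT action).
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] bytes = new BigInteger("530500452766").toByteArray();
        System.out.println(hex(bytes));         // prints 7b844a819e
        System.out.println(isValidUtf8(bytes)); // prints false
    }
}
```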
Compliant Solution
The java.nio.charset.CharsetEncoder and java.nio.charset.CharsetDecoder classes provide greater control over the conversion process. In this compliant solution, the CharsetDecoder.decode() method is used to convert the byte array containing the two's-complement representation of the BigInteger value to a CharBuffer. Because the bytes do not form a valid UTF-16 sequence (the array holds five bytes, which cannot form whole 16-bit code units), the input is considered malformed, and a MalformedInputException is thrown.
```java
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnmappableCharacterException;

public class CharsetConversion {
    public static void main(String[] args) {
        CharBuffer charBuffer;
        // A fresh decoder reports (rather than replaces) malformed input
        CharsetDecoder decoder = StandardCharsets.UTF_16.newDecoder();
        BigInteger x = new BigInteger("530500452766");
        byte[] byteArray = x.toByteArray();
        ByteBuffer byteBuffer = ByteBuffer.wrap(byteArray);
        try {
            charBuffer = decoder.decode(byteBuffer);
            String s = charBuffer.toString();
            System.out.println(s);
        } catch (IllegalStateException e) {
            e.printStackTrace();
        } catch (MalformedInputException e) {
            // Thrown here: the trailing odd byte is not a complete UTF-16 code unit
            e.printStackTrace();
        } catch (UnmappableCharacterException e) {
            e.printStackTrace();
        } catch (CharacterCodingException e) {
            e.printStackTrace();
        }
    }
}
```
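Part of the "greater control" these classes provide is the choice of CodingErrorAction: REPORT (the default for a new decoder) throws, REPLACE substitutes the decoder's replacement string, and IGNORE drops the offending bytes. The sketch below compares the three actions on an assumed input, "Hi" followed by the invalid UTF-8 byte 0xFF. The class name and the convention of returning null on error are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class DecoderErrorActions {
    // Decodes UTF-8 bytes, applying the given action to malformed input.
    // Returns null when the action is REPORT and the input is malformed.
    static String decode(byte[] bytes, CodingErrorAction action) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(action)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        byte[] input = {0x48, 0x69, (byte) 0xFF}; // "Hi" followed by an invalid byte
        System.out.println(decode(input, CodingErrorAction.REPORT));  // prints null
        System.out.println(decode(input, CodingErrorAction.REPLACE)); // "Hi" plus U+FFFD
        System.out.println(decode(input, CodingErrorAction.IGNORE));  // prints Hi
    }
}
```

Only REPORT makes the error visible to the caller; REPLACE and IGNORE reproduce the silent data loss that this rule warns against.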
Noncompliant Code Example
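This noncompliant code example attempts to append a string to a text file in the specified encoding. It is erroneous because the String may contain characters that cannot be represented in that encoding. The OutputStreamWriter constructor that takes a charset name silently replaces such characters with the charset's replacement bytes, corrupting the data.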
```java
// Corrupts data on errors
public static void toFile(String charset, String filename, String string)
        throws IOException {
    FileOutputStream stream = new FileOutputStream(filename, true);
    OutputStreamWriter writer = new OutputStreamWriter(stream, charset);
    writer.write(string, 0, string.length());
    writer.close();
}
```
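The corruption can be observed without touching the file system. The sketch below mirrors the noncompliant toFile() but, as an assumption for testability, writes to a ByteArrayOutputStream instead of a FileOutputStream. The class and method names are hypothetical. Writing "café" as US-ASCII completes without any exception, and the accented character becomes '?'.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;

class SilentReplacementDemo {
    // Same writer construction as the noncompliant toFile(), but in memory
    // so the resulting bytes can be inspected.
    static byte[] writeWithCharsetName(String charset, String string) throws IOException {
        ByteArrayOutputStream stream = new ByteArrayOutputStream();
        try (OutputStreamWriter writer = new OutputStreamWriter(stream, charset)) {
            writer.write(string, 0, string.length());
        }
        return stream.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = writeWithCharsetName("US-ASCII", "caf\u00E9");
        // No exception is thrown; the unmappable character is written as '?' (byte 63)
        System.out.println(new String(bytes, "US-ASCII")); // prints caf?
    }
}
```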
Compliant Solution
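This compliant solution passes a CharsetEncoder to the OutputStreamWriter constructor. An encoder obtained from Charset.newEncoder() reports malformed and unmappable input by default, so the write fails with a CharacterCodingException (a subclass of IOException) instead of silently writing replacement bytes.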
```java
public static void toFile(String filename, String string, String charset)
        throws IOException {
    Charset cs = Charset.forName(charset);
    CharsetEncoder coder = cs.newEncoder(); // reports encoding errors by default
    FileOutputStream stream = new FileOutputStream(filename, true);
    // try-with-resources closes the writer (and stream) even if encoding fails
    try (OutputStreamWriter writer = new OutputStreamWriter(stream, coder)) {
        writer.write(string, 0, string.length());
    }
}
```
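The same in-memory check shows the difference from the noncompliant version. As before, the sketch below assumes a ByteArrayOutputStream in place of the file. Its names are hypothetical, and returning null on an encoding error is an illustrative convention. With a strict encoder, writing "café" as US-ASCII throws, while pure ASCII text is written normally.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;

class StrictWriterDemo {
    // Same writer construction as the compliant toFile(), but in memory.
    // Returns the encoded bytes, or null if the encoder rejected the text.
    static byte[] writeWithEncoder(String string, String charset) throws IOException {
        ByteArrayOutputStream stream = new ByteArrayOutputStream();
        try (OutputStreamWriter writer =
                new OutputStreamWriter(stream, Charset.forName(charset).newEncoder())) {
            writer.write(string, 0, string.length());
        } catch (CharacterCodingException e) {
            return null; // the default REPORT action surfaced the encoding error
        }
        return stream.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeWithEncoder("abc", "US-ASCII").length);  // prints 3
        System.out.println(writeWithEncoder("caf\u00E9", "US-ASCII"));   // prints null
    }
}
```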
Use FileInputStream and InputStreamReader objects to read the data back from the file. InputStreamReader accepts an optional CharsetDecoder argument, which must decode the same charset that was used to write the file.
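A minimal reading sketch follows. It assumes a ByteArrayInputStream in place of FileInputStream so the example is self-contained, and its class and method names are hypothetical. A strict decoder rejects malformed bytes with MalformedInputException. A decoder for the wrong charset may not fail at all: UTF-8 bytes read as ISO-8859-1 decode "without error" into the wrong characters, which is why the charsets must match.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;

class StrictReaderDemo {
    // Reads all characters through an InputStreamReader built from a fresh decoder,
    // which reports malformed input as MalformedInputException (an IOException).
    static String readAll(byte[] data, String charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), Charset.forName(charset).newDecoder())) {
            char[] buf = new char[256];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] utf8 = "caf\u00E9".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAll(utf8, "UTF-8").equals("caf\u00E9"));      // prints true
        // Mismatched charset: no exception, but the text is wrong
        System.out.println(readAll(utf8, "ISO-8859-1").equals("caf\u00E9")); // prints false
        try {
            readAll(new byte[] {(byte) 0xFF}, "UTF-8");
        } catch (MalformedInputException e) {
            System.out.println("malformed input rejected");
        }
    }
}
```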
Exceptions
STR05-EX0: Binary data that is expected to be a valid string may be read and converted to a string. How to perform this operation securely is explained in rule STR04-J. Use compatible character encodings when communicating string data between JVMs.
Risk Assessment
Attempting to read a byte array containing binary data as if it were character data can produce erroneous results.
Rule | Severity | Likelihood | Remediation Cost | Priority | Level |
---|---|---|---|---|---|
STR05-J | low | unlikely | medium | P2 | L3 |
Related Guidelines
Bibliography