Comment: finished description and risk assessment

...

This method always replaces malformed-input and unmappable-character sequences with this charset's default replacement byte array.
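The replacement behavior described above can be illustrated with a minimal sketch. It assumes the method in question is String.getBytes(Charset), since the surrounding text is elided; the class and method names are hypothetical and chosen only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; not part of the original rule text.
class ReplacementDemo {
    // String.getBytes(Charset) never throws for unmappable characters;
    // it substitutes the charset's default replacement bytes instead.
    static byte[] toAscii(String text) {
        return text.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) has no US-ASCII mapping.
        byte[] encoded = toAscii("caf\u00e9");
        System.out.println(Arrays.toString(encoded)); // prints [99, 97, 102, 63]
    }
}
```

The final byte, 63, is the ASCII question mark: the original character is lost and no error is raised.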

When a String object is converted to bytes (for example, to be written to a file) and the string might contain unmappable character sequences, the conversion must use an encoding process that detects those sequences rather than silently replacing them.

Converting a byte array to a string raises similar issues if the byte array is not a validly encoded string. Attempts to read raw binary data as character-encoded data fail when bytes fall outside the default or specified encoding scheme and consequently do not denote valid characters. For example, converting a cryptographic key that contains nonrepresentable bytes to character-encoded data for transmission may result in an error.
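A short sketch of this problem follows. The three-byte array is an illustrative stand-in for key material, and the class name is hypothetical. Note that with the String(byte[], Charset) constructor the failure shows up as silent corruption rather than as an exception.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; not part of the original rule text.
class LossyBinaryDemo {
    // Decodes arbitrary bytes as UTF-8, then re-encodes the resulting string.
    static byte[] roundTrip(byte[] raw) {
        String text = new String(raw, StandardCharsets.UTF_8);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0xFF and 0xFE never occur in valid UTF-8, so each is replaced on decoding.
        byte[] key = { (byte) 0xFF, (byte) 0xFE, 0x41 };
        byte[] result = roundTrip(key);
        System.out.println(Arrays.equals(key, result)); // prints false
    }
}
```

Because the invalid bytes are replaced during decoding, the original binary value cannot be recovered from the string.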

The CharsetEncoder class is used to transform character data into a sequence of bytes in a specific charset. The input character sequence is provided in a character buffer or a series of such buffers. The output byte sequence is written to a byte buffer or a series of such buffers. The CharsetDecoder class reverses this process by transforming a sequence of bytes in a specific charset into character data. The input byte sequence is provided in a byte buffer or a series of such buffers, while the output character sequence is written to a character buffer or a series of such buffers.
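The buffer-based flow described above might look like the following sketch, which uses the single-call convenience methods encode(CharBuffer) and decode(ByteBuffer) rather than a series of buffers. The class and helper names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the original rule text.
class CodecRoundTrip {
    // Characters in, bytes out. A freshly created encoder reports errors by default,
    // so unmappable characters raise CharacterCodingException.
    static ByteBuffer encode(String text, Charset cs) throws CharacterCodingException {
        return cs.newEncoder().encode(CharBuffer.wrap(text));
    }

    // Bytes in, characters out.
    static String decode(ByteBuffer bytes, Charset cs) throws CharacterCodingException {
        return cs.newDecoder().decode(bytes).toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        ByteBuffer bytes = encode("na\u00efve", StandardCharsets.UTF_8);
        System.out.println(bytes.remaining()); // prints 6
        System.out.println(decode(bytes, StandardCharsets.UTF_8).equals("na\u00efve")); // prints true
    }
}
```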

Special care should be taken when decoding untrusted byte data to ensure that malformed-input or unmappable-character errors do not result in defects and vulnerabilities. Encoding errors can also occur; for example, encoding a cryptographic key that contains malformed input for transmission will result in an error. Encoding and decoding errors typically result in data corruption.
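One possible way to surface such errors when decoding untrusted bytes is to configure the decoder to report them explicitly, as in this sketch. UTF-8 is an assumed encoding, and the class and method names are hypothetical; this is not necessarily the approach the rule's own examples take.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the original rule text.
class StrictDecoding {
    // Decodes untrusted bytes as UTF-8 (assumed encoding), throwing on invalid
    // input instead of substituting a replacement character.
    static String decodeUtf8(byte[] untrusted) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(untrusted)).toString();
    }

    public static void main(String[] args) {
        // 0xC3 starts a two-byte sequence, but 0x28 is not a continuation byte.
        byte[] bad = { (byte) 0xC3, 0x28 };
        try {
            decodeUtf8(bad);
            System.out.println("decoded");
        } catch (CharacterCodingException e) {
            System.out.println("rejected"); // prints rejected
        }
    }
}
```

Rejecting the input lets the caller decide how to handle the failure, instead of continuing with silently corrupted data.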

Noncompliant Code Example

...

STR05-EX0: Binary data that is expected to be a valid string may be read and converted to a string. Rule STR04-J, "Use compatible character encodings when communicating string data between JVMs," explains how to perform this operation securely.

Risk Assessment

Attempting to read a byte array containing binary data as if it were character data can produce erroneous results. Malformed-input or unmappable-character errors can result in a loss of data integrity.

Rule     Severity  Likelihood  Remediation Cost  Priority  Level
STR05-J  low       unlikely    medium            P2        L3

...