Many classes allow the inclusion of escape sequences in character and string literals; examples include java.util.regex.Pattern as well as classes that perform XML- and SQL-based actions on string arguments passed to their methods. According to the Java Language Specification [JLS 2011], Section 3.10.6, "Escape Sequences for Character and String Literals":
The character and string escape sequences allow for the representation of some nongraphic characters as well as the single quote, double quote, and backslash characters in character literals (§3.10.4) and string literals (§3.10.5).
Correct use of escape sequences in string literals requires an understanding of how the Java compiler interprets them. For instance, SQL statements may require escape sequences (for example, sequences containing \t, \n, or \r) in certain cases, such as when storing raw text in a database. When representing SQL statements in Java string literals, each such escape sequence must be preceded by an extra backslash so that the backslash survives compilation and reaches the SQL engine intact.
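A minimal sketch of the difference, assuming a SQL dialect (such as MySQL in its default mode) that itself interprets backslash escapes inside quoted string values; standard SQL does not. The table name notes and both statements are hypothetical, and no database is contacted: the code only shows which characters the Java compiler leaves in each string.

```java
// Hypothetical statements; the table "notes" is an illustrative assumption.
final class SqlEscapes {
    // "\t" is translated by the Java compiler into a real tab character (U+0009),
    // so the SQL engine would receive a raw tab inside the quoted value.
    static final String COMPILER_TAB = "INSERT INTO notes VALUES ('a\tb')";

    // "\\t" survives compilation as the two characters '\' and 't', so a SQL
    // engine that interprets backslash escapes sees its own \t sequence.
    static final String SQL_TAB = "INSERT INTO notes VALUES ('a\\tb')";

    public static void main(String[] args) {
        System.out.println(COMPILER_TAB.contains("\t")); // true: a real tab is present
        System.out.println(SQL_TAB.contains("\t"));      // false: no real tab
        System.out.println(SQL_TAB.contains("\\t"));     // true: backslash followed by 't'
        // The escaped version is exactly one character longer (the extra backslash).
        System.out.println(SQL_TAB.length() - COMPILER_TAB.length()); // 1
    }
}
```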
As another example, consider the Pattern class used to perform regular-expression tasks. A string used for pattern matching is compiled into an instance of the Pattern type by Pattern.compile(). When the pattern to be matched contains a sequence of characters identical to one of the Java escape sequences ("\" followed by "n", for example), the Java compiler treats that portion of the string as a Java escape sequence and transforms it into an actual newline character. To insert the newline escape sequence, rather than a literal newline character, the programmer must precede the "\n" sequence with an additional backslash to prevent the Java compiler from replacing it with a newline character. The string constructed from the resulting sequence, \\n, consequently contains the correct two-character sequence \n and correctly denotes the escape sequence for newline in the pattern.
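The following sketch illustrates the point: the literal "\\n" reaches Pattern as two characters, which Pattern then reads as its own newline escape. The class name NewlineEscape and the helper containsNewline() are hypothetical, introduced only for this illustration.

```java
import java.util.regex.Pattern;

final class NewlineEscape {
    // Two characters, '\' and 'n', which Pattern reads as its own newline escape.
    static final String REGEX_NEWLINE = "\\n";

    // Hypothetical helper: reports whether the input contains a newline character.
    static boolean containsNewline(String input) {
        return Pattern.compile(REGEX_NEWLINE).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println("\n".length());           // 1: the compiler produced U+000A
        System.out.println(REGEX_NEWLINE.length());  // 2: backslash followed by 'n'
        System.out.println(containsNewline("line1\nline2")); // true
        System.out.println(containsNewline("line1 line2"));  // false
    }
}
```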
...
This noncompliant code example defines a method, splitWords(), that finds matches between the string literal (WORDS) and the input sequence. WORDS is expected to hold the escape sequence for matching a word boundary. However, the Java compiler treats the "\b" literal as a Java escape sequence, and the string WORDS silently compiles to a backspace character.
```java
import java.util.regex.Pattern;

public class Splitter {
  private final String WORDS = "\b"; // Interpreted as backspace. Fails to split on word boundaries

  public String[] splitWords(String input) {
    Pattern pattern = Pattern.compile(WORDS);
    String[] inputArray = pattern.split(input);
    return inputArray;
  }
}
```
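To make the failure visible, the following sketch repeats the noncompliant literal in a standalone class (the name BackspaceDemo is hypothetical) and checks what the compiler actually stored. Because the resulting pattern matches only a literal backspace, an ordinary sentence is returned unsplit as a single element.

```java
import java.util.regex.Pattern;

final class BackspaceDemo {
    // Same literal as the noncompliant WORDS field: the compiler turns "\b" into U+0008.
    static final String WORDS = "\b";

    static String[] splitWords(String input) {
        return Pattern.compile(WORDS).split(input);
    }

    public static void main(String[] args) {
        System.out.println(WORDS.length());         // 1: a single character
        System.out.println(WORDS.charAt(0) == 0x08); // true: it is the backspace character
        // The input contains no backspace, so split() finds no match
        // and returns the whole input as the only element.
        System.out.println(splitWords("hello world").length); // 1
    }
}
```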
...
```java
import java.util.regex.Pattern;

public class Splitter {
  private final String WORDS = "\\b"; // Interpreted as two chars, '\' and 'b'. Correctly splits on word boundaries

  public String[] splitWords(String input) {
    Pattern pattern = Pattern.compile(WORDS);
    String[] inputArray = pattern.split(input);
    return inputArray;
  }
}
```
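As a usage sketch of the corrected literal (the class name WordBoundaryDemo is hypothetical), splitting on \b yields zero-width matches before and after each word. Assuming Java 8 or later, which suppresses the empty leading element for a zero-width match at the start of the input, "hello world" splits into "hello", " ", and "world"; note that the separating space is returned as its own element.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

final class WordBoundaryDemo {
    // Two characters, '\' and 'b', which Pattern reads as a word-boundary matcher.
    static final String WORDS = "\\b";

    static String[] splitWords(String input) {
        return Pattern.compile(WORDS).split(input);
    }

    public static void main(String[] args) {
        // Boundaries fall before and after each word, so the text is cut
        // at the edges of "hello" and "world".
        System.out.println(Arrays.toString(splitWords("hello world")));
    }
}
```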
...
This noncompliant code example uses the same method, splitWords(). This time, however, the WORDS string is loaded from an external properties file.
...
In the properties file, the WORD property is once again incorrectly specified as \b. The Properties.load() method reads this value as the single character b, because it silently drops a backslash that precedes a character it does not recognize as an escape. This causes the split() method to split strings along the letter b. Although the string is interpreted differently than if it were a string literal, as in the previous noncompliant code example, it is still interpreted incorrectly.
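The following sketch reproduces this behavior with Properties.load(Reader), using an in-memory StringReader as a stand-in for the external file. The key name WORDS, the class name PropertiesEscapes, and the helper load() are assumptions for illustration. It also shows one way to obtain the intended two-character value: escaping the backslash in the file itself, so that the line reads WORDS=\\b.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class PropertiesEscapes {
    // Hypothetical helper: parses properties-file text and returns one value.
    static String load(String fileContents, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(fileContents));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // StringReader does not actually fail
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // File line: WORDS=\b   (the Java literal needs "\\" to hold one backslash)
        String broken = load("WORDS=\\b", "WORDS");
        // File line: WORDS=\\b  (backslash escaped for the properties format)
        String fixed = load("WORDS=\\\\b", "WORDS");

        System.out.println(broken);         // b: load() silently dropped the backslash
        System.out.println(fixed);          // \b: backslash followed by 'b'
        System.out.println(fixed.length()); // 2
    }
}
```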
...
Incorrect use of escape characters in string inputs can result in misinterpretation and potential corruption of data.
Bibliography
[API 2011] | Class Pattern, "Backslashes, Escapes, and Quoting"
...
...