Many classes allow inclusion of escape sequences in character and string literals; examples include java.util.regex.Pattern as well as classes that support XML- and SQL-based actions by passing string arguments to methods. According to the Java Language Specification (JLS), §3.10.6, "Escape Sequences for Character and String Literals" [JLS 2013],
The character and string escape sequences allow for the representation of some nongraphic characters as well as the single quote, double quote, and backslash characters in character literals (§3.10.4) and string literals (§3.10.5).
Correct use of escape sequences in string literals requires understanding how the escape sequences are interpreted by the Java compiler as well as how they are interpreted by any subsequent processor, such as a SQL engine. SQL statements may require escape sequences (for example, sequences containing \t, \n, \r) in certain cases, such as when storing raw text in a database. When representing SQL statements in Java string literals, each escape sequence must be preceded by an extra backslash for correct interpretation.
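As a minimal sketch of these two interpretation stages, the following hypothetical class prints what a downstream processor would receive from two similar-looking Java literals. The table name notes and the column body are invented for illustration, and how a particular SQL engine treats the surviving backslash sequence depends on that engine and is not shown here.

```java
class SqlEscapeSketch {
  // The compiler turns \n into a real newline before any SQL engine sees it.
  static final String COMPILER_ESCAPED =
      "INSERT INTO notes (body) VALUES ('line1\nline2')";

  // The extra backslash survives compilation as the two characters '\' and 'n'.
  static final String ENGINE_ESCAPED =
      "INSERT INTO notes (body) VALUES ('line1\\nline2')";

  public static void main(String[] args) {
    System.out.println(COMPILER_ESCAPED.contains("\n")); // true: real newline present
    System.out.println(ENGINE_ESCAPED.contains("\n"));   // false: only '\' followed by 'n'
    // The second literal is exactly one character longer than the first.
    System.out.println(ENGINE_ESCAPED.length() - COMPILER_ESCAPED.length()); // 1
  }
}
```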
As another example, consider the Pattern class used in performing regular expression-related tasks. A string literal used for pattern matching is compiled into an instance of the Pattern type. When the pattern to be matched contains a sequence of characters identical to one of the Java escape sequences ("\" and "n", for example), the Java compiler treats that portion of the string as a Java escape sequence and transforms the sequence into an actual newline character. To insert the newline escape sequence, rather than a literal newline character, the programmer must precede the "\n" sequence with an additional backslash to prevent the Java compiler from replacing it with a newline character. The string constructed from the resulting sequence,
Code Block |
---|
\\n
|
consequently contains the correct two-character sequence \n and correctly denotes the escape sequence for newline in the pattern.
In general, for a particular escape character of the form \X, the equivalent Java representation is
Code Block |
---|
\\X
|
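The doubling rule can be checked directly against java.util.regex.Pattern. The sketch below is illustrative only; the class name RegexEscapeSketch and the helper matchesAll are hypothetical and not part of any library.

```java
import java.util.regex.Pattern;

class RegexEscapeSketch {
  // Returns true if the regular expression matches the entire input.
  static boolean matchesAll(String regex, String input) {
    return Pattern.compile(regex).matcher(input).matches();
  }

  public static void main(String[] args) {
    // "\\d" reaches Pattern as the two characters \d (the digit class).
    System.out.println("\\d".length());              // 2
    System.out.println(matchesAll("\\d+", "2013"));  // true

    // "\\n" reaches Pattern as \n, which Pattern reads as a newline.
    System.out.println(matchesAll("\\n", "\n"));     // true

    // "\\\\" reaches Pattern as \\, which matches one literal backslash.
    System.out.println(matchesAll("\\\\", "\\"));    // true
  }
}
```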
Noncompliant Code Example (String Literal)
This noncompliant code example defines a method, splitWords(), that finds matches between the string literal (WORDS) and the input sequence. It is expected that WORDS would hold the escape sequence for matching a word boundary. However, the Java compiler treats the "\b" literal as a Java escape sequence, and the string WORDS silently compiles to a regular expression that checks for a single backspace character.
Code Block | ||
---|---|---|
| ||
import java.util.regex.Pattern;

public class Splitter {
  // Interpreted as backspace
  // Fails to split on word boundaries
  private final String WORDS = "\b";

  public String[] splitWords(String input) {
    Pattern pattern = Pattern.compile(WORDS);
    String[] input_array = pattern.split(input);
    return input_array;
  }
}
|
The string WORDS is compiled to a pattern that matches the backspace character instead of a regular expression for splitting on word boundaries.
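The effect of the single-backslash literal can be observed with a short, hypothetical test class. BackspaceSketch and splitOn are names made up for this sketch, and the sample inputs are arbitrary.

```java
import java.util.regex.Pattern;

class BackspaceSketch {
  // Splits the input around matches of the given regular expression.
  static String[] splitOn(String regex, String input) {
    return Pattern.compile(regex).split(input);
  }

  public static void main(String[] args) {
    // "\b" is a single character: backspace, U+0008.
    System.out.println("\b".length());         // 1
    System.out.println((int) "\b".charAt(0));  // 8

    // The input contains no backspace, so nothing is split.
    System.out.println(splitOn("\b", "hello world").length);  // 1

    // Only an actual backspace character separates the pieces.
    System.out.println(String.join("|", splitOn("\b", "hello\bworld")));  // hello|world
  }
}
```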
...
Compliant Solution (String Literal)
This compliant solution shows the correctly escaped value of the string literal WORDS that results in a regular expression designed to split on word boundaries:
Code Block | ||
---|---|---|
| ||
import java.util.regex.Pattern;

public class Splitter {
  // Interpreted as two chars, '\' and 'b'
  // Correctly splits on word boundaries
  private final String WORDS = "\\b";

  public String[] splitWords(String input) {
    Pattern pattern = Pattern.compile(WORDS);
    String[] input_array = pattern.split(input);
    return input_array;
  }
}
|
In this example, the string literal WORDS compiles to the two-character string \b, the regular expression for matching word boundaries, because the Java compiler converts the escaped backslash (\\) into a single backslash.
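A short sketch of the resulting behavior follows; WordBoundarySketch is a hypothetical class name. Because a word boundary is a zero-width match, the space between the words is returned as its own piece, and whether a leading empty string appears has differed between Java releases, so the sketch only checks that both words are present rather than asserting the exact array.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class WordBoundarySketch {
  // Splits the input on regex word boundaries and returns the pieces as a list.
  static List<String> splitWords(String input) {
    return Arrays.asList(Pattern.compile("\\b").split(input));
  }

  public static void main(String[] args) {
    List<String> parts = splitWords("hello world");
    System.out.println(parts.contains("hello"));  // true
    System.out.println(parts.contains("world"));  // true
    System.out.println(parts.size() > 1);         // true: the input was split
  }
}
```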
Noncompliant Code Example (String Property)
This noncompliant code example uses the same method, splitWords(). This time the WORDS string is loaded from an external properties file.
Code Block | ||
---|---|---|
| ||
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;
import java.util.regex.Pattern;

public class Splitter {
  private final String WORDS;

  public Splitter() throws IOException {
    Properties properties = new Properties();
    // try-with-resources closes the stream once the properties are loaded
    try (FileInputStream in = new FileInputStream("splitter.properties")) {
      properties.load(in);
    }
    WORDS = properties.getProperty("WORDS");
  }

  public String[] splitWords(String input) {
    Pattern pattern = Pattern.compile(WORDS);
    String[] input_array = pattern.split(input);
    return input_array;
  }
}
|
In the properties file, the WORDS property is once again incorrectly specified as \b.
Code Block | ||
---|---|---|
| ||
WORDS=\b |
The Properties.load() method reads this value as the single character b, because it silently drops a backslash that precedes a character it does not recognize as an escape. As a result, the pattern splits strings along the letter b. Although the string is interpreted differently than if it were a string literal, as in the previous noncompliant code example, the interpretation is still incorrect.
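This behavior can be reproduced without a file by loading the same text from memory. The sketch below is illustrative: PropertyEscapeSketch and loadWords are hypothetical names, and a StringReader stands in for the properties file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;
import java.util.regex.Pattern;

class PropertyEscapeSketch {
  // Loads the WORDS property from in-memory text instead of a file.
  static String loadWords(String fileText) throws IOException {
    Properties properties = new Properties();
    properties.load(new StringReader(fileText));
    return properties.getProperty("WORDS");
  }

  public static void main(String[] args) throws IOException {
    // The Java literal "WORDS=\\b" stands for the file text WORDS=\b
    String words = loadWords("WORDS=\\b");
    System.out.println(words);  // b

    // The resulting pattern splits on the letter b, not on word boundaries.
    System.out.println(String.join("|", Pattern.compile(words).split("cabin")));  // ca|in
  }
}
```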
Compliant Solution (String Property)
This compliant solution shows the correctly escaped value of the WORDS property:
Code Block | ||
---|---|---|
| ||
WORDS=\\b |
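One way to avoid hand-escaping mistakes, offered here only as a sketch and not as part of the original guideline, is to let Properties.store() write the file, since it applies the escaping that Properties.load() expects. The class name PropertyRoundTripSketch and its helpers are hypothetical, and in-memory writers and readers stand in for the properties file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Properties;

class PropertyRoundTripSketch {
  // Serializes the WORDS property the way Properties.store() would write it.
  static String storeWords(String value) throws IOException {
    Properties properties = new Properties();
    properties.setProperty("WORDS", value);
    StringWriter out = new StringWriter();
    properties.store(out, null);  // null means no comment header text
    return out.toString();
  }

  // Reads the WORDS property back from the serialized text.
  static String loadWords(String fileText) throws IOException {
    Properties properties = new Properties();
    properties.load(new StringReader(fileText));
    return properties.getProperty("WORDS");
  }

  public static void main(String[] args) throws IOException {
    String fileText = storeWords("\\b");  // the two characters '\' and 'b'
    // The stored text contains the doubled backslash: WORDS=\\b
    System.out.println(fileText.contains("WORDS=\\\\b"));   // true
    // Loading it back yields the original two-character regex.
    System.out.println(loadWords(fileText).equals("\\b"));  // true
  }
}
```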
Applicability
Incorrect use of escape characters in string inputs can result in misinterpretation and potential corruption of data.
Automated Detection
Tool | Version | Checker | Description |
---|---|---|---|
The Checker Framework | | Tainting Checker | Trust and security errors (see Chapter 8) |
Bibliography
[API 2013] | Class Pattern, "Backslashes, Escapes, and Quoting"; Package java.sql |
[JLS 2013] | §3.10.6, "Escape Sequences for Character and String Literals" |
...