Regular expressions (regexes) are widely used to match strings of text. For example, the POSIX grep utility supports regular expressions for finding patterns in the specified text. For introductory information on regular expressions, see the Java Tutorials [Java Tutorials]. The java.util.regex package provides the Pattern class, which encapsulates a compiled representation of a regular expression, and the Matcher class, which is an engine that uses a Pattern to perform matching operations on a CharSequence.
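The following minimal sketch shows the Pattern/Matcher division of labor described above. The class name PatternMatcherBasics and the word-counting task are illustrative only. The pattern is compiled once, and a fresh Matcher is created for each input. The input can be any CharSequence, such as a String or a StringBuilder.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PatternMatcherBasics {
    // Compile once; Pattern instances are immutable and safe to share across threads.
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+");

    // Counts runs of ASCII letters in any CharSequence (String, StringBuilder, CharBuffer, ...).
    static int countWords(CharSequence text) {
        Matcher m = WORD.matcher(text); // a Matcher holds the match state for one input
        int count = 0;
        while (m.find()) {              // advance to the next match, if there is one
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWords(new StringBuilder("grep finds 3 patterns"))); // prints 3
    }
}
```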
Java's powerful regex facilities must be protected from misuse. An attacker may supply a malicious input that modifies the original regular expression in such a way that the regex fails to comply with the program's specification. This attack vector, called a regex injection, might affect control flow, cause information leaks, or result in denial-of-service (DoS) vulnerabilities.
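The sketch below shows one way a supplied pattern can change a program's intended behavior. The program passes no flags (0) to Pattern.compile(), but an embedded flag expression in the pattern text still turns on case-insensitive matching. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class EmbeddedFlagDemo {
    // Hypothetical check: the program intends a case-sensitive, whole-string
    // match and therefore passes no flags (0) to Pattern.compile().
    static boolean matchesExactly(String userPattern, String input) {
        return Pattern.compile(userPattern, 0).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesExactly("admin", "ADMIN"));     // false
        System.out.println(matchesExactly("(?i)admin", "ADMIN")); // true: (?i) enables CASE_INSENSITIVE
    }
}
```

Because embedded flags such as (?i) and (?s) are part of the pattern syntax, the flags argument does not protect against them.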
Certain constructs and properties of Java regular expressions are susceptible to exploitation:
- Matching flags: Untrusted inputs may override matching options that may or may not have been passed to the Pattern.compile() method, for example through embedded flag expressions such as (?s) or flags inside non-capturing groups such as (?i:...).
- Greediness: An untrusted input may attempt to inject a regex that changes the original regex to match as much of the string as possible, exposing sensitive information.
- Grouping: The programmer can enclose parts of a regular expression in parentheses to perform some common action on the group. An attacker may be able to change the groupings by supplying untrusted input.
Because Java regular expressions are similar to Perl's, lessons learned from Perl regex security also apply to Java.
Untrusted input should be sanitized before use to prevent regex injection. When the user must specify a regex as input, care must be taken to ensure that the original regex cannot be modified without restriction. Whitelisting characters (such as letters and digits) before delivering the user-supplied string to the regex parser is a good input sanitization strategy. Where possible, avoid constructing regular expressions from untrusted input at all. Otherwise, provide only a very limited subset of regular expression functionality to the user to minimize any chance of misuse.
Noncompliant Code Example (Search Suggestions)
This noncompliant code example searches a log of previous searches for entries that match a user-supplied string and presents them as autocompletion suggestions. Because the incoming search string is not sanitized before it becomes part of the regular expression, the method exposes too much information to the user.
Code Block | ||
---|---|---|
| ||
/* Suppose this log file contains, in CSV format:
 *   search string, time (Unix), IP (integer)
 *
 *   Alice,1267773881,2147651708
 *   Bono,1267774881,2147651708
 *   Charles,1267775881,1175563058
 *   Cecilia,1267773222,291232332
 *
 * and the CSVLog class has a readLine() method that retrieves a single line
 * from the log and returns null at EOF.
 */
private CSVLog logfile;

// The application repeatedly calls this method to search the log for
// autocompletion suggestions
public Set<String> suggestSearches(String search) {
  Set<String> searches = new HashSet<String>();

  // Construct the regex from the user's string. The regex matches valid lines,
  // and the capturing group is intended to limit the result to the search string
  String regex = "^(" + search + ".*),[0-9]+?,[0-9]+?$";
  Pattern p = Pattern.compile(regex);
  String s;
  while ((s = logfile.readLine()) != null) { // Get a single line from the log
    Matcher m = p.matcher(s);
    if (m.find()) {
      String found = m.group(1);
      searches.add(found);
    }
  }

  return searches;
}
|
A non-malicious use would be to enter "C" to match Charles and Cecilia. A malicious user could instead enter "?:)(^C,[0-9]+?,[0-9]+?$)|(?:", which turns the regex into "^(?:)(^C,[0-9]+?,[0-9]+?$)|(?:.*),[0-9]+?,[0-9]+?$". The injected "?:)" converts the program's opening parenthesis into an empty non-capturing group. As a result, the attacker's own capturing group (group 1) spans the entire log line, including the time and IP address, and the trailing "|(?:" absorbs the remainder of the original regex. Using the OR operator in this way allows injection of any arbitrary regex. Entering the same string with "Bono" in place of "C" would reveal every time and IP address at which the keyword 'Bono' was searched.
Compliant Solution (Search Suggestions)
Filtering out overly permissive regular expressions is very difficult. It may be easier and more secure to rewrite the application to limit its use of regular expressions. For the preceding code example, a simple solution is to parse the CSV into a class (the User class) and restrict the regular expression to its name field.
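A minimal sketch of the parse-then-match approach for the search-suggestion log follows. The SearchEntry class, the parse() helper, and the use of Pattern.quote() to treat the search text as a literal prefix are assumptions for illustration; they are not taken from the original example. Only the name field is ever matched or returned, so even a hostile search string cannot reach the time or IP fields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;

final class SearchSuggestions {
    // Hypothetical record of one CSV log line: search string, time (Unix), IP (integer).
    static final class SearchEntry {
        final String name;
        final long time;
        final long ip;

        SearchEntry(String name, long time, long ip) {
            this.name = name;
            this.time = time;
            this.ip = ip;
        }
    }

    // Parses "name,time,ip" lines; malformed lines are skipped.
    static List<SearchEntry> parse(List<String> lines) {
        List<SearchEntry> entries = new ArrayList<>();
        for (String line : lines) {
            String[] fields = line.split(",");
            if (fields.length != 3) {
                continue;
            }
            try {
                entries.add(new SearchEntry(fields[0],
                        Long.parseLong(fields[1]), Long.parseLong(fields[2])));
            } catch (NumberFormatException e) {
                // Skip lines whose time or IP field is not numeric.
            }
        }
        return entries;
    }

    // Matches the user's text only against the name field and returns only names.
    static Set<String> suggestSearches(List<SearchEntry> entries, String search) {
        // Quote the user's text so that it is treated as a literal prefix.
        Pattern prefix = Pattern.compile(Pattern.quote(search));
        Set<String> suggestions = new TreeSet<>();
        for (SearchEntry e : entries) {
            if (prefix.matcher(e.name).lookingAt()) { // match at the start of the name
                suggestions.add(e.name);
            }
        }
        return suggestions;
    }

    public static void main(String[] args) {
        List<SearchEntry> log = parse(Arrays.asList(
                "Alice,1267773881,2147651708",
                "Bono,1267774881,2147651708",
                "Charles,1267775881,1175563058",
                "Cecilia,1267773222,291232332"));
        System.out.println(suggestSearches(log, "C"));                            // [Cecilia, Charles]
        System.out.println(suggestSearches(log, "?:)(^C,[0-9]+?,[0-9]+?$)|(?:")); // []
    }
}
```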
Regex Injection Example
Suppose a system log file contains messages output by various system processes. Some processes produce public messages, and some processes produce sensitive messages marked "private." Here is an example log file:
Code Block |
---|
10:47:03 private[423] Successful logout name: usr1 ssn: 111223333
10:47:04 public[48964] Failed to resolve network service
10:47:04 public[1] (public.message[49367]) Exited with exit code: 255
10:47:43 private[423] Successful login name: usr2 ssn: 444556666
10:48:08 public[48964] Backup failed with error: 19
|
A user wishes to search the log file for interesting messages but must be prevented from seeing the private messages. A program might accomplish this by permitting the user to provide search text that becomes part of the following regex:
Code Block |
---|
(.*? +public\[\d+\] +.*<SEARCHTEXT>.*)
|
However, if an attacker can substitute any string for <SEARCHTEXT>, the attacker can perform a regex injection with the following text:
Code Block |
---|
.*)|(.*
|
When this text is injected, the regex becomes:
Code Block |
---|
(.*? +public\[\d+\] +.*.*)|(.*.*)
|
This regex will match any line in the log file, including the private ones.
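The effect can be checked directly against the example log. This sketch splices the search text into the regex without sanitization, as described above, and tests each line with matches(). That per-line test is a simplification of searching the whole file at once. The class name LogInjectionDemo is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class LogInjectionDemo {
    // The example log from the text, one entry per line.
    static final List<String> LOG = Arrays.asList(
            "10:47:03 private[423] Successful logout name: usr1 ssn: 111223333",
            "10:47:04 public[48964] Failed to resolve network service",
            "10:47:04 public[1] (public.message[49367]) Exited with exit code: 255",
            "10:47:43 private[423] Successful login name: usr2 ssn: 444556666",
            "10:48:08 public[48964] Backup failed with error: 19");

    // Splices the search text into the regex unsanitized and returns
    // every log line that the resulting pattern matches in full.
    static List<String> search(String searchText) {
        Pattern p = Pattern.compile("(.*? +public\\[\\d+\\] +.*" + searchText + ".*)");
        List<String> hits = new ArrayList<>();
        for (String line : LOG) {
            if (p.matcher(line).matches()) {
                hits.add(line);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(search("failed").size());  // 1: only the public backup failure
        System.out.println(search(".*)|(.*").size()); // 5: every line, private ones included
    }
}
```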
Noncompliant Code Example
This noncompliant code example searches a log file using search terms from an untrusted user:
Code Block | ||
---|---|---|
| ||
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.CharBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogSearch {
  public static void FindLogEntry(String search) {
    // Construct regex dynamically from user string
    String regex = "(.*? +public\\[\\d+\\] +.*" + search + ".*)";
    Pattern searchPattern = Pattern.compile(regex);
    try (FileInputStream fis = new FileInputStream("log.txt")) {
      FileChannel channel = fis.getChannel();
      // Get the file's size and map it into memory
      long size = channel.size();
      final MappedByteBuffer mappedBuffer = channel.map(
          FileChannel.MapMode.READ_ONLY, 0, size);
      Charset charset = Charset.forName("ISO-8859-15");
      final CharsetDecoder decoder = charset.newDecoder();
      // Read file into char buffer
      CharBuffer log = decoder.decode(mappedBuffer);
      Matcher logMatcher = searchPattern.matcher(log);
      while (logMatcher.find()) {
        String match = logMatcher.group();
        if (!match.isEmpty()) {
          System.out.println(match);
        }
      }
    } catch (IOException ex) {
      System.err.println("thrown exception: " + ex.toString());
      Throwable[] suppressed = ex.getSuppressed();
      for (int i = 0; i < suppressed.length; i++) {
        System.err.println("suppressed exception: "
            + suppressed[i].toString());
      }
    }
    return;
  }
}
|
This code permits an attacker to perform a regex injection.
Noncompliant Code Example (Matching Flags)
This noncompliant code example compiles a regex supplied on the command line without sanitization and matches it against the user names loaded from a CSV-style database of user names and passwords. The program passes no flags (0) to Pattern.compile(), but an attacker can still enable matching options by embedding a flag expression such as (?s) in the regex.
Code Block | ||
---|---|---|
| ||
import java.util.HashMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/*
 * Usage: Test2 <regex>
 * The regex is used directly without sanitization, causing sensitive data
 * to be exposed.
 *
 * Imagine this program searches a database of users for user names that
 * match a regex.
 *   Non-malicious usage: Test2 John.*
 *   Malicious usage:     Test2 (?s)John.*
 */
public class Test2 {
  public static class User {
    String name, password;

    public User(String name, String password) {
      setName(name);
      setPassword(password);
    }

    private void setName(String n) { name = n; }
    private void setPassword(String pw) { password = pw; }
    public String getName() { return name; }
  }

  public static void main(String[] args) {
    if (args.length < 1) {
      System.err.println("Failed to specify a regex");
      return;
    }

    String sensitiveData; // Represents sensitive data read from a file or similar source
    int flags;
    String regex;
    Pattern p;
    Matcher m;
    HashMap<String, User> userMap = new HashMap<String, User>();

    // Imagine a CSV-style database: user,password
    sensitiveData = "JohnPaul,HearsGodsVoice\nJohnJackson,OlympicBobsleder\nJohnMayer,MakesBadMusic\n";
    String[] csvUsers = sensitiveData.split("\n");
    for (String csvUser : csvUsers) {
      String[] csvUserSplit = csvUser.split(",");
      String name = csvUserSplit[0];
      String pw = csvUserSplit[1];
      User u = new User(name, pw);
      userMap.put(name, u);
    }

    regex = args[0];
    flags = 0;
    System.out.println("Pattern: '" + regex + "'");
    p = Pattern.compile(regex, flags);
    for (String u : userMap.keySet()) {
      m = p.matcher(u);
      while (m.find()) {
        System.out.println("Found '" + m.group() + "'");
      }
    }
    System.err.println("DONE");
  }
}
|
Compliant Solution (Whitelisting)
This compliant solution sanitizes the search terms at the beginning of the FindLogEntry() method, filtering out nonalphanumeric characters (except space and single quote):
Code Block | ||
---|---|---|
| ||
public static void FindLogEntry(String search) {
// Sanitize search string
StringBuilder sb = new StringBuilder(search.length());
for (int i = 0; i < search.length(); ++i) {
char ch = search.charAt(i);
if (Character.isLetterOrDigit(ch) || ch == ' ' || ch == '\'') {
sb.append(ch);
}
}
search = sb.toString();
// Construct regex dynamically from user string
String regex = "(.*? +public\\[\\d+\\] +.*" + search + ".*)";
// ...
}
|
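To see what the whitelist filter does to the injection string from the regex injection example, the sanitizing loop can be pulled into a standalone helper. WhitelistSanitizer and its methods are hypothetical names for this sketch. The filter itself is the same as in the compliant code above.

```java
import java.util.regex.Pattern;

final class WhitelistSanitizer {
    // Same filter as the compliant FindLogEntry(): keep letters, digits, space, and single quote.
    static String sanitize(String search) {
        StringBuilder sb = new StringBuilder(search.length());
        for (int i = 0; i < search.length(); ++i) {
            char ch = search.charAt(i);
            if (Character.isLetterOrDigit(ch) || ch == ' ' || ch == '\'') {
                sb.append(ch);
            }
        }
        return sb.toString();
    }

    // Builds the log-search regex from the sanitized text and tests one log line.
    static boolean matches(String search, String line) {
        String regex = "(.*? +public\\[\\d+\\] +.*" + sanitize(search) + ".*)";
        return Pattern.compile(regex).matcher(line).matches();
    }

    public static void main(String[] args) {
        String privateLine = "10:47:03 private[423] Successful logout name: usr1 ssn: 111223333";
        System.out.println("[" + sanitize(".*)|(.*") + "]"); // prints []
        System.out.println(matches(".*)|(.*", privateLine)); // prints false
    }
}
```

The injection string is reduced to the empty string. The resulting regex still requires the public[...] tag, so private lines stay out of reach.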
This solution prevents regex injection but also restricts search terms. For example, a user may no longer search for "name =" because nonalphanumeric characters are removed from the search term.
Compliant Solution (Pattern.quote())
This compliant solution sanitizes the search terms by using Pattern.quote(), which returns a literal pattern string in which any regex metacharacters in the search string have no special meaning. Unlike the previous compliant solution, this one permits search strings that contain punctuation characters, such as "name =".
Code Block | ||
---|---|---|
| ||
public static void FindLogEntry(String search) {
// Sanitize search string
search = Pattern.quote(search);
// Construct regex dynamically from user string
String regex = "(.*? +public\\[\\d+\\] +.*" + search + ".*)";
// ...
}
|
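The sketch below shows the quoted search in isolation. QuotedSearch is a hypothetical name, and the sample lines come from the example log. Punctuation in a legitimate search still works, and the injection string is matched only as literal text.

```java
import java.util.regex.Pattern;

final class QuotedSearch {
    // Pattern.quote() yields a literal pattern string, so metacharacters in
    // the search text are matched as ordinary characters.
    static boolean matches(String search, String line) {
        String regex = "(.*? +public\\[\\d+\\] +.*" + Pattern.quote(search) + ".*)";
        return Pattern.compile(regex).matcher(line).matches();
    }

    public static void main(String[] args) {
        String publicLine = "10:47:04 public[1] (public.message[49367]) Exited with exit code: 255";
        String privateLine = "10:47:03 private[423] Successful logout name: usr1 ssn: 111223333";
        System.out.println(matches("exit code:", publicLine)); // true: punctuation is allowed
        System.out.println(matches(".*)|(.*", privateLine));   // false: the injection is inert
        System.out.println(matches(".*)|(.*", publicLine));    // false: no literal ".*)|(.*" in the line
    }
}
```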
The Matcher.quoteReplacement() method can be used to escape strings that serve as the replacement text in regex substitution, so that characters such as $ and \ are treated literally.
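A small sketch of the replacement case follows. The template syntax {user} and the class name ReplacementDemo are hypothetical. Without Matcher.quoteReplacement(), the untrusted text "$1" would be read as a reference to capturing group 1. Because this pattern has no group 1, replaceAll() would throw an IndexOutOfBoundsException.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ReplacementDemo {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{user\\}");

    // Replaces every "{user}" placeholder with untrusted text, treated literally.
    static String fill(String template, String userText) {
        return PLACEHOLDER.matcher(template)
                .replaceAll(Matcher.quoteReplacement(userText));
    }

    public static void main(String[] args) {
        System.out.println(fill("Hello, {user}!", "$1 \\ bob")); // prints Hello, $1 \ bob!
    }
}
```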
Compliant Solution
Another method of mitigating this vulnerability is to filter out the sensitive information prior to matching. Such a solution would require the filtering to be done every time the log file is periodically refreshed, incurring extra complexity and a performance penalty. Sensitive information may still be exposed if the log format changes but the class is not also refactored to accommodate these changes.
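The following sketch illustrates the filter-first approach. The rule that a line is public when its second token is public[<digits>] is an assumption about this log format. If the format changes, the filter must change with it. The class name PrefilteredSearch is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class PrefilteredSearch {
    // Assumed rule for this log format: the second token is "public[<digits>]".
    private static final Pattern PUBLIC_LINE = Pattern.compile("\\S+ +public\\[\\d+\\] ");

    // Removes sensitive lines; must be rerun whenever the log is refreshed.
    static List<String> publicLines(List<String> log) {
        List<String> result = new ArrayList<>();
        for (String line : log) {
            if (PUBLIC_LINE.matcher(line).lookingAt()) {
                result.add(line);
            }
        }
        return result;
    }

    // Even an attacker-controlled regex only ever sees the filtered lines.
    static List<String> search(List<String> log, String userRegex) {
        Pattern p = Pattern.compile(userRegex);
        List<String> hits = new ArrayList<>();
        for (String line : publicLines(log)) {
            if (p.matcher(line).find()) {
                hits.add(line);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
                "10:47:03 private[423] Successful logout name: usr1 ssn: 111223333",
                "10:47:04 public[48964] Failed to resolve network service",
                "10:47:04 public[1] (public.message[49367]) Exited with exit code: 255",
                "10:47:43 private[423] Successful login name: usr2 ssn: 444556666",
                "10:48:08 public[48964] Backup failed with error: 19");
        System.out.println(search(log, ".*").size()); // 3: only the public lines
    }
}
```

Filtering protects only against information disclosure. An untrusted regex compiled this way can still fail with a PatternSyntaxException or cause excessive backtracking, so sanitization remains advisable.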
Risk Assessment
Failing to sanitize untrusted data included as part of a regular expression can result in the disclosure of sensitive information.
Rule | Severity | Likelihood | Remediation Cost | Priority | Level |
---|---|---|---|---|---|
IDS08-J | Medium | Unlikely | Medium | P4 | L3 |
Automated Detection
Tool | Version | Checker | Description |
---|---|---|---|
The Checker Framework | | Tainting Checker | Trust and security errors (see Chapter 8) |
CodeSonar | | JAVA.IO.TAINT.REGEX | Tainted Regular Expression (Java) |
SonarQube | | | Regular expressions should not be vulnerable to Denial of Service attacks |
Related Guidelines
Bibliography
...
CVE-2005-1949 Arbitrary command execution in ePing plugin for e107 portal due to an overly permissive regular expression parsing an IP