Regular expressions are widely used to match strings of text. For example, the POSIX grep utility supports regular expressions for finding patterns in the specified text. For introductory information on regular expressions, see the Java Tutorials [[Tutorials 08]]. The java.util.regex package provides the Pattern class, which encapsulates a compiled representation of a regular expression, and the Matcher class, an engine that interprets a Pattern to perform matching operations on a CharSequence.
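As a rough illustration of how Pattern and Matcher work together, the following sketch compiles a fixed pattern once and then collects a capturing group from each match. The class name RegexBasics, the method extractNumbers, and the pattern itself are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexBasics {
    // Compiled once; a Pattern is immutable and can be reused across calls.
    // The pattern (lowercase letters followed by digits) is illustrative only.
    private static final Pattern WORD_WITH_DIGITS = Pattern.compile("[a-z]+(\\d+)");

    /** Returns the digit group (group 1) of every match found in the input. */
    static List<String> extractNumbers(CharSequence input) {
        List<String> numbers = new ArrayList<>();
        Matcher matcher = WORD_WITH_DIGITS.matcher(input);
        while (matcher.find()) {           // advance to the next match, if any
            numbers.add(matcher.group(1)); // text captured by the parentheses
        }
        return numbers;
    }

    public static void main(String[] args) {
        // "abc12" and "x7" match; "!!9" has no letters before the digit.
        System.out.println(extractNumbers("abc12 x7 !!9")); // prints [12, 7]
    }
}
```

Because matcher() accepts any CharSequence, the same code works on a String, a StringBuilder, or a CharBuffer.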
The powerful regular expression (regex) facilities must be protected from misuse. An attacker may supply malicious input that modifies the original regular expression so that the regex no longer complies with the program's specification. This attack vector, referred to as regex injection, might affect control flow, cause information leaks, or result in denial-of-service (DoS) vulnerabilities.
Certain constructs and properties of Java regular expressions are susceptible to exploitation:
- Matching flags: Untrusted inputs may override matching options that may or may not have been passed to the Pattern.compile() method.
- Greediness: An untrusted input may attempt to inject a regex that changes the original regex to match as much of the string as possible, exposing sensitive information.
- Grouping: The programmer can enclose parts of a regular expression in parentheses to perform some common action on the group. An attacker may be able to change the groupings by supplying untrusted input, leading to the security weaknesses described earlier.
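The first two items in the list above can be sketched as follows. This is a minimal demonstration, not code from the guideline: the class FlagInjectionDemo, the method matchesName, and the "name=" prefix are hypothetical. The program compiles its pattern without Pattern.CASE_INSENSITIVE, yet an embedded (?i) flag or a greedy .* in the untrusted fragment changes what the pattern accepts.

```java
import java.util.regex.Pattern;

class FlagInjectionDemo {
    /**
     * Hypothetical check that splices untrusted input into a regex.
     * No flags are passed to Pattern.compile(), so the programmer
     * expects a case-sensitive, exact comparison.
     */
    static boolean matchesName(String untrustedName, String candidate) {
        Pattern p = Pattern.compile("name=" + untrustedName);
        return p.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        // Intended behavior: case-sensitive comparison fails.
        System.out.println(matchesName("alice", "name=ALICE"));     // prints false
        // Injected embedded flag turns on case-insensitive matching.
        System.out.println(matchesName("(?i)alice", "name=ALICE")); // prints true
        // Injected greedy quantifier matches any suffix.
        System.out.println(matchesName("al.*", "name=alXYZ"));      // prints true
    }
}
```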
Untrusted input should be sanitized before use to prevent regex injection. When the user must specify a regex as input, care must be taken to ensure that the original regex cannot be modified without restriction. White-listing characters (such as letters and digits) before delivering the user-supplied string to the regex parser is a good input validation strategy. However, when the user is allowed to enter regexes, the white-list may need to permit certain dangerous characters. Such inputs should not be used to build a security-sensitive dynamic regex. A programmer must provide only a very limited subset of regular expression functionality to the user to minimize any chance of misuse.
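A minimal sketch of two ways to apply this advice, under stated assumptions: the class RegexInputGuard and its methods are hypothetical. isAllowed() implements a white-list that rejects anything other than ASCII letters and digits. literalPrefix() uses Pattern.quote(), a standard java.util.regex method that this guideline does not itself mention, as an alternative for cases where the input should be matched purely as literal text.

```java
import java.util.regex.Pattern;

class RegexInputGuard {
    // White-list: one or more ASCII letters or digits, nothing else.
    private static final Pattern ALLOWED = Pattern.compile("[A-Za-z0-9]+");

    /** Returns true only if every character of the input is white-listed. */
    static boolean isAllowed(String untrusted) {
        return ALLOWED.matcher(untrusted).matches();
    }

    /**
     * Builds a pattern matching strings that start with the untrusted text.
     * Pattern.quote() wraps the input so that metacharacters such as '.',
     * '*', '(' and '|' are treated as literals.
     */
    static Pattern literalPrefix(String untrusted) {
        return Pattern.compile("^" + Pattern.quote(untrusted) + ".*$");
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("abc123")); // prints true
        System.out.println(isAllowed(".*)|("));  // prints false
        // The dot in "a.c" is literal, so "abcdef" does not match.
        System.out.println(literalPrefix("a.c").matcher("a.cdef").matches()); // prints true
        System.out.println(literalPrefix("a.c").matcher("abcdef").matches()); // prints false
    }
}
```

Quoting gives the user no regex functionality at all, so it suits only the case where the input is meant to be a plain search term.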
Noncompliant Code Example
This noncompliant code example periodically loads a log file into memory and allows clients to obtain keyword search suggestions by passing the keyword as an argument to suggestSearches().
import java.io.FileInputStream;
import java.nio.CharBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Keywords {
  private static ScheduledExecutorService scheduler
      = Executors.newSingleThreadScheduledExecutor();
  private static CharBuffer log;
  private static final Object lock = new Object();

  public static Set<String> suggestSearches(String search) {
    synchronized(lock) {
      Set<String> searches = new HashSet<String>();

      // Construct regex dynamically from user string
      String regex = "(" + search + ".*),[\\d]+?,[\\d]+?";

      Pattern keywordPattern = Pattern.compile(regex);
      Matcher logMatcher = keywordPattern.matcher(log);
      while (logMatcher.find()) {
        String found = logMatcher.group(1);
        searches.add(found);
      }
      return searches;
    }
  }

  static {
    try {
      FileChannel channel = new FileInputStream(
          "path").getChannel();

      // Get the file's size and map it into memory
      int size = (int) channel.size();
      final MappedByteBuffer mappedBuffer = channel.map(
          FileChannel.MapMode.READ_ONLY, 0, size);

      Charset charset = Charset.forName("ISO-8859-15");
      final CharsetDecoder decoder = charset.newDecoder();

      log = decoder.decode(mappedBuffer); // Read file into char buffer

      Runnable periodicLogRead = new Runnable() {
        @Override public void run() {
          synchronized(lock) {
            try {
              // decode() consumes the buffer, so reset its position first
              mappedBuffer.rewind();
              log = decoder.decode(mappedBuffer);
            } catch (CharacterCodingException e) {
              // Forward to handler
            }
          }
        }
      };
      scheduler.scheduleAtFixedRate(periodicLogRead,
                                    0, 5, TimeUnit.SECONDS);
    } catch (Throwable t) {
      // Forward to handler
    }
  }
}
The log file is parsed using a regex constructed dynamically from user input (search):
"(" + search + ".*),[\\d]+?,[\\d]+?"
This regex finds lines in the log that correspond to the value of search. Consequently, if an attacker can perform regex injection, sensitive information such as the IP address of the user may be disclosed.
A trusted user might enter the "C" keyword to match "Charles" and "Cecilia"; a malicious user, however, could enter the string .*)|( which turns the regex into (.*)|(.*),[\d]+?,[\d]+? so that the first alternative matches every log line in full. This might result in unintended disclosure of sensitive information present in the log. Using the OR (|) operator allows injection of any arbitrary regex.
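To make the effect concrete, the sketch below runs the same regex construction against a small in-memory log. The class InjectionDemo, the two-argument form of suggestSearches(), and the sample log lines (assumed here to be keyword, timestamp, and a numeric address) are all invented for illustration; the real log format is not specified in this guideline.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InjectionDemo {
    // Hypothetical log contents; the field layout is an assumption.
    static final String SAMPLE_LOG =
        "Charles,1267775881,1175563058\n" +
        "Cecilia,1267773222,291232332\n";

    /** Same dynamic regex construction as the noncompliant example. */
    static Set<String> suggestSearches(String search, CharSequence log) {
        Set<String> searches = new HashSet<>();
        String regex = "(" + search + ".*),[\\d]+?,[\\d]+?";
        Matcher matcher = Pattern.compile(regex).matcher(log);
        while (matcher.find()) {
            searches.add(matcher.group(1));
        }
        return searches;
    }

    public static void main(String[] args) {
        // Benign input: only the keywords are returned.
        System.out.println(suggestSearches("C", SAMPLE_LOG));
        // Injected input: group 1 now captures entire log lines,
        // including the fields that were meant to stay hidden.
        System.out.println(suggestSearches(".*)|(", SAMPLE_LOG));
    }
}
```

With the benign input, the result contains only "Charles" and "Cecilia". With the injected input, the result includes each complete log line (along with an empty string produced by zero-length matches at line ends).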
One method of mitigating this vulnerability is to filter out the sensitive information prior to matching. However, sensitive information may be exposed if the log format changes but the class is not refactored to accommodate the changes.
Compliant Solution
This compliant solution filters out non-alphanumeric characters (except space and single quote) from the search string, which prevents regex injection.
public class Keywords {
  // ...
  public static Set<String> suggestSearches(String search) {
    synchronized(lock) {
      Set<String> searches = new HashSet<String>();

      StringBuilder sb = new StringBuilder(search.length());
      for (int i = 0; i < search.length(); ++i) {
        char ch = search.charAt(i);
        if (Character.isLetterOrDigit(ch) ||
            ch == ' ' ||
            ch == '\'') {
          sb.append(ch);
        }
      }
      search = sb.toString();

      // Construct regex dynamically from user string
      String regex = "(" + search + ".*),[\\d]+?,[\\d]+?";
      // ...
    }
  }
  // ...
}
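The filtering step can be exercised on its own. The sketch below extracts the same loop into a hypothetical helper, SearchSanitizer.sanitize() (a name not used in the guideline), and applies it to a few inputs.

```java
class SearchSanitizer {
    /**
     * Keeps letters, digits, spaces, and single quotes; drops every
     * other character. Same filter as in the compliant solution.
     */
    static String sanitize(String search) {
        StringBuilder sb = new StringBuilder(search.length());
        for (int i = 0; i < search.length(); ++i) {
            char ch = search.charAt(i);
            if (Character.isLetterOrDigit(ch) || ch == ' ' || ch == '\'') {
                sb.append(ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Every character of the attack string is a metacharacter.
        System.out.println("[" + sanitize(".*)|(") + "]");  // prints []
        // Embedded flag syntax is reduced to its letters.
        System.out.println(sanitize("C(?i)h.*"));           // prints Cih
        // Ordinary search terms pass through unchanged.
        System.out.println(sanitize("O'Neil 2"));           // prints O'Neil 2
    }
}
```

Note that an input consisting only of metacharacters is reduced to the empty string, and with an empty search term the regex matches every keyword in the log. A caller may therefore want to treat an empty sanitized string as "no suggestions", depending on the application's requirements.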
Risk Assessment
Violating this guideline may result in the disclosure of sensitive information.
Rule | Severity | Likelihood | Remediation Cost | Priority | Level |
---|---|---|---|---|---|
IDS18-J | medium | unlikely | medium | P4 | L3 |
References
[[Tutorials 08]] Regular Expressions
[[MITRE 09]] CWE ID 625 "Permissive Regular Expression"
[[CVE 05]] CVE-2005-1949