Regular expressions (regexes) are commonly used to match strings of text, so they appear in many applications that must search through text. A notable example is the POSIX grep utility on *NIX systems. A programmer may want this kind of functionality for searching through log files, for example.
Java's regular expression facilities are wide ranging and powerful. That power can lead to unwanted modification of the original regular expression string, forming a pattern that matches too widely. The result may be that far too much information is matched, or that matches occur when none are expected.
One method of preventing this vulnerability is to filter out the sensitive information prior to matching and then run the user-supplied regex against the remaining non-sensitive information. However, if the log format changes without a corresponding change in the class, sensitive information may be exposed. Furthermore, depending on how encapsulated the search keywords are, a malicious user may be able to grab a list of all the keywords. (If there are a lot of keywords, this may cause a denial of service.)
The primary means of preventing this vulnerability is to sanitize a regular expression string coming from untrusted input. Whitelisting certain characters (such as letters and digits) before passing the user-supplied string to the regex parser is a common strategy. Blacklisting certain operators is difficult because of the variability of the regex language, so whitelisting is preferred over blacklisting.
Additionally, the programmer could look into ways of avoiding using regular expressions from untrusted input, or perhaps provide only a very limited subset of regular expression functionality to the user.
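One way to avoid interpreting untrusted input as a regular expression at all is to quote it so that every character is matched literally. The following is a minimal sketch under that assumption, using the standard `Pattern.quote()` method. The class name `LiteralSearch` and the method `suggest()` are hypothetical names introduced for illustration only, and the log is passed in as a parameter to keep the sketch self-contained.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the original example.
final class LiteralSearch {

    // Returns the keywords in the log that start with the literal text of search.
    // Pattern.quote() wraps the untrusted string so that regex metacharacters
    // in it lose their special meaning.
    static Set<String> suggest(String log, String search) {
        Set<String> results = new LinkedHashSet<>();
        String regex = "^(" + Pattern.quote(search) + ".*),[0-9]+?,[0-9]+?$";
        Matcher m = Pattern.compile(regex, Pattern.MULTILINE).matcher(log);
        while (m.find()) {
            results.add(m.group(1));
        }
        return results;
    }

    public static void main(String[] args) {
        String log = "Alice,1267773881,2147651408\n"
                   + "Cecilia,1267773222,291232332\n";
        System.out.println(suggest(log, "C"));
        // Metacharacters are matched literally, so this finds nothing.
        System.out.println(suggest(log, "?:)(^.*)$|(?:"));
    }
}
```

The trade-off of this approach is that the user loses all regular expression functionality. If a limited subset of operators must remain available, quoting alone is not sufficient.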
Constructs and properties of Java regular expressions to watch out for include:
- Matching flags used in non-capturing groups. These override matching options that may or may not have been passed into the compile() method.
- Greediness. The regular expression tries to match as much of the string as possible, which may expose too much information.
- Grouping. The programmer can define certain smaller parts of the regular expression to capture and return, but a malicious user may be able to use this to make their own groupings.
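The first two constructs in the list above can be illustrated with a short sketch. The class `RegexConstructDemo`, its method names, and the sample strings are hypothetical and chosen only for illustration. The sketch shows an inline flag inside a non-capturing group enabling case-insensitive matching even though no flags were passed to `compile()`, and a greedy quantifier matching more text than its reluctant counterpart.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for illustration only.
final class RegexConstructDemo {

    // Compiles regex with no flags and reports whether it is found in input.
    static boolean foundWithoutFlags(String regex, String input) {
        return Pattern.compile(regex, 0).matcher(input).find();
    }

    // Returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // No CASE_INSENSITIVE flag was passed, so the plain pattern fails...
        System.out.println(foundWithoutFlags("secret", "SECRET"));
        // ...but a flag embedded in a non-capturing group turns it on anyway.
        System.out.println(foundWithoutFlags("(?i:secret)", "SECRET"));

        // Greedy: consumes up to the last comma.
        System.out.println(firstMatch("a.*,", "a,b,c,"));
        // Reluctant: stops at the first comma.
        System.out.println(firstMatch("a.*?,", "a,b,c,"));
    }
}
```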
For introductory information on regular expressions, see \[[Tutorials 08|AA. Java References#Tutorials 08]\].
...
This noncompliant code example searches a log file of previous searches for keywords that match a regular expression in order to present search suggestions to the user. The suggestSearches() method is called repeatedly to provide suggestions for autocompletion of the search text. The full log of previous searches is stored in the logBuffer StringBuilder object. The contents of logBuffer are periodically copied (in this example, on every append) to the log String object for use in suggestSearches().
    import java.util.HashSet;
    import java.util.Set;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public final class ExploitableLog {
      private static final StringBuilder logBuffer = new StringBuilder();
      private static String log = logBuffer.toString();

      static {
        // This is supposed to come from a file, but it's here as a string for
        // illustrative purposes. Each line's format is: name,id,timestamp
        append("Alice,1267773881,2147651408\n");
        append("Bono,1267774881,2147351708\n");
        append("Charles,1267775881,1175523058\n");
        append("Cecilia,1267773222,291232332\n");
      }

      private static void append(CharSequence str) {
        logBuffer.append(str);
        log = logBuffer.toString(); // update log string on append
      }

      public static Set<String> suggestSearches(String search) {
        Set<String> searches = new HashSet<String>();

        // Construct regex from user string (the string is not sanitized)
        String regex = "^(" + search + ".*),[0-9]+?,[0-9]+?$";
        int flags = Pattern.MULTILINE;
        Pattern keywordPattern = Pattern.compile(regex, flags);

        // Match regex
        Matcher logMatcher = keywordPattern.matcher(log);
        while (logMatcher.find()) {
          String found = logMatcher.group(1);
          searches.add(found);
        }
        return searches;
      }
    }
The regex used to search the log is:
    "^(" + search + ".*),[0-9]+?,[0-9]+?$"
This regex matches against an entire line of the log and searches for old searches beginning with the entered keyword. The anchoring operators and the reluctant quantifiers mitigate some greediness concerns. The grouping characters allow the program to grab only the keyword while still matching the IP and timestamp. Because the log String contains multiple lines, the MULTILINE flag must be active so that the anchoring operators match at the start and end of each line rather than only at the start and end of the input. By all appearances, this is a strong regex.
...
which grabs the entire log line rather than just the old keywords. The first close parenthesis of the malicious search string defeats the grouping protection. Using the OR operator allows injection of any arbitrary regex. Now this regex will reveal all IPs and timestamps of past searches.
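The effect described above can be reproduced with a small self-contained sketch. The hostile input used here, `?:)(^.*)$|(?:`, is a hypothetical string chosen to fit the description (an early close parenthesis plus the OR operator) and is not necessarily the exact string used in the original example. The class name `InjectionDemo` and the method `suggest()` are likewise assumptions made for illustration.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for illustration only.
final class InjectionDemo {

    // Assumed hostile input: closes the capturing group early, opens a new
    // group that captures a whole line, and ORs away the rest of the pattern.
    static final String HOSTILE_SEARCH = "?:)(^.*)$|(?:";

    // Same regex construction as the noncompliant example, with the log
    // passed in as a parameter to keep the sketch self-contained.
    static Set<String> suggest(String log, String search) {
        Set<String> results = new LinkedHashSet<>();
        String regex = "^(" + search + ".*),[0-9]+?,[0-9]+?$";
        Matcher m = Pattern.compile(regex, Pattern.MULTILINE).matcher(log);
        while (m.find()) {
            String found = m.group(1);
            if (found != null) { // group 1 is null if the other alternative matched
                results.add(found);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        String log = "Alice,1267773881,2147651408\n"
                   + "Bono,1267774881,2147351708\n";
        // Benign input: only the keyword is captured.
        System.out.println(suggest(log, "Al"));
        // Hostile input: the resulting regex is
        //   ^(?:)(^.*)$|(?:.*),[0-9]+?,[0-9]+?$
        // so group 1 now captures each complete log line.
        System.out.println(suggest(log, HOSTILE_SEARCH));
    }
}
```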
Compliant Solution
...
    import java.util.HashSet;
    import java.util.Set;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public final class FilteredLog {
      private static final StringBuilder logBuffer = new StringBuilder();
      private static String log = logBuffer.toString();

      static {
        // This is supposed to come from a file, but it's here as a string for
        // illustrative purposes
        append("Alice,1267773881,2147651408\n");
        append("Bono,1267774881,2147351708\n");
        append("Charles,1267775881,1175523058\n");
        append("Cecilia,1267773222,291232332\n");
      }

      private static void append(CharSequence str) {
        logBuffer.append(str);
        log = logBuffer.toString(); // update log string on append
      }

      public static Set<String> suggestSearches(String search) {
        Set<String> searches = new HashSet<String>();

        // Filter bad chars from user input: keep only letters, digits,
        // spaces, and apostrophes
        StringBuilder sb = new StringBuilder(search.length());
        for (int i = 0; i < search.length(); ++i) {
          char ch = search.charAt(i);
          if (Character.isLetterOrDigit(ch) || ch == ' ' || ch == '\'') {
            sb.append(ch);
          }
        }
        search = sb.toString();

        // Construct regex from user string
        String regex = "^(" + search + ".*),[0-9]+?,[0-9]+?$";
        int flags = Pattern.MULTILINE;
        Pattern keywordPattern = Pattern.compile(regex, flags);

        // Match regex
        Matcher logMatcher = keywordPattern.matcher(log);
        while (logMatcher.find()) {
          String found = logMatcher.group(1);
          searches.add(found);
        }
        return searches;
      }
    }
Risk Assessment
Violating this guideline may result in sensitive information disclosure.
Rule | Severity | Likelihood | Remediation Cost | Priority | Level |
---|---|---|---|---|---|
IDS18-J | medium | probable | high | P8 | L2 |
...
References
\[[Tutorials 08|AA. Java References#Tutorials 08]\] [Regular Expressions|http://java.sun.com/docs/books/tutorial/essential/regex/index.html]
\[[MITRE 09|AA. Java References#MITRE 09]\] [CWE ID 625|http://cwe.mitre.org/data/definitions/625.html] "Permissive Regular Expressions"
\[[CVE 05|AA. Java References#CVE]\] [CVE-2005-1949|http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2005-1949]