...
- match flags used in non-capturing groups (these override matching options that may or may not have been passed into the compile() method)
- greediness (where the regular expression tries to match as much of the string as possible, which may expose too much information)
- grouping (where the programmer can define certain smaller parts of the regular expression to capture and return, but which a malicious user may be able to use to define their own groupings)
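The first two items can be seen in a short experiment. The following is a minimal sketch, not code from this guideline; the class and method names (RegexFeatureDemo, matchesWithoutFlags, firstMatch) are hypothetical and chosen only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexFeatureDemo {
   // Hypothetical helper: compiles caller-supplied text with NO flags passed to compile().
   static boolean matchesWithoutFlags(String userRegex, String input) {
      return Pattern.compile(userRegex).matcher(input).matches();
   }

   // Hypothetical helper: returns the first substring matched by the regex, or null.
   static String firstMatch(String regex, String input) {
      Matcher m = Pattern.compile(regex).matcher(input);
      return m.find() ? m.group() : null;
   }

   public static void main(String[] args) {
      // An inline flag inside a non-capturing group turns on case-insensitive
      // matching even though compile() was given no flags.
      System.out.println(matchesWithoutFlags("secret", "SECRET"));      // false
      System.out.println(matchesWithoutFlags("(?i:secret)", "SECRET")); // true

      // Greedy versus reluctant quantifiers on the same input.
      System.out.println(firstMatch("a.*b", "aXbYb"));  // aXbYb
      System.out.println(firstMatch("a.*?b", "aXbYb")); // aXb
   }
}
```

If the regex text comes from the user, the user decides which of these behaviors applies, regardless of the flags the program passes to compile().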
For introductory information on regular expressions, see Wikipedia.
Noncompliant Code Example
This noncompliant code example searches a log of previous searches for keywords that match a regular expression in order to present search suggestions to the user. The suggestSearches() method is called repeatedly to bring up auto-completion suggestions for the user. The full log of previous searches is stored in the logBuffer StringBuilder object. The contents of logBuffer are copied to the log String object (in this example, on every append) for use in suggestSearches().
Code Block | ||
---|---|---|
| ||
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public final class ExploitableLog {
   private static final StringBuilder logBuffer = new StringBuilder();
   private static String log = logBuffer.toString();

   public static Set<String> suggestSearches(String search) {
      Set<String> searches = new HashSet<String>();

      // Construct regex from user string (unsanitized: this is the flaw)
      String regex = "^(" + search + ".*),[0-9]+?,[0-9]+?$";
      int flags = Pattern.MULTILINE;
      Pattern keywordPattern = Pattern.compile(regex, flags);

      // Match regex against the log and collect capture group 1 of each match
      Matcher logMatcher = keywordPattern.matcher(log);
      while (logMatcher.find()) {
         String found = logMatcher.group(1);
         searches.add(found);
      }

      return searches;
   }

   private static void append(CharSequence str) {
      logBuffer.append(str);
      log = logBuffer.toString(); // update log string on append
   }

   static {
      // This is supposed to come from a file, but it's here as a string for
      // illustrative purposes
      append("Alice,1267773881,2147651408\n");
      append("Bono,1267774881,2147351708\n");
      append("Charles,1267775881,1175523058\n");
      append("Cecilia,1267773222,291232332\n");
   }
}
|
The regex used to search the log is:
No Format |
---|
^(" + search + ".*),[0-9]+?,[0-9]+?$
|
This regex matches an entire line of the log and finds old searches that begin with the entered keyword. The anchors and the reluctant quantifiers mitigate some greediness concerns. The capturing group allows the program to grab only the keyword while still matching the IP address and timestamp. Because the log String contains multiple lines, the MULTILINE flag must be set so that the anchors ^ and $ match at the start and end of each line rather than only at the start and end of the whole input. By all appearances, this is a strong regex.
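The role of the MULTILINE flag can be checked with a small experiment. This is an illustrative sketch only; the class MultilineAnchorDemo and its countMatches helper are hypothetical names, and the log text is copied from the sample entries in the example above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MultilineAnchorDemo {
   // Sample log lines copied from the noncompliant example.
   static final String LOG =
         "Alice,1267773881,2147651408\n"
       + "Bono,1267774881,2147351708\n"
       + "Charles,1267775881,1175523058\n"
       + "Cecilia,1267773222,291232332\n";

   // Hypothetical helper: counts how many times the regex is found in LOG.
   static int countMatches(String regex, int flags) {
      Matcher m = Pattern.compile(regex, flags).matcher(LOG);
      int count = 0;
      while (m.find()) {
         count++;
      }
      return count;
   }

   public static void main(String[] args) {
      String regex = "^(C.*),[0-9]+?,[0-9]+?$";
      // Without MULTILINE, ^ matches only at the start of the input ("Alice...").
      System.out.println(countMatches(regex, 0));                 // 0
      // With MULTILINE, ^ and $ match at every line boundary.
      System.out.println(countMatches(regex, Pattern.MULTILINE)); // 2
   }
}
```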
However, this class does not sanitize the incoming search string before embedding it in the regular expression and, as a result, exposes too much information from the log file to the user.
A non-malicious use of the suggestSearches() method would be to enter "C" to match "Charles" and "Cecilia". However, a malicious user could enter
No Format |
---|
?:)(^.*,[0-9]+?,[0-9]+?$)|(?:
|
which grabs the entire log line rather than just the old keywords. The outer parentheses of the malicious search string defeat the grouping protection: the program's own opening parenthesis is closed off as an empty non-capturing group, so the injected group becomes capture group 1. Using the OR operator allows injection of any arbitrary regex. This input reveals the times and IPs of all past searches.
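The effect of both inputs can be reproduced with a condensed version of the example. This is a sketch under stated assumptions: the class InjectionDemo and its suggest method are hypothetical stand-ins that build the same regex as suggestSearches() over the same four sample log lines, and a LinkedHashSet is used only so the printed order follows the log.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class InjectionDemo {
   // Sample log lines copied from the noncompliant example.
   static final String LOG =
         "Alice,1267773881,2147651408\n"
       + "Bono,1267774881,2147351708\n"
       + "Charles,1267775881,1175523058\n"
       + "Cecilia,1267773222,291232332\n";

   // Builds the regex exactly as the noncompliant code does (no sanitization)
   // and collects capture group 1 of every match.
   static Set<String> suggest(String search) {
      String regex = "^(" + search + ".*),[0-9]+?,[0-9]+?$";
      Matcher m = Pattern.compile(regex, Pattern.MULTILINE).matcher(LOG);
      Set<String> found = new LinkedHashSet<>();
      while (m.find()) {
         found.add(m.group(1));
      }
      return found;
   }

   public static void main(String[] args) {
      // Benign input: only the keywords are returned.
      System.out.println(suggest("C")); // [Charles, Cecilia]

      // Malicious input: the resulting regex is
      //   ^(?:)(^.*,[0-9]+?,[0-9]+?$)|(?:.*),[0-9]+?,[0-9]+?$
      // so group 1 now captures each whole line, timestamps and IPs included.
      for (String line : suggest("?:)(^.*,[0-9]+?,[0-9]+?$)|(?:")) {
         System.out.println(line);
      }
   }
}
```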
Compliant Solution
This compliant solution filters out non-alphanumeric characters from the search string using Java's Character.isLetterOrDigit(). Filtering removes the grouping parentheses and the OR operator that enable the injection.
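The filtering step described above might look roughly like the following. This is a minimal sketch, assuming a helper class SearchSanitizer with a sanitize method; both names are illustrative and are not taken from this guideline's own solution.

```java
final class SearchSanitizer {
   // Keeps only the characters for which Character.isLetterOrDigit() is true.
   // Every regex metacharacter, such as ( ) | ? : ^ $ . * + [ ], is dropped.
   static String sanitize(String search) {
      StringBuilder sb = new StringBuilder(search.length());
      for (int i = 0; i < search.length(); i++) {
         char ch = search.charAt(i);
         if (Character.isLetterOrDigit(ch)) {
            sb.append(ch);
         }
      }
      return sb.toString();
   }

   public static void main(String[] args) {
      System.out.println(sanitize("C"));                             // C
      System.out.println(sanitize("?:)(^.*,[0-9]+?,[0-9]+?$)|(?:")); // 0909
   }
}
```

The sanitized string would then be used in place of the raw input when the regex is constructed. Note that a filter this strict also drops spaces and punctuation from legitimate searches.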
...