Java's regular expression facilities are wide-ranging and powerful. When untrusted input is incorporated into a regular expression string, that power can be abused: an attacker can modify the original expression so that the resulting pattern matches too widely, possibly exposing far more information than intended.
The primary means of preventing this vulnerability is to sanitize any regular expression string that comes from untrusted input. Additionally, the programmer should look for ways to avoid building regular expressions from untrusted input altogether, or provide the user with only a very limited subset of regular expression functionality.
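One possible way to sanitize an untrusted string is to treat it as a literal by passing it through `Pattern.quote()` before concatenating it into the pattern. The sketch below is only illustrative: the class name `QuotedSearch`, the method `suggest`, and the log-line layout (keyword, then two numeric fields) are assumptions made for this example.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper used only for illustration.
final class QuotedSearch {

    // Returns the keywords of log lines (assumed form "keyword,number,number")
    // whose keyword starts with the literal text in 'search'.
    static Set<String> suggest(String log, String search) {
        // Pattern.quote() wraps the input in \Q...\E, so regex metacharacters
        // in the user's string are matched literally instead of being interpreted.
        String regex = "^(" + Pattern.quote(search) + ".*),[0-9]+?,[0-9]+?$";
        Matcher m = Pattern.compile(regex, Pattern.MULTILINE).matcher(log);
        Set<String> found = new LinkedHashSet<>();
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String log = "Alice,1267773881,2147651408\n"
                   + "Charles,1267775881,1175523058\n"
                   + "Cecilia,1267773222,291232332\n";
        System.out.println(suggest(log, "C"));                              // [Charles, Cecilia]
        System.out.println(suggest(log, "?:)(^C.*,[0-9]+?,[0-9]+?$)|(?:")); // []
    }
}
```

Quoting keeps punctuation in the search term usable as literal text, at the cost of giving the user no regular expression features at all. Whether that trade-off is acceptable depends on the application.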
Constructs and properties of Java regular expressions to watch out for include:
- match flags used in non-capturing groups (these override matching options that may or may not have been passed into the compile() method)
- greediness
- grouping
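The three constructs listed above can be observed in a few lines. This is a minimal sketch; the class `RegexConstructsDemo` and its helper methods are hypothetical names, and the sample line merely follows the keyword,number,number layout used elsewhere on this page.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class used only for illustration.
final class RegexConstructsDemo {

    // True if the whole input matches the regex compiled with the given flags.
    static boolean matches(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    // Returns what group 1 captures on the first match, or null if nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "Charles,1267775881,1175523058";

        // Match flags inside a non-capturing group override compile() flags.
        System.out.println(matches("(?i:secret)", 0, "SECRET"));                          // true
        System.out.println(matches("(?-i:secret)", Pattern.CASE_INSENSITIVE, "SECRET"));  // false

        // Greediness: a greedy quantifier swallows as much as it can.
        System.out.println(firstGroup("^(.*),", line));   // Charles,1267775881
        System.out.println(firstGroup("^(.*?),", line));  // Charles

        // Grouping: a non-capturing group shifts which text becomes group 1.
        System.out.println(firstGroup("^(?:.*?),(.*)$", line)); // 1267775881,1175523058
    }
}
```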
Because Java regular expressions are similar to Perl's, it is a good idea to apply lessons learned from Perl regular expressions.
Noncompliant Code Example
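This program searches a log file of previous searches for keywords that match a regular expression, in order to present search suggestions to the user. Each log line records a keyword, the time of the search, and the IP that made it.
The class does not sanitize the incoming search string before building the regular expression and, as a result, exposes too much information from the log file to the user.
A non-malicious use would be to enter "C", which matches Charles and Cecilia. A malicious use would be to enter "?:)(^C.*,[0-9]+?,[0-9]+?$)|(?:", which also returns the IPs that made the search.
The injected parentheses defeat the grouping protection: the original capturing group is closed early as an empty non-capturing group, and the attacker's own group, which spans the entire log line, becomes group 1. The OR operator (|) then splits the remainder of the original pattern off into a separate alternative, which allows injection of arbitrary regex. As a result, this input reveals every time and IP at which a keyword beginning with 'C' was searched.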
    import java.util.HashSet;
    import java.util.Set;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class ExploitableLog {
       //A buffer containing the log.
       private StringBuffer logBuffer;

       /* an application repeatedly calls this function that searches through the
        * search log for search suggestions for autocompletion
        */
       public Set<String> suggestSearches(String search)
       {
          Set<String> searches = new HashSet<String>();

          /* Construct regex from user string */
          //Regex matches full valid log lines. The grouping characters will limit
          //the returned string to only the keyword.
          String regex = "^(" + search + ".*),[0-9]+?,[0-9]+?$";
          int flags = Pattern.MULTILINE; //needed since matching on a String
                                         //containing newlines
          Pattern p = Pattern.compile(regex, flags);

          /* Read from log */
          int len = logBuffer.length(); //the SB can only be >= to this number
          String s = logBuffer.substring(0, len);
          logBuffer.delete(0, len);

          /* Match regex */
          Matcher m = p.matcher(s);
          while (m.find()) {
             String found = m.group(1);
             searches.add(found);
          }

          return searches;
       }

       public ExploitableLog()
       {
          //this is supposed to come from a file, but it's here as a string for
          //illustrative purposes
          logBuffer = new StringBuffer();
          logBuffer.append("Alice,1267773881,2147651408\n");
          logBuffer.append("Bono,1267774881,2147351708\n");
          logBuffer.append("Charles,1267775881,1175523058\n");
          logBuffer.append("Cecilia,1267773222,291232332\n");
       }
    }
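To see why the malicious string works, it can help to look at the pattern that the concatenation actually produces. The following sketch repeats the same string construction in a small standalone class; `InjectionDemo`, `buildRegex`, and `groupOneMatches` are hypothetical names introduced here and are not part of the example above.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class used only for illustration.
final class InjectionDemo {

    // Same concatenation as in the noncompliant example.
    static String buildRegex(String search) {
        return "^(" + search + ".*),[0-9]+?,[0-9]+?$";
    }

    // Collects whatever group 1 captures for every match, as the example does.
    static Set<String> groupOneMatches(String log, String search) {
        Matcher m = Pattern.compile(buildRegex(search), Pattern.MULTILINE).matcher(log);
        Set<String> found = new LinkedHashSet<>();
        while (m.find()) {
            found.add(m.group(1)); // may be null if group 1 did not participate
        }
        return found;
    }

    public static void main(String[] args) {
        String log = "Alice,1267773881,2147651408\n"
                   + "Bono,1267774881,2147351708\n"
                   + "Charles,1267775881,1175523058\n"
                   + "Cecilia,1267773222,291232332\n";
        String attack = "?:)(^C.*,[0-9]+?,[0-9]+?$)|(?:";

        // Prints: ^(?:)(^C.*,[0-9]+?,[0-9]+?$)|(?:.*),[0-9]+?,[0-9]+?$
        System.out.println(buildRegex(attack));

        System.out.println(groupOneMatches(log, "C"));    // [Charles, Cecilia]
        System.out.println(groupOneMatches(log, attack)); // includes the full Charles and Cecilia lines
    }
}
```

With the injected input, group 1 is the attacker's group, so the full lines for Charles and Cecilia are returned. Lines matched by the second alternative leave group 1 unset, so a `null` entry also ends up in the result set.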
Compliant Solution
Possible solutions include parsing the CSV log into a class prior to matching, or whitelisting only certain characters (such as letters and digits) in the search string. Blacklisting is likely to be difficult because of the variability of the regex language.
This solution filters out non-alphanumeric characters from the search string using Java's Character.isLetterOrDigit(). Doing so removes the grouping parentheses and the OR operator that make the injection possible.
    import java.util.HashSet;
    import java.util.Set;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class FilteredLog {
       //A buffer containing the log.
       private StringBuffer logBuffer;

       /* an application repeatedly calls this function that searches through the
        * search log for search suggestions for autocompletion
        */
       public Set<String> suggestSearches(String search)
       {
          Set<String> searches = new HashSet<String>();

          /* Filter user input */
          StringBuilder sb = new StringBuilder(search.length());
          for (int i = 0; i < search.length(); ++i) {
             char ch = search.charAt(i);
             if (Character.isLetterOrDigit(ch))
                sb.append(ch);
          }
          search = sb.toString();

          /* Construct regex from user string */
          //Regex matches full valid log lines. The grouping characters will limit
          //the returned string to only the keyword.
          String regex = "^(" + search + ".*),[0-9]+?,[0-9]+?$";
          int flags = Pattern.MULTILINE; //needed since matching on a String
                                         //containing newlines
          Pattern p = Pattern.compile(regex, flags);

          /* Read from log */
          int len = logBuffer.length(); //the SB can only be >= to this number
          String s = logBuffer.substring(0, len);
          logBuffer.delete(0, len);

          /* Match regex */
          Matcher m = p.matcher(s);
          while (m.find()) {
             String found = m.group(1);
             searches.add(found);
          }

          return searches;
       }

       public FilteredLog()
       {
          //this is supposed to come from a file, but it's here as a string for
          //illustrative purposes
          logBuffer = new StringBuffer();
          logBuffer.append("Alice,1267773881,2147651408\n");
          logBuffer.append("Bono,1267774881,2147351708\n");
          logBuffer.append("Charles,1267775881,1175523058\n");
          logBuffer.append("Cecilia,1267773222,291232332\n");
       }
    }
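The other approach mentioned above, parsing the CSV into a class prior to matching, might look roughly like the sketch below. It is an illustration under stated assumptions rather than a definitive design: the names `ParsedLog`, `Entry`, `parse`, and `suggest` are hypothetical, the three fields are assumed to be a keyword followed by two numeric values, and keywords are assumed not to contain commas. The sketch also swaps the regular expression for a plain prefix comparison, so no user-supplied pattern is compiled at all.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch used only for illustration.
final class ParsedLog {

    // One parsed log line; only the keyword is ever exposed to the user.
    static final class Entry {
        final String keyword;
        final long timestamp;
        final long ip;

        Entry(String keyword, long timestamp, long ip) {
            this.keyword = keyword;
            this.timestamp = timestamp;
            this.ip = ip;
        }
    }

    // Parses lines of the assumed form "keyword,number,number"; malformed lines are skipped.
    static List<Entry> parse(String log) {
        List<Entry> entries = new ArrayList<>();
        for (String line : log.split("\n")) {
            String[] fields = line.split(",");
            if (fields.length != 3) {
                continue;
            }
            try {
                entries.add(new Entry(fields[0],
                                      Long.parseLong(fields[1]),
                                      Long.parseLong(fields[2])));
            } catch (NumberFormatException e) {
                // Non-numeric timestamp or IP field: ignore this line.
            }
        }
        return entries;
    }

    // Prefix match against the keyword field only; other fields are never consulted.
    static Set<String> suggest(List<Entry> entries, String search) {
        Set<String> searches = new LinkedHashSet<>();
        for (Entry e : entries) {
            if (e.keyword.startsWith(search)) {
                searches.add(e.keyword);
            }
        }
        return searches;
    }

    public static void main(String[] args) {
        List<Entry> entries = parse("Alice,1267773881,2147651408\n"
                                  + "Charles,1267775881,1175523058\n"
                                  + "Cecilia,1267773222,291232332\n");
        System.out.println(suggest(entries, "C")); // [Charles, Cecilia]
    }
}
```

Because the search is applied to the keyword field alone, even an unfiltered search string cannot return the time or IP columns.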
Risk Assessment
Rule | Severity | Likelihood | Remediation Cost | Priority | Level |
---|---|---|---|---|---|
IDS18-J | medium | probable | high | P8 | L2 |
References
CWE ID 625 Permissive Regular Expressions
CVE-2005-1949 Arbitrary command execution in ePing plugin for e107 portal due to an overly permissive regular expression parsing an IP