Java's regular expression facilities are wide-ranging and powerful. When untrusted input is incorporated into a regular expression string, an attacker can modify the original expression so that the resulting pattern matches far more broadly than intended, possibly exposing far too much information.
The primary means of preventing this vulnerability is to sanitize any regular expression string that comes from untrusted input. Additionally, the programmer should look for ways to avoid building regular expressions from untrusted input at all, or offer the user only a very limited subset of regular expression functionality.
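A minimal sketch of both ideas, assuming the user's input is meant to be a plain search term rather than a pattern. Pattern.quote is the standard library call for treating a string as a literal. The class name RegexSanitizer, the helper names, and the particular allowlist (letters, digits, spaces, at most 64 characters) are illustrative assumptions, not part of this rule.

```java
import java.util.regex.Pattern;

// Hypothetical helper; names and the allowlist are illustrative only.
final class RegexSanitizer {
    // Assumed "limited subset": letters, digits, and spaces, 1 to 64 characters.
    private static final Pattern SIMPLE_TERM = Pattern.compile("[A-Za-z0-9 ]{1,64}");

    // Treats the whole input as a literal, so metacharacters lose their meaning.
    static Pattern literalPattern(String untrusted) {
        return Pattern.compile(Pattern.quote(untrusted));
    }

    // Returns true only if the input consists solely of allowlisted characters.
    static boolean isSimpleTerm(String untrusted) {
        return untrusted != null && SIMPLE_TERM.matcher(untrusted).matches();
    }

    public static void main(String[] args) {
        String attack = "?:)(^Bono,[0-9]+?,[0-9]+?$)|(?:";
        System.out.println(isSimpleTerm("Bono"));                          // true
        System.out.println(isSimpleTerm(attack));                          // false
        System.out.println(literalPattern(".*").matcher("Bono").find());   // false
    }
}
```

Quoting keeps the input usable as a search term while removing its regex meaning. The allowlist goes further and rejects anything that is not plainly a term.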
Constructs and properties of Java regular expressions to watch out for include:
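- Match flags embedded in non-capturing groups, such as (?i:...). These override matching options that may or may not have been passed to the compile() method.
- Greediness. Greedy quantifiers such as .* match as much of the input as they can, which may be more than the programmer intended.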
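The two items above can be observed directly. The following is a small sketch with made-up sample strings. The class name FlagAndGreedinessDemo and its helpers are hypothetical, and only standard java.util.regex calls are used.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative demo class; the name and sample strings are assumptions.
final class FlagAndGreedinessDemo {
    // Compiles with the given flags and reports whether the pattern is found.
    static boolean found(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    // Returns the first match of the regex in the input, or null if none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // compile() is given no flags, so matching is case-sensitive ...
        System.out.println(found("bono", 0, "BONO"));        // false
        // ... but a flag embedded in a non-capturing group overrides that.
        System.out.println(found("(?i:bono)", 0, "BONO"));   // true

        String line = "Bono,1267774881,2147651708";
        System.out.println(firstMatch("B.*,", line));    // greedy: Bono,1267774881,
        System.out.println(firstMatch("B.*?,", line));   // reluctant: Bono,
    }
}
```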
Because Java's regular expression syntax is similar to Perl's, lessons learned from Perl regular expressions apply here as well.
Noncompliant Code Example
This class does not sanitize the incoming regular expression and, as a result, exposes too much information to the user.
This program searches a log of previous user searches for entries that match a regular expression in order to present search suggestions to the user.
A non-malicious use would be to enter "Bono". A malicious use would be to enter "?:)(^Bono,[0-9]+?,[0-9]+?$)|(?:".
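The leading ?:) turns the intended capturing group into an empty non-capturing group, so the injected parentheses become capturing group 1, which defeats the grouping protection. The OR operator (|) then allows injection of an arbitrary regex. This input reveals every time and IP address at which the keyword 'Bono' was searched.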

    /* Say this logfile contains:
     * CSV style: search string, time (unix), ip (integer)
     *
     * Alice,1267773881,2147651708
     * Bono,1267774881,2147651708
     * Charles,1267775881,1175563058
     *
     * and the CSVLog class has a readLine() method which retrieves a single
     * line from the CSVLog and returns null when at EOF
     */
    private CSVLog logfile;

    // An application repeatedly calls this function, which searches through
    // the search log for search suggestions for autocompletion
    public Set<String> suggestSearches(String search) {
        Set<String> searches = new HashSet<String>();

        // Construct regex from user's string.
        // The regex matches valid lines, and the grouping characters will
        // limit the returned regex to the search string.
        String regex = "^(" + search + "),[0-9]+?,[0-9]+?$";
        Pattern p = Pattern.compile(regex);
        String s;
        while ((s = logfile.readLine()) != null) { // gets a single line from the logfile
            Matcher m = p.matcher(s);
            if (m.find()) {
                String found = m.group(1);
                searches.add(found);
            }
        }

        return searches;
    }

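To see the effect of the malicious input described above, the sketch below builds the regex the same way suggestSearches does and checks what group 1 captures for a single log line. The class name InjectionDemo and its helper methods are hypothetical and exist only for this demonstration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative demo; builds the regex exactly as suggestSearches does.
final class InjectionDemo {
    static String buildRegex(String search) {
        return "^(" + search + "),[0-9]+?,[0-9]+?$";
    }

    // Returns what group(1) captures for one log line, or null if there is
    // no match or group 1 did not take part in the match.
    static String captured(String search, String logLine) {
        Matcher m = Pattern.compile(buildRegex(search)).matcher(logLine);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "Bono,1267774881,2147651708";
        String attack = "?:)(^Bono,[0-9]+?,[0-9]+?$)|(?:";

        // Resulting pattern: ^(?:)(^Bono,[0-9]+?,[0-9]+?$)|(?:),[0-9]+?,[0-9]+?$
        System.out.println(buildRegex(attack));
        System.out.println(captured("Bono", line));   // Bono
        System.out.println(captured(attack, line));   // Bono,1267774881,2147651708
    }
}
```

With the injected input, group 1 captures the entire log line instead of only the search term. Lines for other keywords still match through the second alternative, but with group 1 unset, so the original method would add null to the result set for them.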
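Embedded flags pose a similar risk. If a regex such as '(?s)John.*' is matched against raw, multi-line user data (one name,password record per line), the program returns all the users' passwords. The (?s) flag turns on DOTALL mode (called single-line mode in Perl), in which . also matches line terminators, so .* runs across record boundaries to the end of the data.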
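The behavior of (?s) can be checked with a short sketch. It assumes the regex is applied to the unparsed record data as one string. The class name DotAllDemo is hypothetical, and the sample records are the ones used in the compliant solution on this page.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative demo of DOTALL mode on raw, multi-line data.
final class DotAllDemo {
    // Returns the first match of the regex in the input, or null if none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Raw name,password records, one per line
        String data = "JohnPaul,HearsGodsVoice\n"
                    + "JohnJackson,OlympicBobsleder\n"
                    + "JohnMayer,MakesBadMusic\n";

        // Without (?s), . stops at the first line terminator.
        System.out.println(firstMatch("John.*", data));
        // With (?s), .* continues across every record to the end of the data.
        System.out.println(firstMatch("(?s)John.*", data));
    }
}
```

Even without the flag, the first call returns the first record's password, which is why matching against unparsed records is unsafe in the first place.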
Compliant Solution
It is very difficult to filter out overly permissive regular expressions. It may be easier and more secure to rewrite the application so that it limits its use of regular expressions.
For a program that matches a regex against name,password records, the easy solution is to parse the CSV into a class and apply the regular expression only to the name field of the User class.

    import java.util.HashMap;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    /* Usage: Test2 <regex>
     *
     * Imagine this program searches a database of users for usernames that
     * match a regex. The regex still comes directly from the user, but it is
     * matched only against the name field, so password data is never exposed.
     *
     * Non-malicious usage: Test2 John.*
     * Malicious usage: Test2 (?s)John.*
     */
    public class Test2 {
        public static class User {
            String name, password;

            public User(String name, String password) {
                setName(name);
                setPassword(password);
            }

            private void setName(String n) { name = n; }
            private void setPassword(String pw) { password = pw; }
            public String getName() { return name; }
        }

        public static void main(String[] args) {
            if (args.length < 1) {
                System.err.println("Failed to specify a regex");
                return;
            }

            String sensitiveData; // represents sensitive data from a file or something
            int flags;
            String regex;
            Pattern p;
            Matcher m;
            HashMap<String, User> userMap = new HashMap<String, User>();

            // Imagine a CSV style database: user,password
            sensitiveData = "JohnPaul,HearsGodsVoice\nJohnJackson,OlympicBobsleder\nJohnMayer,MakesBadMusic\n";
            String[] csvUsers = sensitiveData.split("\n");
            for (String csvUser : csvUsers) {
                String[] csvUserSplit = csvUser.split(",");
                String name = csvUserSplit[0];
                String pw = csvUserSplit[1];
                User u = new User(name, pw);
                userMap.put(name, u);
            }

            regex = args[0];
            flags = 0;

            System.out.println("Pattern: '" + regex + "'");
            p = Pattern.compile(regex, flags);

            // Match against the user names only, never the raw records
            for (String u : userMap.keySet()) {
                m = p.matcher(u);
                while (m.find())
                    System.out.println("Found '" + m.group() + "'");
            }
            System.err.println("DONE");
        }
    }

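The same idea can be applied to the search-log example from the noncompliant code. This is a sketch under stated assumptions: an in-memory list of lines stands in for the CSVLog class, the class name SafeSuggester is hypothetical, and the method returns the parsed search field itself rather than a capturing group.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical sketch; a List<String> stands in for the CSVLog class.
final class SafeSuggester {
    static Set<String> suggestSearches(String search, List<String> logLines) {
        Set<String> searches = new LinkedHashSet<String>();
        Pattern p = Pattern.compile("^(" + search + ")");
        for (String line : logLines) {
            // Keep only the first CSV field; time and IP never reach the matcher.
            String term = line.split(",", 2)[0];
            if (p.matcher(term).find()) {
                searches.add(term); // return the field, not a capture group
            }
        }
        return searches;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
            "Alice,1267773881,2147651708",
            "Bono,1267774881,2147651708",
            "Charles,1267775881,1175563058");
        String attack = "?:)(^Bono,[0-9]+?,[0-9]+?$)|(?:";

        System.out.println(suggestSearches("Bono", log));   // [Bono]
        System.out.println(suggestSearches(attack, log));   // [Alice, Bono, Charles]
    }
}
```

The injected regex still matches more widely than intended, but the most it can return is search terms, which the feature exposes anyway. Times and IP addresses are no longer reachable. Quoting the input with Pattern.quote could be combined with this approach if users do not need regex syntax at all.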
Risk Assessment
Rule | Severity | Likelihood | Remediation Cost | Priority | Level |
---|---|---|---|---|---|
IDS18-J | medium | unlikely | high | | |
References
CWE ID 625 Permissive Regular Expressions
CVE-2005-1949 Arbitrary command execution in ePing plugin for e107 portal due to an overly permissive regular expression parsing an IP