Java's regular expression facilities are wide-ranging and powerful. When untrusted input is incorporated into a regular expression string, it can modify the original expression to form a pattern that matches too widely, possibly resulting in far too much information being matched.
The primary means of preventing this vulnerability is to sanitize any regular expression string that comes from untrusted input. Additionally, the programmer should look for ways to avoid using regular expressions from untrusted input altogether, or provide only a very limited subset of regular expression functionality to the user.
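As a minimal sketch of both ideas, the class below either treats the untrusted input as literal text using `Pattern.quote()`, or accepts only a small whitelist of characters before compiling. The class name `RegexSanitizer`, the method names, and the particular whitelist are assumptions made for illustration; a real application would choose its own subset.

```java
import java.util.regex.Pattern;

class RegexSanitizer {
    // Hypothetical whitelist: letters, digits, and the wildcard characters '.' and '*'.
    // Parentheses and '?' are excluded, so embedded flags such as (?s) cannot be expressed.
    private static final Pattern ALLOWED = Pattern.compile("[A-Za-z0-9.*]+");

    /** Treats the whole input as literal text; no metacharacter keeps its meaning. */
    static Pattern literal(String untrusted) {
        return Pattern.compile(Pattern.quote(untrusted));
    }

    /** Accepts only the limited subset above and rejects everything else. */
    static Pattern restricted(String untrusted) {
        if (!ALLOWED.matcher(untrusted).matches()) {
            throw new IllegalArgumentException("Unsupported characters in search pattern");
        }
        return Pattern.compile(untrusted);
    }
}
```

With this sketch, `restricted("John.*")` compiles normally, while `restricted("(?s)John.*")` is rejected because `(` and `?` are not in the whitelist.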
Constructs and properties of Java regular expressions to watch out for include:
- Match flags used in non-capturing groups (these override matching options that may or may not have been passed into the compile() method)
- Greediness
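The two items above can be illustrated with a short sketch. The class name `FlagAndGreedDemo`, the helper `firstMatch`, and the sample data are hypothetical; the comments state what each call returns for that data.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FlagAndGreedDemo {
    /** Returns the first match of regex in input, or null if there is none. */
    static String firstMatch(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String data = "ann,secret1\nbob,secret2\n";

        // No flags: '.' stops at the line terminator, so the match is "ann,".
        String plain = firstMatch("ann.*,", 0, data);

        // The embedded (?s) enables DOTALL even though compile() was given 0;
        // the greedy .* then runs to the last comma: "ann,secret1\nbob,".
        String dotall = firstMatch("(?s)ann.*,", 0, data);

        // A reluctant quantifier stops at the first comma again: "ann,".
        String reluctant = firstMatch("(?s)ann.*?,", 0, data);

        // An embedded flag can also switch off an option passed to compile():
        // (?-i) cancels CASE_INSENSITIVE, so "ANN" no longer matches and the result is null.
        String caseOff = firstMatch("(?-i)ANN", Pattern.CASE_INSENSITIVE, data);

        System.out.println(plain.length() + " " + dotall.length() + " "
                + reluctant.length() + " " + (caseOff == null));
    }
}
```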
Because Java regular expressions are similar to Perl's, it is a good idea to apply lessons learned from Perl regular expressions.
Noncompliant Code Example
This class does not sanitize the incoming regular expression, and as a result, exposes too much information to the user.
This program searches a database of users for usernames that match a regular expression. A non-malicious example would be to search for 'John.*'. A malicious example would be to search for '(?s)John.*'.
```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/* Usage: Test1 <regex>
 * Regex is used directly without sanitization, causing sensitive data to be exposed
 *
 * Imagine this program searches a database of users for usernames that match a regex
 * Non-malicious usage: Test1 John.*
 * Malicious usage: Test1 (?s)John.*
 */
public class Test1 {
   public static void main(String[] args) {
      if (args.length < 1) {
         System.err.println("Failed to specify a regex");
         return;
      }
      String sensitiveData; // represents sensitive data from a file or something
      int flags;
      String regex;
      Pattern p;
      Matcher m;

      // imagine a CSV style database: user,password
      sensitiveData = "JohnPaul,HearsGodsVoice\nJohnJackson,OlympicBobsleder\nJohnMayer,MakesBadMusic\n";
      regex = args[0];
      // regex = "(?s)John.*";
      flags = 0;
      regex += ","; // supposedly this forces the regex to only match names
      System.out.println("Pattern: '" + regex + "'");
      p = Pattern.compile(regex, flags);
      m = p.matcher(sensitiveData);
      while (m.find()) {
         System.out.println("Found '" + m.group() + "'");
      }
      System.err.println("DONE");
   }
}
```
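When searching using the regex '(?s)John.*', the program exposes users' passwords. The (?s) flag turns on single-line (DOTALL) mode, in which '.' also matches line terminators. The greedy .* therefore no longer stops at the end of a record, and a single match runs from the first 'John' to the last comma in the data, including the passwords stored in between.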
Compliant Solution
It is very difficult to filter out bad regular expressions. It might be easier and more secure to rewrite the application to limit the usage of regular expressions.
For the above code sample, the easy solution is to parse the CSV into a class, so that the regular expression is applied only to the username field and never to the passwords.
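One way this could look is sketched below. The names `UserSearch`, `User`, `parse`, and `findNames` are hypothetical and are not part of the original example; the sketch assumes the same two-field `user,password` layout as the noncompliant code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class UserSearch {
    /** One CSV record; the password field is never handed to the matcher. */
    static final class User {
        final String name;
        final String password;

        User(String name, String password) {
            this.name = name;
            this.password = password;
        }
    }

    /** Splits "user,password" lines into User objects, skipping malformed lines. */
    static List<User> parse(String csv) {
        List<User> users = new ArrayList<>();
        for (String line : csv.split("\n")) {
            String[] fields = line.split(",", 2);
            if (fields.length == 2) {
                users.add(new User(fields[0], fields[1]));
            }
        }
        return users;
    }

    /** Applies the regex to each username in isolation and returns matching names only. */
    static List<String> findNames(List<User> users, String regex) {
        Pattern p = Pattern.compile(regex);
        List<String> hits = new ArrayList<>();
        for (User u : users) {
            if (p.matcher(u.name).find()) {
                hits.add(u.name);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String data = "JohnPaul,HearsGodsVoice\nJohnJackson,OlympicBobsleder\nJohnMayer,MakesBadMusic\n";
        for (String name : findNames(parse(data), "(?s)John.*")) {
            System.out.println("Found '" + name + "'");
        }
    }
}
```

Because each username is matched on its own, even '(?s)John.*' can return only usernames: the pattern has no text beyond the name field to run into.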
Risk Assessment
Rule | Severity | Likelihood | Remediation Cost | Priority | Level |
---|---|---|---|---|---|
IDS18-J | medium | unlikely | high | | |
References
CWE ID 625, "Permissive Regular Expression"