Regular expressions (regex) are widely used to match strings of text. For example, the POSIX grep utility supports regular expressions for finding patterns in the specified text. For introductory information on regular expressions, see the Java Tutorials [Java Tutorials]. The java.util.regex package provides the Pattern class, which encapsulates a compiled representation of a regular expression, and the Matcher class, which is an engine that uses a Pattern to perform matching operations on a CharSequence.
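As a minimal sketch of how the two classes fit together, the following example compiles a regex into a Pattern and then drives a Matcher over a CharSequence. The class name PatternBasics, the helper findAll(), and the sample input are illustrative assumptions, not part of the guideline itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PatternBasics {
  // Hypothetical helper: returns every substring of text that matches regex.
  static List<String> findAll(String regex, CharSequence text) {
    Pattern pattern = Pattern.compile(regex);  // compiled representation
    Matcher matcher = pattern.matcher(text);   // engine bound to this input
    List<String> matches = new ArrayList<>();
    while (matcher.find()) {
      matches.add(matcher.group());            // text of the current match
    }
    return matches;
  }

  public static void main(String[] args) {
    // Illustrative input only
    System.out.println(findAll("\\d+", "error 19 at line 255")); // [19, 255]
  }
}
```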

Java's powerful regex facilities must be protected from misuse. An attacker may supply a malicious input that modifies the original regular expression in such a way that the regex fails to comply with the program's specification, for example, by forming a pattern that matches too widely and exposes more information than intended. This attack vector, called a regex injection, might affect control flow, cause information leaks, or result in denial-of-service (DoS) vulnerabilities.

Certain constructs and properties of Java regular expressions are susceptible to exploitation:

  • Matching flags: Untrusted input may embed flag expressions (for example, inside a non-capturing group) that override the matching options that may or may not have been passed to the Pattern.compile() method.
  • Greediness: An untrusted input may attempt to inject a regex that changes the original regex to match as much of the string as possible, exposing sensitive information.
  • Grouping: The programmer can enclose parts of a regular expression in parentheses to perform some common action on the group. An attacker may be able to change the groupings by supplying untrusted input.
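The matching-flags item can be illustrated with a short sketch. It assumes a hypothetical helper, matchesUser(), that concatenates untrusted text into a pattern compiled without Pattern.CASE_INSENSITIVE. An embedded flag expression such as (?i) in the input turns case-insensitive matching on anyway.

```java
import java.util.regex.Pattern;

final class InlineFlagDemo {
  // Hypothetical helper: embeds untrusted text in a pattern that the
  // programmer intended to be case-sensitive (no flags are passed).
  static boolean matchesUser(String untrusted, String candidate) {
    Pattern p = Pattern.compile("user:" + untrusted);
    return p.matcher(candidate).matches();
  }

  public static void main(String[] args) {
    // Benign input: the comparison stays case-sensitive
    System.out.println(matchesUser("admin", "user:ADMIN"));     // false
    // Injected embedded flag: (?i) enables case-insensitive matching
    System.out.println(matchesUser("(?i)admin", "user:ADMIN")); // true
  }
}
```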

Untrusted input should be sanitized before use to prevent regex injection. When the user must specify a regex as input, care must be taken to ensure that the original regex cannot be modified without restriction. Whitelisting characters (such as letters and digits) before delivering the user-supplied string to the regex parser is a good input sanitization strategy; blacklisting individual operators is difficult because of the variability of the regex language. A programmer must provide only a very limited subset of regular expression functionality to the user to minimize any chance of misuse.

Regex Injection Example

Suppose a system log file contains messages output by various system processes. Some processes produce public messages, and some processes produce sensitive messages marked "private." Here is an example log file:

Code Block
10:47:03 private[423] Successful logout  name: usr1 ssn: 111223333
10:47:04 public[48964] Failed to resolve network service
10:47:04 public[1] (public.message[49367]) Exited with exit code: 255
10:47:43 private[423] Successful login  name: usr2 ssn: 444556666
10:48:08 public[48964] Backup failed with error: 19

A user wishes to search the log file for interesting messages but must be prevented from seeing the private messages. A program might accomplish this by permitting the user to provide search text that becomes part of the following regex:

Code Block
(.*? +public\[\d+\] +.*<SEARCHTEXT>.*)

However, if an attacker can substitute any string for <SEARCHTEXT>, he can perform a regex injection with the following text:

Code Block
.*)|(.*

When injected into the regex, the regex becomes

Code Block
(.*? +public\[\d+\] +.*.*)|(.*.*)

This regex will match any line in the log file, including the private ones.
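The injection can be reproduced with a short, self-contained sketch. It assumes the sample log is held in a string constant and uses a hypothetical search() helper that substitutes the search text into the pattern shown earlier; neither name comes from the guideline.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class InjectionDemo {
  // Sample log from the text, held in memory for illustration
  static final String LOG =
      "10:47:03 private[423] Successful logout  name: usr1 ssn: 111223333\n"
      + "10:47:04 public[48964] Failed to resolve network service\n"
      + "10:47:04 public[1] (public.message[49367]) Exited with exit code: 255\n"
      + "10:47:43 private[423] Successful login  name: usr2 ssn: 444556666\n"
      + "10:48:08 public[48964] Backup failed with error: 19\n";

  // Hypothetical helper: builds the regex from unsanitized search text
  // and returns every non-empty match found in the log.
  static List<String> search(String searchText) {
    String regex = "(.*? +public\\[\\d+\\] +.*" + searchText + ".*)";
    Matcher m = Pattern.compile(regex).matcher(LOG);
    List<String> hits = new ArrayList<>();
    while (m.find()) {
      String hit = m.group(); // whole match, whichever alternative matched
      if (!hit.isEmpty()) {
        hits.add(hit);
      }
    }
    return hits;
  }

  public static void main(String[] args) {
    // Benign search: only the public backup line is returned
    System.out.println(search("Backup"));
    // Injected search text: all five lines, private ones included
    System.out.println(search(".*)|(.*"));
  }
}
```

The sketch reads the whole match with group() rather than group(1): when the injected second alternative is the one that matches, group(1) did not participate in the match and is null.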

Noncompliant Code Example

This noncompliant code example searches a log file using search terms from an untrusted user:

Code Block
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.CharBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class LogSearch {
  public static void FindLogEntry(String search) {
    // Construct regex dynamically from user string
    // (noncompliant: the untrusted string is used without sanitization)
    String regex = "(.*? +public\\[\\d+\\] +.*" + search + ".*)";
    Pattern searchPattern = Pattern.compile(regex);
    try (FileInputStream fis = new FileInputStream("log.txt")) {
      FileChannel channel = fis.getChannel();
      // Get the file's size and map it into memory
      long size = channel.size();
      final MappedByteBuffer mappedBuffer = channel.map(
          FileChannel.MapMode.READ_ONLY, 0, size);
      Charset charset = Charset.forName("ISO-8859-15");
      final CharsetDecoder decoder = charset.newDecoder();
      // Read file into char buffer
      CharBuffer log = decoder.decode(mappedBuffer);
      Matcher logMatcher = searchPattern.matcher(log);
      while (logMatcher.find()) {
        String match = logMatcher.group(1);
        if (!match.isEmpty()) {
          System.out.println(match);
        }
      }
    } catch (IOException ex) {
      System.err.println("thrown exception: " + ex.toString());
      Throwable[] suppressed = ex.getSuppressed();
      for (int i = 0; i < suppressed.length; i++) {
        System.err.println("suppressed exception: "
            + suppressed[i].toString());
      }
    }
    return;
  }
}

This code permits an attacker to perform a regex injection.  

Compliant Solution (Whitelisting)

This compliant solution sanitizes the search terms at the beginning of FindLogEntry(), filtering out nonalphanumeric characters (except space and single quote):

Code Block
  public static void FindLogEntry(String search) {
    // Sanitize search string: keep only letters, digits,
    // spaces, and single quotes
    StringBuilder sb = new StringBuilder(search.length());
    for (int i = 0; i < search.length(); ++i) {
      char ch = search.charAt(i);
      if (Character.isLetterOrDigit(ch) ||
          ch == ' ' ||
          ch == '\'') {
        sb.append(ch);
      }
    }
    search = sb.toString();

    // Construct regex dynamically from user string
    String regex = "(.*? +public\\[\\d+\\] +.*" + search + ".*)";
    // ...
  }

This solution prevents regex injection but also restricts search terms. For example, a user may no longer search for "name =" because nonalphanumeric characters are removed from the search term.
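Both effects can be seen by running the whitelisting filter on its own. The sketch below extracts the filter into a hypothetical sanitize() method (the class and method names are assumptions) and applies it to the injection string and to the "name =" search term.

```java
final class WhitelistDemo {
  // Hypothetical extraction of the filter: keeps letters, digits,
  // spaces, and single quotes, and drops every other character.
  static String sanitize(String search) {
    StringBuilder sb = new StringBuilder(search.length());
    for (int i = 0; i < search.length(); ++i) {
      char ch = search.charAt(i);
      if (Character.isLetterOrDigit(ch) || ch == ' ' || ch == '\'') {
        sb.append(ch);
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // Every character of the injection string is a regex operator,
    // so nothing survives the filter
    System.out.println("[" + sanitize(".*)|(.*") + "]"); // []
    // The '=' is removed, which changes the legitimate search term
    System.out.println("[" + sanitize("name =") + "]");  // [name ]
  }
}
```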

Compliant Solution (Pattern.quote())

This compliant solution sanitizes the search terms by using Pattern.quote() to escape any malicious characters in the search string. Unlike the previous compliant solution, a search string using punctuation characters, such as "name =", is permitted.

Code Block
  public static void FindLogEntry(String search) {
    // Sanitize search string: treat it as a literal, not as regex syntax
    search = Pattern.quote(search);
    // Construct regex dynamically from user string
    String regex = "(.*? +public\\[\\d+\\] +.*" + search + ".*)";
    // ...
  }

The Matcher.quoteReplacement() method can be used to escape replacement strings used in regex substitution.
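The following sketch exercises both quoting methods. It assumes the sample log is held in a string constant, and the helpers search() and substitute() are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QuoteDemo {
  // Sample log from the text, held in memory for illustration
  static final String LOG =
      "10:47:03 private[423] Successful logout  name: usr1 ssn: 111223333\n"
      + "10:47:04 public[48964] Failed to resolve network service\n"
      + "10:47:04 public[1] (public.message[49367]) Exited with exit code: 255\n"
      + "10:47:43 private[423] Successful login  name: usr2 ssn: 444556666\n"
      + "10:48:08 public[48964] Backup failed with error: 19\n";

  // Hypothetical helper: quotes the search text before building the regex
  static List<String> search(String searchText) {
    String quoted = Pattern.quote(searchText);
    String regex = "(.*? +public\\[\\d+\\] +.*" + quoted + ".*)";
    Matcher m = Pattern.compile(regex).matcher(LOG);
    List<String> hits = new ArrayList<>();
    while (m.find()) {
      hits.add(m.group(1));
    }
    return hits;
  }

  // Hypothetical helper: replaces each X in template with the literal
  // replacement text, even if it contains '$' or '\'
  static String substitute(String template, String replacement) {
    return template.replaceAll("X", Matcher.quoteReplacement(replacement));
  }

  public static void main(String[] args) {
    // The injection string is now searched for literally: no matches
    System.out.println(search(".*)|(.*").size());                // 0
    // Punctuation in a legitimate search term is preserved
    System.out.println(search("(public.message[49367])").size()); // 1
    // Without quoteReplacement(), "$5" would be read as a group reference
    System.out.println(substitute("cost: X", "$5"));              // cost: $5
  }
}
```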

Compliant Solution

Another method of mitigating this vulnerability is to filter out the sensitive information prior to matching. Such a solution requires the filtering to be repeated every time the log file is refreshed, incurring extra complexity and a performance penalty. Sensitive information may still be exposed if the log format changes and the class is not refactored to accommodate the change.
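A minimal sketch of this filtering approach follows, assuming the log format of the earlier example, in which each line carries a public[...] or private[...] tag after the timestamp. The class name, the PUBLIC_LINE pattern, and both helpers are hypothetical. As the paragraph warns, the sketch stops protecting private data as soon as the log format no longer matches PUBLIC_LINE.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PrefilterDemo {
  // Assumed line format: timestamp, spaces, then "public[<pid>] "
  private static final Pattern PUBLIC_LINE =
      Pattern.compile("^\\S+ +public\\[\\d+\\] .*$");

  // Keeps only the lines that are recognizably public
  static String publicOnly(String log) {
    StringBuilder sb = new StringBuilder();
    for (String line : log.split("\n")) {
      if (PUBLIC_LINE.matcher(line).matches()) {
        sb.append(line).append('\n');
      }
    }
    return sb.toString();
  }

  // Runs the user-supplied regex against the filtered text only
  static List<String> search(String userRegex, String log) {
    Matcher m = Pattern.compile(userRegex).matcher(publicOnly(log));
    List<String> hits = new ArrayList<>();
    while (m.find()) {
      if (!m.group().isEmpty()) {
        hits.add(m.group());
      }
    }
    return hits;
  }

  public static void main(String[] args) {
    String log =
        "10:47:03 private[423] Successful logout  name: usr1 ssn: 111223333\n"
        + "10:47:04 public[48964] Failed to resolve network service\n"
        + "10:48:08 public[48964] Backup failed with error: 19\n";
    // Even a match-everything regex sees only the public lines
    System.out.println(search(".*", log));
  }
}
```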

Risk Assessment

Failing to sanitize untrusted data included as part of a regular expression can result in the disclosure of sensitive information.

| Rule | Severity | Likelihood | Remediation Cost | Priority | Level |
|---|---|---|---|---|---|
| IDS08-J | Medium | Unlikely | Medium | P4 | L3 |

Automated Detection

| Tool | Version | Checker | Description |
|---|---|---|---|
| The Checker Framework | | Tainting Checker | Trust and security errors (see Chapter 8) |
| CodeSonar | | JAVA.IO.TAINT.REGEX | Tainted Regular Expression (Java) |
| SonarQube | | S2631 | Regular expressions should not be vulnerable to Denial of Service attacks |

Related Guidelines

MITRE CWE

CWE-625, Permissive Regular Expression

Bibliography


...

[MITRE 09] CWE ID 625, "Permissive Regular Expression" (http://cwe.mitre.org/data/definitions/625.html)

[CVE 05] CVE-2005-1949 (http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2005-1949)