Regular expressions (regex) are widely used to match strings of text. For example, the POSIX grep utility supports regular expressions for finding patterns in the specified text. For introductory information on regular expressions, see the Java Tutorials [Java Tutorials 08]. The java.util.regex package provides the Pattern class that encapsulates a compiled representation of a regular expression and the Matcher class, which is an engine that uses a Pattern to perform matching operations on a CharSequence.
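As a minimal sketch of how Pattern and Matcher fit together, the following example compiles a fixed regex and collects every match from an input string. The class name PatternBasics and the method findNumbers are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternBasics {
    // Returns every run of decimal digits found in the input, in order.
    static List<String> findNumbers(CharSequence input) {
        Pattern digits = Pattern.compile("\\d+"); // compiled representation
        Matcher m = digits.matcher(input);        // engine bound to this input
        List<String> found = new ArrayList<>();
        while (m.find()) {                        // advance to the next match
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findNumbers("pid 1234 exited with status 1"));
    }
}
```

Here the regex is a constant chosen by the programmer; the risks discussed in this rule arise only when part of the regex comes from untrusted input.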

Java's powerful regex facilities must be protected from misuse. An attacker may supply a malicious input that modifies the original regular expression in such a way that the regex fails to comply with the program's specification. This attack vector, called a regex injection, might affect control flow, cause information leaks, or result in denial-of-service (DoS) vulnerabilities.

Certain constructs and properties of Java regular expressions are susceptible to exploitation:

  • Matching flags: Untrusted input may embed flag expressions, such as (?i) or (?s), that override the matching options passed, or deliberately not passed, to the Pattern.compile() method.
  • Greediness: An untrusted input may attempt to inject a regex that changes the original regex to match as much of the string as possible, exposing sensitive information.
  • Grouping: The programmer can enclose parts of a regular expression in parentheses to perform some common action on the group. An attacker may be able to change the groupings by supplying untrusted input.
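Two of the constructs listed above can be illustrated with a short sketch. The helper matchesUser and the "user=" prefix are hypothetical; they stand in for any program that concatenates untrusted text into a regex. An embedded flag expression changes case sensitivity, and an injected alternation changes the grouping so that the regex matches input the programmer never intended.

```java
import java.util.regex.Pattern;

class InjectionEffects {
    // Hypothetical vulnerable helper: splices untrusted text into a regex.
    static boolean matchesUser(String untrusted, String candidate) {
        Pattern p = Pattern.compile("user=" + untrusted);
        return p.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        // Intended use: exact, case-sensitive comparison.
        System.out.println(matchesUser("alice", "user=ALICE"));
        // Embedded flag (?i) turns on case-insensitive matching.
        System.out.println(matchesUser("(?i)alice", "user=ALICE"));
        // Injected alternation adds a branch that matches anything.
        System.out.println(matchesUser("alice|.*", "root"));
    }
}
```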

Untrusted input should be sanitized before use to prevent regex injection. When the user must specify a regex as input, care must be taken to ensure that the original regex cannot be modified without restriction. Whitelisting characters (such as letters and digits) before delivering the user-supplied string to the regex parser is a good input sanitization strategy. A programmer must provide only a very limited subset of regular expression functionality to the user to minimize any chance of misuse.

...

Suppose a system log file contains messages output by various system processes. Some processes produce public messages, and some processes produce sensitive messages marked "private." Here is an example log file:

...

However, an attacker who can substitute any string for <SEARCHTEXT> can perform a regex injection with the following text:

...

After injection, the regex becomes:

Code Block
(.*? +public\[\d+\] +.*.*)|(.*.*)
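The effect of the combined regex can be checked with a small sketch. The two log lines below are invented for this illustration and are not the example log file; the benign search term "network" is likewise an assumption. The sketch runs a benign regex and the combined regex shown above against the same text and collects the nonempty matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InjectedRegexDemo {
    // Hypothetical log contents, invented for this sketch.
    static final String LOG =
        "09:15:01 private[101] Login ok name: alice token: abc123\n"
      + "09:15:02 public[202] Failed to resolve network service\n";

    // Returns all nonempty matches of the given regex in LOG.
    static List<String> search(String regex) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(LOG);
        while (m.find()) {
            if (!m.group().isEmpty()) {
                hits.add(m.group());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Benign search: only public entries mentioning "network".
        System.out.println(search("(.*? +public\\[\\d+\\] +.*network.*)"));
        // Combined regex after injection: the second alternative matches any line.
        System.out.println(search("(.*? +public\\[\\d+\\] +.*.*)|(.*.*)"));
    }
}
```

With these sample lines, the benign search returns only the public entry, whereas the injected regex also returns the private entry, because the added alternative (.*.*) matches any line regardless of its marker.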

...

This noncompliant code example searches a log file using search terms from an untrusted user:

Code Block
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.CharBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogSearch {
	public static void FindLogEntry(String search) {
		// Construct regex dynamically from user string
		String regex = "(.*? +public\\[\\d+\\] +.*" + search + ".*)";
		Pattern searchPattern = Pattern.compile(regex);
		try (FileInputStream fis = new FileInputStream("log.txt")) {
			FileChannel channel = fis.getChannel();
			// Get the file's size and map it into memory
			long size = channel.size();
			final MappedByteBuffer mappedBuffer = channel.map(
					FileChannel.MapMode.READ_ONLY, 0, size);
			Charset charset = Charset.forName("ISO-8859-15");
			final CharsetDecoder decoder = charset.newDecoder();
			// Read file into char buffer
			CharBuffer log = decoder.decode(mappedBuffer);
			Matcher logMatcher = searchPattern.matcher(log);
			while (logMatcher.find()) {
				String match = logMatcher.group();
				if (!match.isEmpty()) {
					System.out.println(match);
				}
			}
		} catch (IOException ex) {
			System.err.println("thrown exception: " + ex.toString());
			Throwable[] suppressed = ex.getSuppressed();
			for (int i = 0; i < suppressed.length; i++) {
				System.err.println("suppressed exception: "
						+ suppressed[i].toString());
			}
		}
		return;
	}

...

Compliant Solution (Whitelisting)

This compliant solution sanitizes the search terms at the beginning of the FindLogEntry() method, filtering out nonalphanumeric characters (except space and single quote):

Code Block
	public static void FindLogEntry(String search) {
		// Sanitize search string
		StringBuilder sb = new StringBuilder(search.length());
		for (int i = 0; i < search.length(); ++i) {
			char ch = search.charAt(i);
			if (Character.isLetterOrDigit(ch) || ch == ' ' || ch == '\'') {
				sb.append(ch);
			}
		}
		search = sb.toString();
		
		// Construct regex dynamically from user string
		// . . .
	}

This solution prevents regex injection but also restricts search terms. For example, a user may no longer search for "name =" because nonalphanumeric characters are removed from the search term.
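One possible way to keep such characters searchable, offered here as a sketch rather than as part of the solution above, is to quote the search term with Pattern.quote() so that the regex parser treats it as a literal string. The class name QuotedSearch and the method buildPattern are hypothetical; the surrounding regex is the one used in the noncompliant example.

```java
import java.util.regex.Pattern;

class QuotedSearch {
    // Builds the log-search regex with the untrusted term quoted as a literal.
    static Pattern buildPattern(String search) {
        String regex = "(.*? +public\\[\\d+\\] +.*"
                + Pattern.quote(search) // metacharacters lose their meaning
                + ".*)";
        return Pattern.compile(regex);
    }

    public static void main(String[] args) {
        // Hypothetical log lines, invented for this sketch.
        String publicLine = "09:15:02 public[202] name = bob";
        String privateLine = "09:15:01 private[101] secret";
        System.out.println(buildPattern("name =").matcher(publicLine).find());
        System.out.println(buildPattern(".*)|(.*").matcher(privateLine).find());
    }
}
```

Under this approach the user can search for text such as "name =", but can no longer use any regex syntax in the search term.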

...

Another method of mitigating this vulnerability is to filter out the sensitive information prior to matching. Such a solution would require the filtering to be done every time the log file is periodically refreshed, incurring extra complexity and a performance penalty. Sensitive information may still be exposed if the log format changes but the class is not also refactored to accommodate these changes.
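The filtering approach might be sketched as follows. The class PrefilteredSearch, its methods, and the assumption that every public entry matches the prefix used in the noncompliant example are all hypothetical. Private lines are discarded before the untrusted regex is applied, so an injected regex can match only public entries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class PrefilteredSearch {
    // Assumed format of a public entry; mirrors the prefix of the original regex.
    private static final Pattern PUBLIC_LINE =
            Pattern.compile(".*? +public\\[\\d+\\] +.*");

    // Keeps only the lines marked public.
    static List<String> publicOnly(List<String> logLines) {
        List<String> kept = new ArrayList<>();
        for (String line : logLines) {
            if (PUBLIC_LINE.matcher(line).matches()) {
                kept.add(line);
            }
        }
        return kept;
    }

    // Applies the still-untrusted search regex to the filtered lines only.
    static List<String> search(List<String> logLines, String search) {
        Pattern p = Pattern.compile(".*" + search + ".*");
        List<String> hits = new ArrayList<>();
        for (String line : publicOnly(logLines)) {
            if (p.matcher(line).matches()) {
                hits.add(line);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical log lines, invented for this sketch.
        List<String> log = List.of(
                "09:15:01 private[101] Login ok name: alice token: abc123",
                "09:15:02 public[202] Failed to resolve network service");
        System.out.println(search(log, ".*"));
    }
}
```

This sketch addresses only information disclosure. Because the search term is still compiled as a regex, a malformed or expensive pattern could still cause an exception or a denial of service.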

Risk Assessment

Failing to sanitize untrusted data included as part of a regular expression can result in the disclosure of sensitive information.

Rule     Severity  Likelihood  Remediation Cost  Priority  Level
IDS08-J  Medium    Unlikely    Medium            P4        L3

Related Guidelines

MITRE CWE

CWE-625, Permissive Regular Expression

Bibliography

 

...