Regular expressions (regex) are widely used to match strings of text. For example, the POSIX grep utility supports regular expressions for finding patterns in the specified text. For introductory information on regular expressions, see the Java Tutorials [Tutorials 08]. The java.util.regex package provides the Pattern class, which encapsulates a compiled representation of a regular expression, and the Matcher class, which is an engine that uses a Pattern to perform matching operations on a CharSequence.

Java's powerful regex facilities must be protected from misuse. An attacker may supply a malicious input that modifies the original regular expression in such a way that the regex fails to comply with the program's specification. This attack vector, called a regex injection, might affect control flow, cause information leaks, or result in denial-of-service (DoS) vulnerabilities.

Certain constructs and properties of Java regular expressions are susceptible to exploitation:

  • Matching flags: Untrusted inputs may override matching options that may or may not have been passed to the Pattern.compile() method.
  • Greediness: An untrusted input may attempt to inject a regex that changes the original regex to match as much of the string as possible, exposing sensitive information.
  • Grouping: The programmer can enclose parts of a regular expression in parentheses to perform some common action on the group. An attacker may be able to change the groupings by supplying untrusted input, leading to the security weaknesses described earlier.
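
The first item in the list above can be illustrated with a minimal sketch. It assumes a hypothetical helper, matchesExactly(), that concatenates untrusted input into a pattern compiled without any flags; neither the class nor the helper comes from the guideline. An embedded flag expression such as (?i) in the input switches on case-insensitive matching even though the programmer never passed Pattern.CASE_INSENSITIVE.

```java
import java.util.regex.Pattern;

class FlagInjectionDemo {
    // Hypothetical helper (not from the guideline): whole-string match
    // built by concatenating untrusted input into the pattern.
    static boolean matchesExactly(String untrusted, String candidate) {
        // No flags are passed, so case-sensitive matching is intended.
        Pattern p = Pattern.compile("^" + untrusted + "$");
        return p.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        // Ordinary input: matching stays case-sensitive.
        System.out.println(matchesExactly("admin", "ADMIN"));
        // Input carrying an embedded flag: the match becomes case-insensitive.
        System.out.println(matchesExactly("(?i)admin", "ADMIN"));
    }
}
```

The same concatenation lets an attacker add or close parentheses, which changes what later calls to Matcher.group() return.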

Untrusted input should be sanitized before use to prevent regex injection. When the user must specify a regex as input, care must be taken to ensure that the original regex cannot be modified without restriction. Whitelisting characters (such as letters and digits) before delivering the user-supplied string to the regex parser is a good input sanitization strategy. A programmer must provide only a very limited subset of regular expression functionality to the user to minimize any chance of misuse.

...

Regex Injection Example

Suppose a system log file contains messages output by various system processes. Some processes produce public messages, and some processes produce sensitive messages marked "private." Here is an example log file:

Code Block
10:47:03 private[423] Successful logout  name: usr1 ssn: 111223333
10:47:04 public[48964] Failed to resolve network service
10:47:04 public[1] (public.message[49367]) Exited with exit code: 255
10:47:43 private[423] Successful login  name: usr2 ssn: 444556666
10:48:08 public[48964] Backup failed with error: 19

A user wishes to search the log file for interesting messages but must be prevented from seeing the private messages. A program might accomplish this by permitting the user to provide search text that becomes part of the following regex:

Code Block
(.*? +public\[\d+\] +.*<SEARCHTEXT>.*)

However, if an attacker can substitute any string for <SEARCHTEXT>, he can perform a regex injection with the following text:

Code Block
.*)|(.*

When injected into the regex, the regex becomes

Code Block
(.*? +public\[\d+\] +.*.*)|(.*.*)

This regex will match any line in the log file, including the private ones.
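
The effect can be reproduced with a small sketch that embeds the example log as a string constant. The class name RegexInjectionDemo and the search() helper are assumptions made for illustration; only the log text and the regex template come from the example above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexInjectionDemo {
    // The example log file from above, embedded for the demonstration.
    static final String LOG =
        "10:47:03 private[423] Successful logout  name: usr1 ssn: 111223333\n"
      + "10:47:04 public[48964] Failed to resolve network service\n"
      + "10:47:04 public[1] (public.message[49367]) Exited with exit code: 255\n"
      + "10:47:43 private[423] Successful login  name: usr2 ssn: 444556666\n"
      + "10:48:08 public[48964] Backup failed with error: 19\n";

    // Hypothetical helper: splices the search text into the regex unsanitized.
    static List<String> search(String searchText) {
        String regex = "(.*? +public\\[\\d+\\] +.*" + searchText + ".*)";
        Matcher m = Pattern.compile(regex).matcher(LOG);
        List<String> hits = new ArrayList<>();
        while (m.find()) {
            if (!m.group().isEmpty()) {   // skip empty matches
                hits.add(m.group());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(search("Backup"));    // a benign search
        System.out.println(search(".*)|(.*"));   // the injected text
    }
}
```

With the benign term, only a public line is returned. With the injected text, the second alternative matches every line, so the private entries and their ssn fields appear in the results.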

Noncompliant Code Example

This noncompliant code example searches a log file using search terms from an untrusted user:

Code Block
bgColor#FFCCCC
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.CharBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogSearch {
	public static void FindLogEntry(String search) {
		// Construct regex dynamically from user string
		String regex = "(.*? +public\\[\\d+\\] +.*" + search + ".*)";
		Pattern searchPattern = Pattern.compile(regex);
		try (FileInputStream fis = new FileInputStream("log.txt")) {
			FileChannel channel = fis.getChannel();

			// Get the file's size and map it into memory
			long size = channel.size();
			final MappedByteBuffer mappedBuffer = channel.map(
					FileChannel.MapMode.READ_ONLY, 0, size);

			Charset charset = Charset.forName("ISO-8859-15");
			final CharsetDecoder decoder = charset.newDecoder();

			// Read file into char buffer
			CharBuffer log = decoder.decode(mappedBuffer);

			// Print every nonempty match found in the log
			Matcher logMatcher = searchPattern.matcher(log);
			while (logMatcher.find()) {
				String match = logMatcher.group();
				if (!match.isEmpty()) {
					System.out.println(match);
				}
			}
		} catch (IOException ex) {
			System.err.println("thrown exception: " + ex.toString());
			Throwable[] suppressed = ex.getSuppressed();
			for (int i = 0; i < suppressed.length; i++) {
				System.err.println("suppressed exception: "
						+ suppressed[i].toString());
			}
		}
		return;
	}
}

This code permits an attacker to perform a regex injection.  

Compliant Solution (Whitelisting)

This compliant solution sanitizes the search terms at the beginning of FindLogEntry(), filtering out nonalphanumeric characters (except space and single quote):

Code Block
bgColor#ccccff
	public static void FindLogEntry(String search) {
		// Sanitize search string
		StringBuilder sb = new StringBuilder(search.length());
		for (int i = 0; i < search.length(); ++i) {
			char ch = search.charAt(i);
			if (Character.isLetterOrDigit(ch) || ch == ' ' || ch == '\'') {
				sb.append(ch);
			}
		}
		search = sb.toString();

		// Construct regex dynamically from user string
		String regex = "(.*? +public\\[\\d+\\] +.*" + search + ".*)";
		// ...
	}

This solution prevents regex injection but also restricts search terms. For example, a user may no longer search for "name =" because nonalphanumeric characters are removed from the search term.
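
The restriction is easy to see when the filtering loop is isolated. The following sketch factors it into a hypothetical sanitize() helper (the name is an assumption; the loop itself is the one from the compliant solution) and applies it to a few inputs.

```java
class WhitelistDemo {
    // Hypothetical helper wrapping the whitelisting loop: keeps letters,
    // digits, spaces, and single quotes, and drops everything else.
    static String sanitize(String search) {
        StringBuilder sb = new StringBuilder(search.length());
        for (int i = 0; i < search.length(); ++i) {
            char ch = search.charAt(i);
            if (Character.isLetterOrDigit(ch) || ch == ' ' || ch == '\'') {
                sb.append(ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + sanitize("name =") + "]");   // the '=' is dropped
        System.out.println("[" + sanitize(".*)|(.*") + "]");  // nothing survives
        System.out.println("[" + sanitize("usr1's") + "]");   // unchanged
    }
}
```

Note that Character.isLetterOrDigit() accepts letters and digits from any script, not just ASCII, so the whitelist is somewhat broader than it may first appear.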

Compliant Solution (Pattern.quote())

This compliant solution sanitizes the search terms by using Pattern.quote() to escape any malicious characters in the search string. Unlike the previous compliant solution, a search string using punctuation characters, such as "name =", is permitted.

Code Block
bgColor#ccccff
	public static void FindLogEntry(String search) {
		// Sanitize search string
		search = Pattern.quote(search);

		// Construct regex dynamically from user string
		String regex = "(.*? +public\\[\\d+\\] +.*" + search + ".*)";
		// ...
	}
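
As a rough check of this approach, the sketch below runs the quoted search against the example log from earlier in this guideline, embedded as a string constant. The class name QuotedSearchDemo and the search() helper are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedSearchDemo {
    // The example log file, embedded for the demonstration.
    static final String LOG =
        "10:47:03 private[423] Successful logout  name: usr1 ssn: 111223333\n"
      + "10:47:04 public[48964] Failed to resolve network service\n"
      + "10:47:04 public[1] (public.message[49367]) Exited with exit code: 255\n"
      + "10:47:43 private[423] Successful login  name: usr2 ssn: 444556666\n"
      + "10:48:08 public[48964] Backup failed with error: 19\n";

    // Hypothetical helper: the search text is quoted, so it is matched literally.
    static List<String> search(String searchText) {
        String regex = "(.*? +public\\[\\d+\\] +.*"
                + Pattern.quote(searchText) + ".*)";
        Matcher m = Pattern.compile(regex).matcher(LOG);
        List<String> hits = new ArrayList<>();
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(search("code: 255"));  // punctuation is allowed
        System.out.println(search(".*)|(.*"));    // treated as literal text
    }
}
```

The injected string is now searched for as literal text, which appears nowhere in the log, so it produces no matches, while a term containing punctuation still finds its public line.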

The Matcher.quoteReplacement() method can be used to escape strings used when doing regex substitution.
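
A minimal sketch of the difference, assuming a hypothetical greeting template in which the placeholder NAME is replaced with user-supplied text (the class, method names, and template are illustrative only). In a replacement string, $ followed by a digit refers to a capture group and a backslash escapes the next character, so untrusted replacement text needs the same care as an untrusted pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplacementDemo {
    private static final Pattern PLACEHOLDER = Pattern.compile("NAME");

    // Untrusted text used directly as the replacement string.
    static String unsafeFill(String template, String untrusted) {
        return PLACEHOLDER.matcher(template).replaceAll(untrusted);
    }

    // Untrusted text escaped first, so '$' and '\' are taken literally.
    static String safeFill(String template, String untrusted) {
        return PLACEHOLDER.matcher(template)
                .replaceAll(Matcher.quoteReplacement(untrusted));
    }

    public static void main(String[] args) {
        // "$0" is interpreted as "the whole match", so NAME is put back.
        System.out.println(unsafeFill("Hello, NAME!", "$0"));
        // The escaped form inserts the two characters '$' and '0'.
        System.out.println(safeFill("Hello, NAME!", "$0"));
    }
}
```

A reference to a group that does not exist, such as $1 in this pattern, makes the unescaped call throw an IndexOutOfBoundsException instead.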

Compliant Solution

Another method of mitigating this vulnerability is to filter out the sensitive information prior to matching. Such a solution would require the filtering to be done every time the log file is periodically refreshed, incurring extra complexity and a performance penalty. Sensitive information may still be exposed if the log format changes but the class is not also refactored to accommodate these changes.
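
One possible shape for such a filter is sketched below. It is an illustration under stated assumptions, not part of the guideline: the PUBLIC_LINE pattern assumes the log layout of the earlier example (a timestamp followed by the process name), and the class and method names are hypothetical. Private lines are discarded before any user-supplied regex is applied, so even an unrestricted regex can only match public lines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class PrefilterDemo {
    // Assumed log layout: "<timestamp> public[<pid>] <message>".
    private static final Pattern PUBLIC_LINE =
        Pattern.compile("\\S+ +public\\[\\d+\\] .*");

    // Keep only the lines that look like public messages.
    static List<String> publicLines(String log) {
        List<String> kept = new ArrayList<>();
        for (String line : log.split("\n")) {
            if (PUBLIC_LINE.matcher(line).matches()) {
                kept.add(line);
            }
        }
        return kept;
    }

    // Apply the user-supplied regex to the filtered lines only.
    static List<String> search(String log, String userRegex) {
        Pattern p = Pattern.compile(userRegex);
        List<String> hits = new ArrayList<>();
        for (String line : publicLines(log)) {
            if (p.matcher(line).find()) {
                hits.add(line);
            }
        }
        return hits;
    }
}
```

As noted above, this protection is only as good as the filter: if the log format changes and PUBLIC_LINE is not updated, private lines may pass through. The sketch also does nothing to limit expensive user-supplied patterns.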

Risk Assessment

Failing to sanitize untrusted data included as part of a regular expression can result in the disclosure of sensitive information.

|| Rule || Severity || Likelihood || Remediation Cost || Priority || Level ||
| IDS08-J | Medium | Unlikely | Medium | P4 | L3 |

Automated Detection

|| Tool || Version || Checker || Description ||
| The Checker Framework | | Tainting Checker | Trust and security errors (see Chapter 8) |
| CodeSonar | | JAVA.IO.TAINT.REGEX | Tainted Regular Expression (Java) |
| SonarQube | | S2631 | Regular expressions should not be vulnerable to Denial of Service attacks |

Related Guidelines

MITRE CWE

CWE-625, Permissive Regular Expression

Bibliography


...

[Tutorials 08] Regular Expressions, http://java.sun.com/docs/books/tutorial/essential/regex/index.html
[MITRE 09] CWE ID 625, "Permissive Regular Expressions", http://cwe.mitre.org/data/definitions/625.html
[CVE 05] CVE-2005-1949, http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2005-1949