Comment: added sample log and section illustrating regex injection

...

Untrusted input should be sanitized before use to prevent regex injection. When the user must specify a regex as input, care must be taken to ensure that the original regex cannot be modified without restriction. Whitelisting characters (such as letters and digits) before delivering the user-supplied string to the regex parser is a good input sanitization strategy. A programmer must provide only a very limited subset of regular expression functionality to the user to minimize any chance of misuse.

...

Regex Injection Example

Suppose a system log file contains messages output by various system processes. Some processes produce public messages and some processes produce sensitive messages marked 'private'. Here is an example log file:

Code Block

4/8/11 10:47:03 AM	private[423] Successful logout  name: somename ssn: 111223333
4/8/11 10:47:04 AM	public[48964]	Failed to resolve network service using name = Scipio type = _afpovertcp._tcp domain = local.
4/8/11 10:47:04 AM	public[1]	(public.message[49367]) Exited with exit code: 255
4/8/11 10:47:43 AM	private[423] Successful login  name: somename_else ssn: 444556666
4/8/11 10:48:08 AM	public[48964]	Backup failed with error: 19

Suppose a user wishes to search the log file for interesting messages but must be prevented from seeing the private ones. A program might accomplish this by permitting the user to provide search text that becomes part of the following regex:

Code Block

(.*? +public\[\d+\] +.*<SEARCHTEXT>.*)

However, if an attacker can substitute any string for <SEARCHTEXT>, they can perform a regex injection with the following text:

Code Block

.*)|(.

When injected into the regex, the regex becomes:

Code Block

(.*? +public\[\d+\] +.*.*)|(..*)

This regex will match any line in the log file, including the private ones.
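
The effect of the injection can be reproduced with a short, self-contained sketch. The class name RegexInjectionDemo, the search() helper, and the in-memory LOG string are hypothetical; the sample lines are modeled on the log above and assume that fields are separated by spaces. The sketch reports the whole matched line with group() rather than a single capture group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexInjectionDemo {
  // Sample lines modeled on the example log (assumption: space-separated fields)
  static final String LOG =
      "4/8/11 10:47:03 AM private[423] Successful logout  name: somename ssn: 111223333\n"
    + "4/8/11 10:47:04 AM public[1] (public.message[49367]) Exited with exit code: 255\n"
    + "4/8/11 10:48:08 AM public[48964] Backup failed with error: 19\n";

  // Substitutes the unsanitized search text into the regex template
  static List<String> search(String searchText) {
    String regex = "(.*? +public\\[\\d+\\] +.*" + searchText + ".*)";
    Matcher m = Pattern.compile(regex).matcher(LOG);
    List<String> hits = new ArrayList<>();
    while (m.find()) {
      // group() is the whole match, whichever alternative produced it
      hits.add(m.group());
    }
    return hits;
  }

  public static void main(String[] args) {
    System.out.println(search("error"));   // benign search text
    System.out.println(search(".*)|(."));  // injected search text
  }
}
```

With these sample lines, the benign search returns only the public "Backup failed" line, whereas the injected text returns all three lines, including the private one.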

Noncompliant Code Example

This noncompliant code example periodically loads the log file into memory and allows clients to obtain keyword search suggestions by passing the keyword as an argument to suggestSearches().

Code Block
bgColor#FFCCCC

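import java.io.FileInputStream;
import java.nio.CharBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Keywords {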
  private static ScheduledExecutorService scheduler = Executors
      .newSingleThreadScheduledExecutor();
  private static CharBuffer log;
  private static final Object lock = new Object();

  public static Set<String> suggestSearches(String search) {
    synchronized(lock) {
      Set<String> searches = new HashSet<String>();

      // Construct regex dynamically from user string
      String regex = "(" + search + ".*),\\d+?,\\d+?";
  
      Pattern keywordPattern = Pattern.compile(regex);
      Matcher logMatcher = keywordPattern.matcher(log);
      while (logMatcher.find()) {
        String found = logMatcher.group(1);
        searches.add(found);
      }
      return searches;
    }  
  }

  // Map log file into memory, and periodically reload
  static {
    try {
      FileChannel channel = new FileInputStream(
          "path").getChannel();

      // Get the file's size and map it into memory
      int size = (int) channel.size();
      final MappedByteBuffer mappedBuffer = channel.map(
          FileChannel.MapMode.READ_ONLY, 0, size);

      Charset charset = Charset.forName("ISO-8859-15");
      final CharsetDecoder decoder = charset.newDecoder();

      log = decoder.decode(mappedBuffer); // Read file into char buffer

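      Runnable periodicLogRead = new Runnable() {
        @Override public void run() {
          synchronized(lock) {
            try {
              // Rewind first: the previous decode() left the buffer
              // position at its limit, so nothing would remain to decode
              mappedBuffer.rewind();
              log = decoder.decode(mappedBuffer);
            } catch (CharacterCodingException e) {
              // Forward to handler
            }
          }
        }
      };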
      scheduler.scheduleAtFixedRate(periodicLogRead, 0, 5, TimeUnit.SECONDS);
    } catch (Throwable t) {
      // Forward to handler
    }
  }
}

The log file is parsed using a regex constructed dynamically from user input (search):

No Format

"(" + search + ".*),\\d+?,\\d+?"

This regex finds lines in the log that correspond to the value of search. Consequently, if an attacker can perform regex injection, sensitive information in the log, such as the user's IP address, may be disclosed.

A trusted user might enter the keyword "C" to match "Charles" and "Cecilia". A malicious user, however, could enter the string ".*)|(." instead, which might result in the unintended disclosure of sensitive information present in the log. The OR (|) operator allows an attacker to inject an arbitrary regex.




For the example log file shown earlier, suggestSearches() instead builds its regex from the public-message pattern:

Code Block
bgColor#FFCCCC

public class Keywords {
  // ...
  public static Set<String> suggestSearches(String search) {
    synchronized(lock) {
      Set<String> searches = new HashSet<String>();

      // Construct regex dynamically from user string
      String regex = "(.*? +public\\[\\d+\\] +.*" + search + ".*)";

      Pattern keywordPattern = Pattern.compile(regex);
      Matcher logMatcher = keywordPattern.matcher(log);
      while (logMatcher.find()) {
        String found = logMatcher.group(1);
        searches.add(found);
      }
      return searches;
    }
  }
  // ...
}

This code permits a trusted user to search for public log messages such as "error". However, it also allows a malicious attacker to perform the regex injection outlined above.

Compliant Solution

This compliant solution filters out non-alphanumeric characters (except space and single quote) from the search string, which prevents regex injection.

Code Block
bgColor#ccccff
public class Keywords {
  // ...
  public static Set<String> suggestSearches(String search) {
    synchronized(lock) {
      Set<String> searches = new HashSet<String>();

      StringBuilder sb = new StringBuilder(search.length());
      for (int i = 0; i < search.length(); ++i) {
        char ch = search.charAt(i);
        if (Character.isLetterOrDigit(ch) ||
            ch == ' ' ||
            ch == '\'') {
          sb.append(ch);
        }
      }
      search = sb.toString();

      // Construct regex dynamically from user string
      String regex = "(.*? +public\\[\\d+\\] +.*" + search + ".*)";
      // ...
    }
  }
  // ...
}

This solution also limits the set of valid search terms. For instance, a user may no longer search for "name =" because the = character would be sanitized out of the regex.
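
If literal searches that contain punctuation must remain possible, one alternative to character filtering (a different technique from the solution above) is to quote the search text with java.util.regex.Pattern.quote(), so that every character is matched literally. The following is only a minimal sketch: the class name QuotedSearch, the search() helper, and the in-memory LOG string are hypothetical, and the sample lines assume space-separated fields.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedSearch {
  // Sample lines modeled on the example log (assumption: space-separated fields)
  static final String LOG =
      "4/8/11 10:47:03 AM private[423] Successful logout  name: somename ssn: 111223333\n"
    + "4/8/11 10:47:04 AM public[48964] Failed to resolve network service using name = Scipio"
    + " type = _afpovertcp._tcp domain = local.\n"
    + "4/8/11 10:48:08 AM public[48964] Backup failed with error: 19\n";

  static List<String> search(String searchText) {
    // Pattern.quote() returns a literal pattern for the text, so regex
    // metacharacters in the user's input lose their special meaning
    String regex = "(.*? +public\\[\\d+\\] +.*" + Pattern.quote(searchText) + ".*)";
    Matcher m = Pattern.compile(regex).matcher(LOG);
    List<String> hits = new ArrayList<>();
    while (m.find()) {
      hits.add(m.group(1));
    }
    return hits;
  }
}
```

In this sketch, search("name =") finds the public "Failed to resolve" line, while the injection string ".*)|(." is treated as literal text and matches nothing. The trade-off is that users can no longer use any regex syntax in their search terms.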

Compliant Solution

Another method of mitigating this vulnerability is to filter out the sensitive information prior to matching. Such a solution would require the filtering to be done every time the log file is periodically refreshed, incurring extra complexity and a performance penalty. Sensitive information may still be exposed if the log format changes but the class is not also refactored to accommodate these changes.
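
A minimal sketch of such pre-filtering is shown below. The class name LogFilter, the method keepPublicLines(), and the PUBLIC_LINE pattern are hypothetical, and the pattern assumes the line layout of the example log (date, time, AM/PM, then public[pid] or private[pid]). The filter would have to run on every reload before the log is made available for matching.

```java
import java.util.regex.Pattern;

public class LogFilter {
  // Assumed layout: "<date> <time> AM|PM public[<pid>] <message>"
  private static final Pattern PUBLIC_LINE =
      Pattern.compile("\\S+ \\S+ [AP]M\\s+public\\[\\d+\\]\\s+.*");

  // Keeps only lines recognized as public; every other line is dropped
  static String keepPublicLines(String rawLog) {
    StringBuilder sb = new StringBuilder(rawLog.length());
    for (String line : rawLog.split("\\r?\\n")) {
      if (PUBLIC_LINE.matcher(line).matches()) {
        sb.append(line).append('\n');
      }
    }
    return sb.toString();
  }
}
```

This sketch keeps lines that are recognized as public rather than deleting lines that are recognized as private, so a line in an unexpected format is dropped instead of passed through. The pattern is still tied to the log format and must be revisited whenever that format changes.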

Risk Assessment

Violating this guideline may result in the disclosure of sensitive information.

...