The Pattern class is very useful for performing operations involving regular expressions. In Java, however, a regular expression is usually written as a String literal, and the Java compiler processes the escape sequences in that literal before the resulting String is compiled into an instance of Pattern. As a result, a backslash intended for the regular expression engine must itself be escaped in the source code; otherwise it is interpreted as a string escape sequence or rejected by the compiler.

Noncompliant Code Example

The following example implements a method, split, that is intended to split a given string into individual words by matching word boundaries. However, the string literal used for the pattern does not contain the regular expression its author assumed.

import java.util.regex.Pattern;

public class BadSplitter {
  private final String WORDS = "\b"; // Intend to split on word boundaries

  public String[] split(String input){
    Pattern p = Pattern.compile(WORDS);
    String[] input_array = p.split(input);
    return input_array;
  }
}

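The Java compiler interprets the string literal "\b" as a single backspace character (U+0008), so Pattern.compile() receives a backspace rather than the regular expression \b for word boundaries. The resulting pattern matches only literal backspace characters.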
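The difference can be made visible by printing the character codes each literal actually contains. The following is a minimal sketch; the class name EscapeDemo and the helper codePoints are hypothetical names introduced only for illustration.

```java
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class EscapeDemo {
  // Hypothetical helper: lists the numeric value of every char in s.
  static String codePoints(String s) {
    return s.chars()
            .mapToObj(c -> Integer.toString(c))
            .collect(Collectors.joining(" "));
  }

  public static void main(String[] args) {
    // "\b" is one character: backspace (U+0008).
    System.out.println(codePoints("\b"));   // prints 8
    // "\\b" is two characters: backslash (92) and 'b' (98).
    System.out.println(codePoints("\\b"));  // prints 92 98

    // A pattern made from the backspace character finds nothing to split on,
    // so the input comes back as a single element.
    String[] parts = Pattern.compile("\b").split("hello world");
    System.out.println(parts.length);       // prints 1
  }
}
```

Only the second literal hands the two-character sequence \b to the regular expression engine.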

Compliant Solution

This compliant solution shows the correct value of the String WORDS to produce a regular expression to split on word boundaries.

import java.util.regex.Pattern;

public class GoodSplitter {
  private final String WORDS = "\\b"; // Will allow splitting on word boundaries

  public String[] split(String input){
    Pattern p = Pattern.compile(WORDS);
    String[] input_array = p.split(input);
    return input_array;
  }
}

In this example, the string literal "\\b" yields the two-character string \b, which Pattern compiles into the pattern for matching word boundaries. This is because the Java compiler converts the escaped backslash (\\) into a single backslash when it processes the string literal.
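As a rough illustration of the behavior, the sketch below splits a short sentence on word boundaries. The class name WordBoundaryDemo is hypothetical, and the output shown assumes Java 8 or later, where a zero-width match at the start of the input does not produce an empty leading element.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

final class WordBoundaryDemo {
  // "\\b" in source becomes the regex \b (word boundary).
  private static final Pattern WORDS = Pattern.compile("\\b");

  static String[] split(String input) {
    return WORDS.split(input);
  }

  public static void main(String[] args) {
    // Boundaries fall before and after each word, so the separating
    // space is returned as its own element.
    System.out.println(Arrays.toString(split("hello world")));
    // prints [hello,  , world] on Java 8 and later
  }
}
```

Note that splitting on \b keeps the text between words (here a single space) as separate elements of the result.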

Noncompliant Code Example

The following example implements a method, split, that is intended to split input Strings on one or more white space characters. However, the String SPACE is not correctly formed.

import java.util.regex.Pattern;

public class BadSplitter {
  private final String SPACE = "\s+"; // Compiler error: illegal escape (intended to split on whitespace)

  public String[] split(String input){
    Pattern p = Pattern.compile(SPACE);
    String[] input_array = p.split(input);
    return input_array;
  }
}

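The string literal attempts to use \s as a string escape sequence. Java versions before 15 do not define this escape, so the compiler reports an illegal escape character error. Java 15 and later define \s as an escape for a single space character, so the code compiles there, but the resulting pattern matches only runs of spaces rather than all white space characters.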

Compliant Solution

This compliant solution shows the correct value of the String SPACE to produce a regular expression that splits on one or more white space characters.

import java.util.regex.Pattern;

public class GoodSplitter {
  private final String SPACE = "\\s+";

  public String[] split(String input){
    Pattern p = Pattern.compile(SPACE); // Will split on one or more white space characters
    String[] input_array = p.split(input);
    return input_array;
  }
}

In this example, the string literal "\\s+" yields the string \s+, which Pattern compiles into the pattern for matching one or more white space characters.
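A short usage sketch follows, showing the pattern applied to input that mixes spaces, a tab, and a newline. The class name WhitespaceSplitDemo is hypothetical, and compiling the pattern once into a static field is a design choice made for this sketch, not something the examples above require.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

final class WhitespaceSplitDemo {
  // "\\s+" in source becomes the regex \s+ (one or more whitespace chars).
  private static final Pattern SPACE = Pattern.compile("\\s+");

  static String[] split(String input) {
    return SPACE.split(input);
  }

  public static void main(String[] args) {
    // Two spaces, a tab, and a newline all act as separators.
    System.out.println(Arrays.toString(split("one  two\tthree\nfour")));
    // prints [one, two, three, four]
  }
}
```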
