Many built-in functions accept a regex pattern as an argument. Furthermore, any subroutine can accept a string yet treat it as a regex pattern, for example, by interpolating the string into the match operator (m//). Because regex patterns are encoded as ordinary strings, it is tempting to assume that passing a string literal is equivalent to supplying a regex that matches only that literal. However, if the string contains characters that have special meanings in a regex, such as $, ., *, or |, the function can behave unexpectedly. Therefore, do not pass a string that is not clearly a regex pattern to a function that takes a regex.
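The following sketch shows how an ordinary subroutine ends up treating its string argument as a regex. The subroutine lines_matching is hypothetical and exists only for illustration. It interpolates its first argument into m//, so the string '$' acts as an end-of-string anchor and matches every item. Passing the string through the built-in quotemeta() escapes the metacharacter, so only items with a literal dollar sign match.

```perl
use strict;
use warnings;

# Hypothetical helper (not a built-in): returns the elements of @items
# that match $pattern. Because $pattern is interpolated into m//, the
# caller's string is interpreted as a regex, not as literal text.
sub lines_matching {
    my ($pattern, @items) = @_;
    return grep { m/$pattern/ } @items;
}

my @items = ('price: $5', 'no currency here');

# '$' is the end-of-string anchor, so it matches every element.
my @as_regex = lines_matching('$', @items);

# quotemeta('$') yields '\$', which matches only a literal dollar sign.
my @as_literal = lines_matching(quotemeta('$'), @items);

print scalar(@as_regex), ' ', scalar(@as_literal), "\n";   # 2 1
```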

Noncompliant Code Example

This code example appears to split a string of names delimited by $.

my $data = 'Tom$Dick$Harry';
my @names = split( '$', $data);

However, the first argument to split() is treated as a regex pattern. Because $ is an anchor that matches the end of the string, no splitting occurs, and @names receives a single element containing the entire string.
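The sketch below runs the noncompliant code and prints the result. It confirms that the string '$' behaves as the anchor /$/ rather than as a literal delimiter.

```perl
use strict;
use warnings;

my $data = 'Tom$Dick$Harry';

# The string '$' is compiled as the regex /$/, which matches only at the
# end of the string, so split() returns the whole string as one field.
my @names = split('$', $data);

print scalar(@names), ": $names[0]\n";   # 1: Tom$Dick$Harry
```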

Compliant Solution

This compliant solution passes a regex pattern to split() as the first argument, escaping $ so that it matches a literal dollar sign. Consequently, @names is assigned the three names Tom, Dick, and Harry.

my $data = 'Tom$Dick$Harry';
my @names = split( m/\$/, $data);
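Sometimes the delimiter is not known until run time. In that case it cannot be escaped by hand in the source. A common approach, sketched here, is to interpolate it inside \Q...\E, which applies quotemeta() to the interpolated value. The variable $delim and its origin are assumptions for illustration.

```perl
use strict;
use warnings;

my $data  = 'Tom$Dick$Harry';
my $delim = '$';    # hypothetical: a delimiter supplied at run time

# \Q...\E escapes regex metacharacters in $delim, so the pattern
# becomes /\$/ and matches a literal dollar sign.
my @names = split(/\Q$delim\E/, $data);

print join(',', @names), "\n";   # Tom,Dick,Harry
```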

Exceptions

STR31-PL-EX0: A string literal may be passed to a function that normally takes a regex pattern if the function provides special behavior for that string. For example, the perlfunc manpage [Wall 2011] says the following about PATTERN, the first argument to split():

As a special case, specifying a PATTERN of space (' ') will split on white space just as "split" with no arguments does. Thus, "split(' ')" can be used to emulate awk's default behavior, whereas "split(/ /)" will give you as many initial null fields (empty string) as there are leading spaces.
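The sketch below contrasts the two behaviors described in the quoted passage on a line with leading and repeated spaces. The string ' ' triggers the awk-style special case. The regex / / treats every single space as a separator.

```perl
use strict;
use warnings;

my $line = '  alpha  beta';

# Special case: the string ' ' splits on runs of whitespace and
# discards leading whitespace, like awk.
my @awk_style = split(' ', $line);    # ('alpha', 'beta')

# A genuine regex matching one space: the two leading spaces yield two
# empty fields, and the double space yields one more.
my @regex_style = split(/ /, $line);  # ('', '', 'alpha', '', 'beta')

print scalar(@awk_style), ' ', scalar(@regex_style), "\n";   # 2 5
```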
 

Risk Assessment

Recommendation  Severity  Likelihood  Remediation Cost  Priority  Level
STR31-PL        Low       Likely      Low               P9        L2

Automated Detection

Tool          Diagnostic
Perl::Critic  BuiltinFunctions::ProhibitStringySplit

Bibliography

 


4 Comments

  1. This is a special feature of split(). I do not think it is accurate to say "any subroutine can accept a string yet treat it as a regex pattern".

    1. Sure, any subroutine could treat a string as a regex by passing it to m//. I amended the intro to explain this.

      1. Anonymous

        Only split accepts m/\$/ like that.

        Other subroutines require that it be qr/\$/.

        1. Here is some code that correctly prints out every line on standard input with a $ in it:

          perl -n -e 'foreach $element (grep(m/\$/, <STDIN>)) {print $element;}' < pl.pl
          

          This code prints nothing:

          perl -n -e 'foreach $element (grep(gr/\$/, <STDIN>)) {print $element;}' < pl.pl
          

          This code prints every line, because it treats $ as a regexp meaning "end of line":

          perl -n -e 'foreach $element (grep("\$", <STDIN>)) {print $element;}' < pl.pl
          

          I would conclude that grep accepts m/whatever/ just like split.