The C99 function strtok() is a string tokenization function that takes two arguments: the string to be parsed and a const-qualified string of delimiter characters. It returns a pointer to the first character of the next token, or a null pointer if no token remains.
The first time strtok() is called, the string to be parsed and the delimiter string are passed as arguments. The strtok() function scans the string up to the first instance of a delimiter character, replaces that character in place with a null byte ('\0'), and returns the address of the first character of the token. Subsequent calls pass a null pointer as the first argument and resume parsing immediately after the most recently placed null character.
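The call sequence can be sketched on a local, writable buffer. This is a minimal illustration rather than a recommended API: the helper name split_in_place and its parameters are hypothetical, and the only library function it relies on is strtok() itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: stores up to max_tokens token pointers from buf.
 * buf must be writable, because strtok() overwrites delimiters with '\0'. */
size_t split_in_place(char *buf, const char *delims,
                      char *tokens[], size_t max_tokens) {
  size_t count = 0;
  char *token;

  assert(buf != NULL && delims != NULL && tokens != NULL);

  /* First call: pass the string itself. */
  token = strtok(buf, delims);
  while (token != NULL && count < max_tokens) {
    tokens[count++] = token;  /* points into buf; nothing is copied */
    /* Later calls: pass a null pointer to continue in the same string. */
    token = strtok(NULL, delims);
  }
  return count;
}
```

Each stored pointer refers to a position inside buf, so the tokens remain valid only as long as buf does, and buf itself now reads as just its first token.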
Because strtok() modifies its argument, the string is subsequently unsafe and cannot be used in its original form. If you need to preserve the original string, copy it into a buffer and pass the address of the buffer to strtok() instead of the original string.
Non-Compliant Code Example
In this example, the strtok() function is used to split the value of PATH into colon-delimited tokens; each token is printed on its own line. Assume that PATH is "/usr/bin:/bin:/usr/sbin:/sbin".
    char *path = getenv("PATH");
    /* PATH is something like "/usr/bin:/bin:/usr/sbin:/sbin" */
    char *token;

    token = strtok(path, ":");
    puts(token);

    while ((token = strtok(NULL, ":")) != NULL) {
      puts(token);
    }

    printf("PATH: %s\n", path);
    /* PATH is now just "/usr/bin" */
After the while loop ends, the string that path points to has been modified to look like this: "/usr/bin\0/bin\0/usr/sbin\0/sbin\0". This is a problem on several levels. If we check our local path variable, we now see only /usr/bin. Even worse, we have unintentionally changed the environment variable PATH, which could cause unintended results.
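The byte layout described above can be checked without touching the environment. The following sketch tokenizes a local array and renders every null byte as the two visible characters \0. The helper name tokenize_and_render and its parameters are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: tokenizes buf (size bytes, including its terminator)
 * and writes a printable picture of the result to out, rendering each null
 * byte as the two characters '\' and '0'.
 * Returns 0 on success, or -1 if out is too small. */
int tokenize_and_render(char *buf, size_t size, const char *delims,
                        char *out, size_t out_size) {
  size_t i;
  size_t used = 0;

  assert(buf != NULL && delims != NULL && out != NULL);

  /* Run strtok() to completion; only its side effect on buf matters here. */
  for (char *t = strtok(buf, delims); t != NULL; t = strtok(NULL, delims)) {
    /* nothing to do with the tokens themselves */
  }

  for (i = 0; i < size; i++) {
    size_t need = (buf[i] == '\0') ? 2 : 1;
    if (used + need + 1 > out_size) {
      return -1;  /* no room for this byte plus the terminator */
    }
    if (buf[i] == '\0') {
      out[used++] = '\\';
      out[used++] = '0';
    } else {
      out[used++] = buf[i];
    }
  }
  out[used] = '\0';
  return 0;
}
```

Called on a char array initialized with "/usr/bin:/bin:/usr/sbin:/sbin" and the delimiter string ":", the rendered output matches the layout shown above, and the array itself reads as just /usr/bin when treated as a C string.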
Compliant Solution
In this solution, the string being tokenized is copied into a temporary buffer that is not referenced after the calls to strtok():
    const char *path = getenv("PATH");
    /* PATH is something like "/usr/bin:/bin:/usr/sbin:/sbin" */
    char *token;

    char *copy = malloc(strlen(path) + 1);
    if (copy == NULL) {
      /* Handle error */
    }
    strcpy(copy, path);

    token = strtok(copy, ":");
    puts(token);

    while ((token = strtok(NULL, ":")) != NULL) {
      puts(token);
    }

    free(copy);
    copy = NULL;

    printf("PATH: %s\n", path);
    /* PATH is still "/usr/bin:/bin:/usr/sbin:/sbin" */
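The same copy-then-tokenize pattern can be wrapped in a function so that callers never hand their own string to strtok(). The sketch below is one possible shape, not a prescribed interface: the name count_tokens and its return convention (-1 on allocation failure) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: counts the tokens in s without modifying s.
 * Returns the token count, or -1 if the working copy cannot be allocated. */
int count_tokens(const char *s, const char *delims) {
  char *copy;
  int count = 0;

  assert(s != NULL && delims != NULL);

  /* Tokenize a private copy so the caller's string stays intact. */
  copy = malloc(strlen(s) + 1);
  if (copy == NULL) {
    return -1;
  }
  strcpy(copy, s);

  for (char *t = strtok(copy, delims); t != NULL; t = strtok(NULL, delims)) {
    count++;
  }

  free(copy);
  return count;
}
```

Because only the private copy is modified, this version can be called on a const-qualified string, including a string literal or the result of getenv(). Note that strtok() keeps its position in internal static state, so the helper should not be called while another strtok() sequence is in progress.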
...