...
There are several national variants of ASCII. As a result, the original ASCII is often called US-ASCII. ISO/IEC 646-1991 defines a character set similar to US-ASCII, but it designates the code positions of the US-ASCII characters @, [, \, ], {, |, and } as national use positions [ISO/IEC 646-1991]. It also allows some latitude with the characters #, $, ^, `, and ~. ISO/IEC 646-1991 defines several national variants of ASCII that assign different letters and symbols to these positions. Consequently, any character that occupies one of those positions, including the US-ASCII characters listed above, is less portable in international data transfer: it might be transferred or interpreted incorrectly.
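One way to make the invariant subset concrete is a small classification helper. The sketch below assumes the 12 variable code positions of ISO/IEC 646 are @ [ \ ] { | } plus # $ ^ ` ~ (it treats the backslash, 0x5C, as a national use position, since ISO/IEC 646 variants such as the German and Japanese ones reassign it). The function names `is_iso646_invariant` and `is_iso646_invariant_string` are hypothetical and not part of any standard library.

```c
#include <stdbool.h>
#include <string.h>

/* The 12 code positions that ISO/IEC 646 national variants may reassign:
 * the national use positions @ [ \ ] { | } plus # $ ^ ` ~. */
static const char iso646_variable[] = "#$@[\\]^`{|}~";

/* Hypothetical helper: returns true if c is a printable US-ASCII
 * character (space through '~') that has the same meaning in every
 * ISO/IEC 646 variant. */
bool is_iso646_invariant(int c) {
    if (c < 0x20 || c > 0x7E) {
        return false; /* control character or outside the 7-bit range */
    }
    /* c is never 0 here, so strchr cannot match the terminating null. */
    return strchr(iso646_variable, c) == NULL;
}

/* Hypothetical helper: returns true if every character of s is
 * ISO/IEC 646 invariant. */
bool is_iso646_invariant_string(const char *s) {
    for (; *s != '\0'; ++s) {
        /* Cast through unsigned char so bytes >= 0x80 are not negative. */
        if (!is_iso646_invariant((unsigned char)*s)) {
            return false;
        }
    }
    return true;
}
```

Text restricted to this subset reads the same under every national variant; anything else, including bytes above 0x7F, may not.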
...
```c
#include <fcntl.h>
#include <sys/stat.h>

int main(void) {
  /* File name containing characters outside the portable subset */
  const char *file_name = "»£???«\xe5ngstr\xf6m";
  mode_t mode = S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH;

  int fd = open(file_name, O_CREAT | O_EXCL | O_WRONLY, mode);
  if (fd == -1) {
    /* Handle error */
  }
}
```
An implementation is free to define its own mapping of the "nonsafe" characters. For example, when run on a Red Hat Enterprise Linux 7.5 distribution, this noncompliant code example produced the following file name, as listed by the ls command:
```
?ngstr?????m
```
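A name like the one above can be rejected before it ever reaches open(). The following is a minimal sketch, not a reproduction of this recommendation's compliant solution: it assumes the POSIX portable filename character set (A-Z, a-z, 0-9, period, underscore, and hyphen) as the safe subset, and it follows the POSIX advice that a portable file name should not begin with a hyphen. The function name `is_portable_file_name` is hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: returns true if name is nonempty, uses only the
 * POSIX portable filename character set (A-Z a-z 0-9 . _ -), and does
 * not begin with '-'. */
bool is_portable_file_name(const char *name) {
    /* Letters are spelled out rather than tested with isalnum(), because
     * isalnum() is locale-dependent and may accept letters such as 'å'. */
    static const char portable[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz"
        "0123456789._-";

    if (name == NULL || name[0] == '\0' || name[0] == '-') {
        return false;
    }
    /* strspn counts the leading bytes that all belong to the portable set;
     * the name is portable only if that span covers the whole string. */
    return strspn(name, portable) == strlen(name);
}
```

A caller would check the name first and take the error path instead of calling open() when the function returns false. Note that "." and ".." pass this check; a caller that must exclude them needs an additional test.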
Compliant Solution (File Name 1)
...
Failing to use only the subset of ASCII that is guaranteed to work can result in misinterpreted data.
Recommendation | Severity | Likelihood | Remediation Cost | Priority | Level |
---|---|---|---|---|---|
MSC09-C | Medium | Unlikely | Medium | P4 | L3 |
Automated Detection
Tool | Version | Checker | Description |
---|---|---|---|
Astrée | | bitfield-name | Partially checked |
Helix QAC | | C0285, C0286, C0287, C0288, C0289, C0299 | |
LDRA tool suite | | 113 S | Partially implemented |
Parasoft C/C++test | | CERT_C-MSC09-a | Only use characters defined in the ISO C standard |
RuleChecker | | bitfield-name | Partially checked |
SonarQube C/C++ Plugin | | S1578 | |
Related Vulnerabilities
Search for vulnerabilities resulting from the violation of this recommendation on the CERT website.
Related Guidelines
SEI CERT C++ Coding Standard | VOID MSC09-CPP. Character encoding: Use subset of ASCII for safety |
CERT Oracle Secure Coding Standard for Java | IDS50-J. Use conservative file naming conventions |
MISRA C:2012 | Directive 1.1 (required), Rule 4.1 (required) |
MITRE CWE | CWE-116, Improper encoding or escaping of output |
Bibliography
[ISO/IEC 646-1991] | "ISO 7-Bit Coded Character Set for Information Interchange" |
[ISO/IEC 9899:2011] | Subclause 5.2.1, "Character Sets" |
[Kuhn 2006] | "UTF-8 and Unicode FAQ for UNIX/Linux" |
[VU#439395] |
[Wheeler 2003] | Section 5.4, "File Names" |
...
...