According to MISRA C++:2008, concatenation of wide and narrow string literals leads to undefined behavior. In C90 [ISO/IEC 9899:1990], this concatenation was implicitly undefined behavior. However, C99 defined this behavior [ISO/IEC 9899:1999], and C11 further explains in subclause 6.4.5, paragraph 5 [ISO/IEC 9899:2011]:
In translation phase 6, the multibyte character sequences specified by any sequence of adjacent character and identically-prefixed string literal tokens are concatenated into a single multibyte character sequence. If any of the tokens has an encoding prefix, the resulting multibyte character sequence is treated as having the same prefix; otherwise, it is treated as a character string literal. Whether differently-prefixed wide string literal tokens can be concatenated and, if so, the treatment of the resulting multibyte character sequence are implementation-defined.
Nonetheless, all string literals that are concatenated should have the same type, so that the code relies neither on implementation-defined behavior nor, when compiled on a platform that supports only C90, on undefined behavior.
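To observe how C99 and later treat an `L`-prefixed literal placed next to an unprefixed one, the following sketch classifies a literal by its element type at compile time. It assumes a C11 compiler (for `_Generic` and `_Static_assert`) and a platform where `wchar_t` is a different type from `char`; the `LITERAL_KIND` macro and the `literal_kind` enumeration are hypothetical names used only for illustration. The mixed form appears here solely to show the C99 rule, not as a recommended pattern.

```c
#include <stddef.h> /* for wchar_t */

/* Hypothetical classification of a string literal by its element type.
   Assumes C11 (_Generic) and that wchar_t is not the same type as char. */
enum literal_kind { LIT_OTHER, LIT_NARROW, LIT_WIDE };

#define LITERAL_KIND(s) \
    _Generic((s)[0], char: LIT_NARROW, wchar_t: LIT_WIDE, default: LIT_OTHER)

/* C99 and later: when one piece has the L prefix, the whole concatenated
   sequence is treated as a wide string literal. */
_Static_assert(LITERAL_KIND(L"wide " "narrow") == LIT_WIDE,
               "mixed concatenation yields a wide literal");

/* The narrow piece is stored as wide characters: "ab" + "cd" occupies
   four wchar_t elements plus one terminating null wide character. */
_Static_assert(sizeof(L"ab" "cd") / sizeof(wchar_t) == 5,
               "4 characters plus the terminator");
```

Code that depends on these results is exactly what the recommendation above advises against, because a C90 compiler is not required to produce them.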
...
This noncompliant code example concatenates wide and narrow string literals. Although the behavior is undefined in C90, the programmer probably intended to create a wide string literal.
```c
#include <stddef.h> /* for wchar_t */
wchar_t *msg = L"This message is very long, so I want to divide it " "into two parts.";
```
...
If the concatenated string needs to be a wide string literal, each element in the concatenation must be a wide string literal, as in this compliant solution:
```c
#include <stddef.h> /* for wchar_t */
wchar_t *msg = L"This message is very long, so I want to divide it " L"into two parts.";
```
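A common way for a narrow element to slip into an otherwise wide concatenation is a macro that expands to a narrow literal, such as `__FILE__`. One widely used idiom, sketched here under the assumption that the macro argument expands to exactly one string literal token, pastes the `L` prefix onto that token with the `##` operator. The names `WIDEN`, `WIDEN_`, `source_note`, and `source_note_length` are hypothetical.

```c
#include <stddef.h> /* for size_t, wchar_t */
#include <wchar.h>  /* for wcslen */

/* Hypothetical helpers: paste the L prefix onto a single narrow literal.
   The two-level form lets an argument such as __FILE__ expand first. */
#define WIDEN_(s) L ## s
#define WIDEN(s)  WIDEN_(s)

/* Every adjacent piece carries the L prefix, so nothing is mixed. */
static const wchar_t *const source_note = L"Compiled from " WIDEN(__FILE__);

/* Returns the number of wide characters in the concatenated note. */
size_t source_note_length(void) {
    return wcslen(source_note);
}
```

Because `L ## s` prefixes only one token, an argument such as `"a" "b"` would expand to `L"a" "b"`, which is again a mixed concatenation; each literal has to be widened separately.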
...
If wide string literals are unnecessary, it is better to use narrow string literals, as in this compliant solution:
```c
char *msg = "This message is very long, so I want to divide it " "into two parts.";
```
...
The concatenation of wide and narrow string literals could lead to undefined behavior.
| Rule | Severity | Likelihood | Remediation Cost | Priority | Level |
|---|---|---|---|---|---|
| STR10-C | Low | Probable | Medium | P4 | L3 |
Automated Detection
| Tool | Version | Checker | Description |
|---|---|---|---|
| Astrée | | encoding-mismatch | Fully checked |
| Axivion Bauhaus Suite | | CertC-STR10 | |
| ECLAIR | | CC2.STR10 | Fully implemented |
| Helix QAC | | C0874 | |
| LDRA tool suite | | 450 S | Fully implemented |
| Parasoft C/C++test | | CERT_C-STR10-a | Narrow and wide string literals shall not be concatenated |
| PC-lint Plus | | 707 | Fully supported |
| SonarQube C/C++ Plugin | | NarrowAndWideStringConcat | |
| RuleChecker | | encoding-mismatch | Fully checked |
Related Vulnerabilities
Search for vulnerabilities resulting from the violation of this rule on the CERT website.
Related Guidelines
MISRA C++:2008, Rule 2-13-5
Bibliography
...
...
2011] | Section 6.4.5, "String |
...
Literals" |
...