According to subclause 6.2.7 of the C Standard [ISO/IEC 9899:2011],
All declarations that refer to the same object or function shall have compatible type; otherwise, the behavior is undefined.
(See also undefined behavior 15 of Annex J.)
Further, according to subclause 6.4.2.1,
Any identifiers that differ in a significant character are different identifiers. If two identifiers differ only in nonsignificant characters, the behavior is undefined.
(See also undefined behavior 31 of Annex J.)
Identifiers in mutually visible scopes must be deemed unique by the compiler to prevent confusion about which variable or function is being referenced. Implementations can accept identifiers that are longer than the number of characters they treat as significant, so two identifiers that differ only in trailing, nonsignificant characters appear unique in the source while actually being indistinguishable to the implementation.
It is reasonable for scopes that are not visible to each other to have duplicate identifiers. For example, two functions can each have a local variable with the same name because their scopes cannot access each other. But a function's local variable names should be distinct from each other as well as from all static variables declared within the function's file (and within all included header files).
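The following fragment is a minimal sketch of that distinction. All names in it (`request_count`, `parse_header`, `parse_body`, `handle_request`, `total_requests`) are hypothetical and chosen only for illustration.

```c
/* Hypothetical file-scope variable with internal linkage. */
static int request_count = 100;

/* Two functions may reuse a local name: neither scope can see the other. */
int parse_header(void) {
    int length = 1;
    return length;
}

int parse_body(void) {
    int length = 2;
    return length;
}

/*
 * Discouraged: this local hides the file-scope request_count, so the
 * increment below never reaches the static variable.
 */
int handle_request(void) {
    int request_count = 0;
    request_count++;
    return request_count;
}

/* Reads the file-scope variable, which handle_request() left untouched. */
int total_requests(void) {
    return request_count;
}
```

Reusing `length` in the two parsing functions is harmless, whereas the local `request_count` silently diverts the update away from the file-scope variable of the same name.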
To guarantee that identifiers are unique, the number of significant characters recognized by the most restrictive compiler used must be determined. This assumption must be documented in the code.
The standard defines the following minimum requirements:
- 63 significant initial characters in an internal identifier or a macro name. (Each universal character name or extended source character is considered a single character.)
- 31 significant initial characters in an external identifier. (Each universal character name specifying a short identifier of 0000FFFF or less is considered 6 characters; each universal character name specifying a short identifier of 00010000 or more is considered 10 characters; and each extended source character, if any exist, is considered the same number of characters as the corresponding universal character name.)
Restriction of the significance of an external name to fewer than 255 characters in the standard (considering each universal character name or extended source character as a single character) is an obsolescent feature that is a concession to existing implementations. As a result, it is not necessary to comply with this restriction as long as the identifiers are unique and the assumptions concerning the number of significant characters are documented.
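One way to document the assumption in code is sketched below. The macro names and the helper `identifiers_collide` are hypothetical, and the check assumes identifiers spelled only with basic source characters, so that each spelled character counts as one significant character.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Documented assumption: the most restrictive toolchain in use honors only
 * the C11 minimums for significant initial characters.
 */
#define INTERNAL_SIGNIFICANT_CHARS 63u
#define EXTERNAL_SIGNIFICANT_CHARS 31u

/*
 * Hypothetical helper: returns true if an implementation that examines only
 * the first `limit` characters would treat both spellings as one identifier.
 * Assumes basic source characters only (no universal character names).
 */
bool identifiers_collide(const char *a, const char *b, size_t limit) {
    assert(a != NULL && b != NULL);
    return strncmp(a, b, limit) == 0;
}
```

A check of this kind could run in a build script or a unit test over the list of exported names, using `EXTERNAL_SIGNIFICANT_CHARS` for identifiers with external linkage and `INTERNAL_SIGNIFICANT_CHARS` for everything else.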
Noncompliant Code Example (Source Character Set)
On implementations that support only the minimum requirements for significant characters required by the standard, this code example is noncompliant because the first 31 characters of the external identifiers are identical:
extern int *global_symbol_definition_lookup_table_a;
extern int *global_symbol_definition_lookup_table_b;
Compliant Solution (Source Character Set)
In a compliant solution, the significant characters in each identifier must differ:
extern int *a_global_symbol_definition_lookup_table;
extern int *b_global_symbol_definition_lookup_table;
Noncompliant Code Example (Universal Character Names)
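In this noncompliant code example, both external identifiers consist of four universal character names. Each of these universal character names specifies a short identifier of 00010000 or more and therefore counts as 10 characters, so the first three consume 30 of the 31 significant characters. Because the first three universal character names of each identifier are identical, both identifiers designate the same object on implementations that support only the minimum requirements for significant characters required by the standard: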
extern int *\U00010401\U00010401\U00010401\U00010401;
extern int *\U00010401\U00010401\U00010401\U00010402;
Compliant Solution (Universal Character Names)
For portability, the first three universal character name combinations used in an identifier must be unique:
extern int *\U00010401\U00010401\U00010401\U00010401;
extern int *\U00010402\U00010401\U00010401\U00010401;
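The weighting of universal character names in external identifiers can be sketched as a small checker. This is an illustration under stated assumptions, not a definitive implementation: the names `EXTERNAL_LIMIT`, `unit_length`, `unit_weight`, and `may_collide_external` are hypothetical, the input is assumed to contain well-formed universal character names that are spelled the same way in both identifiers, and a character that straddles the 31-character limit is conservatively treated as not distinguishing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumption: only the C11 minimum of 31 significant characters applies. */
#define EXTERNAL_LIMIT 31u

/* Spelled length of the unit at s: 6 for \uXXXX, 10 for \UXXXXXXXX, else 1. */
static size_t unit_length(const char *s) {
    if (s[0] == '\\' && s[1] == 'u' && strlen(s) >= 6) return 6;
    if (s[0] == '\\' && s[1] == 'U' && strlen(s) >= 10) return 10;
    return 1;
}

/* Weight of that unit: a \U name of 0000FFFF or less still counts as 6. */
static size_t unit_weight(const char *s) {
    size_t n = unit_length(s);
    if (n == 10 && strncmp(s + 2, "0000", 4) == 0) return 6;
    return n;
}

/*
 * Returns true if the two spellings might name the same external identifier,
 * that is, if they show no difference that lies wholly within the first
 * EXTERNAL_LIMIT weighted characters.
 */
bool may_collide_external(const char *a, const char *b) {
    size_t used = 0;
    assert(a != NULL && b != NULL);
    while (*a != '\0' && *b != '\0') {
        size_t la = unit_length(a), lb = unit_length(b);
        size_t wa = unit_weight(a), wb = unit_weight(b);
        size_t w = wa > wb ? wa : wb;
        if (used + w > EXTERNAL_LIMIT) return true;   /* limit exhausted */
        if (la != lb || strncmp(a, b, la) != 0) return false;
        used += w;
        a += la;
        b += la;
    }
    if (used >= EXTERNAL_LIMIT) return true;   /* any remainder is ignored */
    return *a == '\0' && *b == '\0';           /* identical spellings */
}
```

Because the backslash must be escaped inside a C string literal, an identifier such as `\U00010401\U00010402` would be passed to the helper as `"\\U00010401\\U00010402"`.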
Risk Assessment
Nonunique identifiers can lead to abnormal program termination, denial-of-service attacks, or unintended information disclosure.
Rule | Severity | Likelihood | Remediation Cost | Priority | Level |
---|---|---|---|---|---|
DCL23-C | Medium | Unlikely | Low | P6 | L2 |
Automated Detection
Tool | Version | Checker | Description |
---|---|---|---|
Astrée | 24.04 | | Supported indirectly via MISRA C:2012 Rules 5.1, 5.2, 5.3, 5.4 and 5.5. |
Axivion Bauhaus Suite | 7.2.0 | CertC-DCL23 | |
CodeSonar | 8.1p0 | LANG.ID.ND.EXT, LANG.ID.NU.EXT, LANG.STRUCT.DECL.MGT | Non-distinct identifiers: external names; Non-unique identifiers: external name; Global variable declared with different types |
Compass/ROSE | | | Can detect some violations of this rule but cannot flag violations involving universal names |
Helix QAC | 2024.3 | C0627, C0776, C0777, C0778, C0779, C0789, C0791, C0793 | |
Klocwork | 2024.3 | MISRA.IDENT.DISTINCT.C99.2012 | |
LDRA tool suite | 9.7.1 | 17 D | Fully implemented |
PC-lint Plus | 1.4 | 621 | Fully supported |
Polyspace Bug Finder | R2024a | | Rec. fully covered. |
RuleChecker | 24.04 | | Supported indirectly via MISRA C:2012 Rules 5.1, 5.2, 5.3, 5.4 and 5.5. |
SonarQube C/C++ Plugin | 3.11 | IdentifierLongerThan31 | |
Related Vulnerabilities
Search for vulnerabilities resulting from the violation of this rule on the CERT website.
Related Guidelines
ISO/IEC TR 24772:2013 | Choice of Clear Names [NAI]; Identifier Name Reuse [YOW] |
MISRA C:2012 | Rule 5.1 (required) |
Bibliography
[ISO/IEC 9899:2011] | Subclause 6.2.7, "Compatible Type and Composite Type" |
5 Comments
Douglas A. Gwyn
Really long identifiers are bad anyway, from a human-factors perspective. Two 31-character identifiers that differ only in the 20th character are likely to be mistaken for each other, and take too long to comprehend when reading the code.
Martin Sebor
I've updated this rule to reference bullet 14 of Appendix J but in hindsight I'm not 100% sure the two are necessarily related.
One insidious problem, one that could potentially be exploited, is with identifiers that aren't "visible" in the same scope but that are declared to have external linkage. For example, suppose file a.c contains the following definition of function square():
while file b.c contains this definition:
Linking these two translation units together causes a violation of bullet 14 regardless of whether the two functions are "visible" in the same scope. In fact, if they were visible, the violation would be easily detected by any compiler and diagnosed, but when they're not it typically couldn't be, which is where the undefined behavior comes in.
On the other hand, the cases described in this rule would, in all likelihood, lead to compiler or linker errors (and thus not represent a security flaw).
Robert Seacord (Manager)
Dan Quinlan, who sometimes participates in this group, has a paper on "Support for Whole-Program Analysis and the Verification of the One-Definition Rule in C++." The One-Definition Rule (ODR) states that types and functions appearing in multiple compilation units must be defined identically. So apparently, this is a problem in C++, but I'm not sure if it extends to C or not.
Martin Sebor
Yes, in C++, when permitted, multiple definitions must be token-by-token identical. For example, if square() were defined inline, it would have to expand to the same set of lexical tokens in each translation unit.
The closest requirement I can find in C is:
Aaron Ballman
Perhaps another NCCE would be opaque structures from different translation units. E.g.:
This would (potentially, given the UB) print x: 20, y: 10 instead of x: 10, y: 20 while not providing a linking error in C.