Potentially exploitable undefined behavior can result from any of the following:
- Using pointer arithmetic so that the result does not point into or just past the end of the same object
- Using such pointers in arithmetic expressions
- Dereferencing pointers that do not point to a valid object in memory
- Using an array subscript so that the resulting reference does not refer to an element in the array
The C Standard identifies the following distinct situations in which undefined behavior (UB) can arise as a result of invalid pointer operations:
UB | Description | Example Code |
---|---|---|
46 | Addition or subtraction of a pointer into, or just beyond, an array object and an integer type produces a result that does not point into, or just beyond, the same array object. | ARR30-C. Do not form or use out of bounds pointers or array subscripts |
47 | Addition or subtraction of a pointer into, or just beyond, an array object and an integer type produces a result that points just beyond the array object and is used as the operand of a unary * operator that is evaluated. | Dereferencing Past the End Pointer, ARR30-C. Do not form or use out of bounds pointers or array subscripts |
49 | An array subscript is out of range, even if an object is apparently accessible with the given subscript (as in the lvalue expression a[1][7] given the declaration int a[4][5]). | ARR30-C. Do not form or use out of bounds pointers or array subscripts |
62 | An attempt is made to access, or generate a pointer to just past, a flexible array member of a structure when the referenced object provides no elements for that array. | ARR30-C. Do not form or use out of bounds pointers or array subscripts |
109 | The pointer passed to a library function array parameter does not have a value such that all address computations and object accesses are valid. | |
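The difference between forming a one-past-the-end pointer (permitted) and dereferencing it (undefined) can be illustrated with a short sketch. The function name sum_ints is hypothetical and not part of the rule, and the sketch assumes the caller passes a pointer to the first of n valid elements.

```c
#include <stddef.h>

/* Hypothetical helper: sums n ints starting at a.
 * Assumes a points to the first element of an array of at least n ints. */
long sum_ints(const int *a, size_t n) {
  const int *end = a + n; /* One past the last element: valid to form */
  long total = 0;
  for (const int *p = a; p != end; ++p) {
    total += *p; /* p never equals end inside the loop, so *p is valid */
  }
  /* Evaluating *end, or forming a + n + 1, would be undefined behavior. */
  return total;
}
```

The past-the-end pointer is used only as a comparison sentinel and is never the operand of the unary * operator.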
Noncompliant Code Example (Forming Out-of-Bounds Pointer)
In this noncompliant code example, the function f() attempts to validate the index before using it as an offset to the statically allocated table of integers. However, the function fails to reject negative index values. When index is less than zero, the addition expression in the return statement of the function results in undefined behavior 46. On some implementations, the addition alone can trigger a hardware trap. On other implementations, the addition may produce a result that, when dereferenced, triggers a hardware trap. Still other implementations may produce a dereferenceable pointer that points to an object distinct from table. Using such a pointer to access the object may lead to information exposure or cause the wrong object to be modified.
#include <stddef.h>

enum { TABLESIZE = 100 };

static int table[TABLESIZE];

int *f(int index) {
  if (index < TABLESIZE) { /* Negative values of index are not rejected */
    return table + index;
  }
  return NULL;
}
Compliant Solution
One compliant solution is to detect and reject invalid values of index if using them in pointer arithmetic would result in an invalid pointer:
#include <stddef.h>

enum { TABLESIZE = 100 };

static int table[TABLESIZE];

int *f(int index) {
  if (index >= 0 && index < TABLESIZE) {
    return table + index;
  }
  return NULL;
}
Compliant Solution
Another, slightly simpler and potentially more efficient compliant solution is to use an unsigned type to avoid having to check for negative values while still rejecting out-of-bounds positive values of index:
#include <stddef.h>

enum { TABLESIZE = 100 };

static int table[TABLESIZE];

int *f(size_t index) {
  if (index < TABLESIZE) {
    return table + index;
  }
  return NULL;
}
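Why the unsigned version also rejects negative arguments can be seen in a small sketch. It repeats the logic of the compliant solution under the hypothetical name lookup() and assumes a caller, lookup_signed(), that still holds a signed index. Neither name comes from the rule itself.

```c
#include <stddef.h>

enum { TABLESIZE = 100 };
static int table[TABLESIZE];

/* Same logic as the compliant solution; the name lookup is illustrative. */
int *lookup(size_t index) {
  if (index < TABLESIZE) {
    return table + index;
  }
  return NULL;
}

/* Hypothetical caller that still holds a signed index. */
int *lookup_signed(int index) {
  /* A negative int converts to a value near SIZE_MAX, which is rejected. */
  return lookup((size_t)index);
}
```

Because the conversion to size_t is well defined, a single comparison covers both the negative and the too-large case.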
Noncompliant Code Example (Dereferencing Past-the-End Pointer)
This noncompliant code example shows the flawed logic in the Windows Distributed Component Object Model (DCOM) Remote Procedure Call (RPC) interface that was exploited by the W32.Blaster.Worm. The error is that the while loop in the GetMachineName() function (used to extract the host name from a longer string) is not sufficiently bounded. When the character array pointed to by pwszTemp does not contain the backslash character among the first MAX_COMPUTERNAME_LENGTH_FQDN + 1 elements, the final valid iteration of the loop will dereference a past-the-end pointer, resulting in exploitable undefined behavior 47. In this case, the actual exploit allowed the attacker to inject executable code into a running program. Economic damage from the Blaster worm has been estimated to be at least $525 million [Pethia 2003].
For a discussion of this programming error in the Common Weakness Enumeration database, see CWE-119, "Failure to constrain operations within the bounds of a memory buffer," and CWE-121, "Stack-based buffer overflow."
error_status_t _RemoteActivation(
      /* ... */, WCHAR *pwszObjectName, ... ) {
   *phr = GetServerPath(
              pwszObjectName, &pwszObjectName);
    /* ... */
}

HRESULT GetServerPath(
  WCHAR *pwszPath, WCHAR **pwszServerPath ){
  WCHAR *pwszFinalPath = pwszPath;
  WCHAR wszMachineName[MAX_COMPUTERNAME_LENGTH_FQDN+1];
  hr = GetMachineName(pwszPath, wszMachineName);
  *pwszServerPath = pwszFinalPath;
}

HRESULT GetMachineName(
  WCHAR *pwszPath,
  WCHAR wszMachineName[MAX_COMPUTERNAME_LENGTH_FQDN+1])
{
  pwszServerName = wszMachineName;
  LPWSTR pwszTemp = pwszPath + 2;
  while ( *pwszTemp != L'\\' )
    *pwszServerName++ = *pwszTemp++;
  /* ... */
}
Compliant Solution
In this compliant solution, the while loop in the GetMachineName() function is bounded so that the loop terminates when a backslash character is found, the null termination character (L'\0') is discovered, or the end of the buffer is reached. This code does not result in a buffer overflow, even if no backslash character is found in the string referenced by pwszPath.
HRESULT GetMachineName(
  wchar_t *pwszPath,
  wchar_t wszMachineName[MAX_COMPUTERNAME_LENGTH_FQDN+1])
{
  wchar_t *pwszServerName = wszMachineName;
  wchar_t *pwszTemp = pwszPath + 2;
  wchar_t *end_addr
    = pwszServerName + MAX_COMPUTERNAME_LENGTH_FQDN;
  while ((*pwszTemp != L'\\') &&
         (*pwszTemp != L'\0') &&
         (pwszServerName < end_addr))
  {
    *pwszServerName++ = *pwszTemp++;
  }
  /* ... */
}
This compliant solution is for illustrative purposes and is not necessarily the solution implemented by Microsoft. This particular solution may not be correct because there is no guarantee that a backslash is found.
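One way to handle the missing-backslash case is to report failure to the caller. The following is a minimal, portable sketch and is not Microsoft's code: the function name copy_machine_name, its parameters, and its return convention are assumptions made for illustration. It stops at a backslash, at a null character, or when the destination is full, always null-terminates the destination, and reports success only if a backslash ended the copy.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper. Copies characters from the null-terminated string
 * src into dst until a backslash is found. dst_len is the number of
 * wchar_t elements in dst.
 * Returns 0 if a backslash terminated the copy, -1 otherwise. */
int copy_machine_name(const wchar_t *src, wchar_t *dst, size_t dst_len) {
  size_t i = 0;
  if (src == NULL || dst == NULL || dst_len == 0) {
    return -1;
  }
  /* Leave room for the terminating null character. */
  while (i < dst_len - 1 && src[i] != L'\\' && src[i] != L'\0') {
    dst[i] = src[i];
    ++i;
  }
  dst[i] = L'\0';
  /* src[i] is still inside the source string: every earlier character
   * was neither a backslash nor the terminator. */
  return (src[i] == L'\\') ? 0 : -1;
}
```

A caller can then treat a return value of -1 as a malformed or overlong path instead of continuing with a truncated name.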
Noncompliant Code Example (Using Past the End Index)
Similar to the dereferencing-past-the-end-pointer error, the function insert_in_table() in this noncompliant code example uses an otherwise valid index to attempt to store a value in an element just past the end of an array.
First, the function incorrectly validates the index pos against the size of the buffer. When the index is equal to size, the function attempts to store value in a memory location just past the end of the buffer.
Second, when the index is greater than size, the function modifies size before growing the size of the buffer. If the call to realloc() fails to increase the size of the buffer, the next call to the function with a value of pos equal to or greater than the original value of size will again attempt to store value in a memory location just past the end of the buffer or beyond.
Third, the function violates INT30-C. Ensure that unsigned integer operations do not wrap when calculating the size of memory to allocate.
For a discussion of this programming error in the Common Weakness Enumeration database, see CWE-122, "Heap-based buffer overflow," and CWE-129, "Improper validation of array index."
#include <stdlib.h>

static int *table = NULL;
static size_t size = 0;

int insert_in_table(size_t pos, int value) {
  if (size < pos) {
    int *tmp;
    size = pos + 1;
    tmp = (int *)realloc(table, sizeof(*table) * size);
    if (tmp == NULL) {
      return -1;   /* Failure */
    }
    table = tmp;
  }

  table[pos] = value;
  return 0;
}
Compliant Solution
This compliant solution correctly validates the index pos by using the <= operator, ensures the multiplication will not overflow, and avoids modifying size until it has verified that the call to realloc() was successful:
#include <stdint.h>
#include <stdlib.h>

static int *table = NULL;
static size_t size = 0;

int insert_in_table(size_t pos, int value) {
  if (size <= pos) {
    int *tmp;
    /* Reject pos if pos + 1 or (pos + 1) * sizeof(*table) would wrap */
    if (pos >= SIZE_MAX / sizeof(*table)) {
      return -1;
    }
    tmp = (int *)realloc(table, sizeof(*table) * (pos + 1));
    if (tmp == NULL) {
      return -1;
    }
    /* Modify size only after realloc succeeds */
    size = pos + 1;
    table = tmp;
  }

  table[pos] = value;
  return 0;
}
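The size check can also be factored into a reusable helper. This is a sketch only; the name array_bytes and its interface are hypothetical. It computes count * elem_size and reports failure if the product would not fit in size_t.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: stores count * elem_size in *out.
 * Returns 0 on success, -1 if out is NULL, elem_size is 0,
 * or the product would wrap. */
int array_bytes(size_t count, size_t elem_size, size_t *out) {
  if (out == NULL || elem_size == 0 || count > SIZE_MAX / elem_size) {
    return -1;
  }
  *out = count * elem_size;
  return 0;
}
```

A caller such as insert_in_table() would still need to confirm that pos + 1 itself does not wrap before passing it as count.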
Noncompliant Code Example (Apparently Accessible Out-of-Range Index)
This noncompliant code example declares matrix to consist of 7 rows and 5 columns in row-major order. The function init_matrix then iterates over all 35 elements in an attempt to initialize each to the value given by the function argument x. However, because multidimensional arrays are declared in C in row-major order and the function iterates over the elements in column-major order, when the value of j reaches COLS during the first iteration of the outer loop, the function attempts to access element matrix[0][5]. Because the type of matrix is int[7][5], the j subscript is out of range, and the access has undefined behavior 49.
#include <stddef.h>

/* Enumeration constants are integer constant expressions,
   so they can size an array at file scope */
enum { COLS = 5, ROWS = 7 };

static int matrix[ROWS][COLS];

void init_matrix(int x) {
  for (size_t i = 0; i != COLS; ++i) {
    for (size_t j = 0; j != ROWS; ++j) {
      matrix[i][j] = x;
    }
  }
}
Compliant Solution
This compliant solution avoids using out-of-range indices by initializing matrix elements in the same row-major order as multidimensional objects are declared in C:
#include <stddef.h>

/* Enumeration constants are integer constant expressions,
   so they can size an array at file scope */
enum { COLS = 5, ROWS = 7 };

static int matrix[ROWS][COLS];

void init_matrix(int x) {
  for (size_t i = 0; i != ROWS; ++i) {
    for (size_t j = 0; j != COLS; ++j) {
      matrix[i][j] = x;
    }
  }
}
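Another way to keep each subscript tied to its own dimension is to derive the loop bounds from the array itself. The sketch below is illustrative only; the names grid, init_grid, and ARRAY_COUNT are hypothetical, and an enum supplies the dimensions so that they are integer constant expressions.

```c
#include <stddef.h>

enum { ROWS = 7, COLS = 5 };
static int grid[ROWS][COLS];

/* Number of elements in an array (not valid for pointers). */
#define ARRAY_COUNT(a) (sizeof(a) / sizeof((a)[0]))

void init_grid(int x) {
  for (size_t i = 0; i != ARRAY_COUNT(grid); ++i) {      /* rows */
    for (size_t j = 0; j != ARRAY_COUNT(grid[i]); ++j) { /* columns */
      grid[i][j] = x;
    }
  }
}
```

Because the bound for each subscript is computed from the corresponding array type, swapping the row and column constants by mistake is no longer possible in the loop headers.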
Noncompliant Code Example (Pointer Past Flexible Array Member)
In this noncompliant code example, the function find() attempts to iterate over the elements of the flexible array member buf, starting with the second element. However, because function g() does not allocate any storage for the member, the expression first++ in find() attempts to form a pointer just past the end of buf when there are no elements. This attempt results in undefined behavior 62.
#include <stdlib.h>

struct S {
  size_t len;
  char buf[];  /* Flexible array member */
};

const char *find(const struct S *s, int c) {
  const char *first = s->buf;
  const char *last  = s->buf + s->len;

  while (first++ != last) { /* Undefined behavior */
    if (*first == (unsigned char)c) {
      return first;
    }
  }
  return NULL;
}

void g(void) {
  struct S *s = (struct S *)malloc(sizeof(struct S));
  s->len = 0;
  find(s, 'a');
}
Compliant Solution
This compliant solution avoids incrementing the pointer unless a value past the pointer's current value is known to exist:
#include <stdlib.h>

struct S {
  size_t len;
  char buf[];  /* Flexible array member */
};

const char *find(const struct S *s, int c) {
  const char *first = s->buf;
  const char *last  = s->buf + s->len;

  while (first != last) { /* Avoid incrementing here */
    if (*first == (unsigned char)c) {
      return first;
    }
    ++first; /* first != last, so the result is at most last */
  }
  return NULL;
}

void g(void) {
  struct S *s = (struct S *)malloc(sizeof(struct S));
  if (s == NULL) {
    return; /* Allocation failed */
  }
  s->len = 0;
  find(s, 'a');
  free(s);
}
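When the flexible array member is meant to hold data, the allocation must reserve space for it. The following sketch shows one way a caller might do that; the function name make_s and its interface are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct S {
  size_t len;
  char buf[]; /* Flexible array member */
};

/* Hypothetical constructor: allocates a struct S with room for len bytes
 * in buf and copies len bytes from src. Returns NULL on failure.
 * Assumes src points to at least len readable bytes when len > 0. */
struct S *make_s(const char *src, size_t len) {
  if (len > SIZE_MAX - sizeof(struct S)) {
    return NULL; /* Size computation would wrap */
  }
  struct S *s = malloc(sizeof(struct S) + len);
  if (s == NULL) {
    return NULL;
  }
  s->len = len;
  if (len > 0) {
    memcpy(s->buf, src, len);
  }
  return s;
}
```

With storage reserved this way, s->buf + s->len is a valid one-past-the-end pointer for the len elements that were actually allocated.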
Noncompliant Code Example (Invalid Access by Library Function)
In this noncompliant code example, the function f() calls fread() to read nitems of type wchar_t, each size bytes in size, into an array of BUFSIZ elements, wbuf. However, the expression used to compute the value of nitems fails to account for the fact that, unlike the size of char, the size of wchar_t may be greater than 1. Thus, fread() could attempt to form pointers past the end of wbuf and use them to assign values to nonexistent elements of the array. Such an attempt results in undefined behavior 109. A likely manifestation of this undefined behavior is a classic buffer overflow, which is often exploitable by code injection attacks.
For a discussion of this programming error in the Common Weakness Enumeration database, see CWE-788, "Access of memory location after end of buffer," and CWE-805, "Buffer access with incorrect length value."
#include <stddef.h>
#include <stdio.h>

void f(FILE *file) {
  wchar_t wbuf[BUFSIZ];

  const size_t size = sizeof(*wbuf);
  const size_t nitems = sizeof(wbuf);

  size_t nread;

  nread = fread(wbuf, size, nitems, file);
}
Compliant Solution
This compliant solution correctly computes the maximum number of items for fread() to read from the file:
#include <stddef.h>
#include <stdio.h>

void f(FILE *file) {
  wchar_t wbuf[BUFSIZ];

  const size_t size = sizeof(*wbuf);
  const size_t nitems = sizeof(wbuf) / size;

  size_t nread;

  nread = fread(wbuf, size, nitems, file);
}
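The element count can also be expressed with a helper macro so that the division is written only once. This sketch is illustrative; the macro name NELEMS and the function read_wide are hypothetical, and the macro works only on true arrays, not on pointers.

```c
#include <stddef.h>
#include <stdio.h>

/* Number of elements in an array; must not be applied to a pointer. */
#define NELEMS(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical wrapper: reads up to BUFSIZ wide characters from file
 * and returns how many complete elements were read. */
size_t read_wide(FILE *file) {
  wchar_t wbuf[BUFSIZ];
  return fread(wbuf, sizeof(wbuf[0]), NELEMS(wbuf), file);
}
```

The product of the size and count arguments passed to fread() then equals sizeof(wbuf) exactly, so the library never addresses memory beyond the array.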
Noncompliant Code Example (Improper Scaling)
In this noncompliant code example, the integer skip is scaled by sizeof(struct big) when added to the pointer s, so the result may point outside the bounds of the object referenced by s:
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct big {
  unsigned long long ull_1;
  unsigned long long ull_2;
  unsigned long long ull_3;
  int si_4;
  int si_5;
};

int g(void) {
  size_t skip = offsetof(struct big, ull_2);
  struct big *s = (struct big *)malloc(4 * sizeof(struct big));
  if (s == NULL) {
    return -1;  /* Failure */
  }

  memset(s + skip, 0, sizeof(struct big) - skip);

  return 0;
}
Compliant Solution
This compliant solution does not scale skip because the addition is performed on a pointer to unsigned char:
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct big {
  unsigned long long ull_1;
  unsigned long long ull_2;
  unsigned long long ull_3;
  int si_4;
  int si_5;
};

int g(void) {
  size_t skip = offsetof(struct big, ull_2);
  struct big *s = (struct big *)malloc(4 * sizeof(struct big));
  if (s == NULL) {
    return -1;  /* Failure */
  }

  memset(((unsigned char *)s) + skip, 0, sizeof(struct big) - skip);

  return 0;
}
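The effect of scaling can be checked directly. The sketch below is illustrative, and the function names are hypothetical: it measures how many bytes separate s and s + n when s is a struct big pointer, in contrast to an unsigned char pointer.

```c
#include <stddef.h>

struct big {
  unsigned long long ull_1;
  unsigned long long ull_2;
  unsigned long long ull_3;
  int si_4;
  int si_5;
};

/* Byte distance covered by adding n to a struct big pointer.
 * Assumes s points into an array with at least n elements. */
size_t bytes_advanced_typed(const struct big *s, size_t n) {
  return (size_t)((const unsigned char *)(s + n) - (const unsigned char *)s);
}

/* Byte distance covered by adding n to an unsigned char pointer.
 * Assumes the object at s is at least n bytes long. */
size_t bytes_advanced_raw(const struct big *s, size_t n) {
  const unsigned char *p = (const unsigned char *)s;
  return (size_t)((p + n) - p);
}
```

The typed version advances n * sizeof(struct big) bytes, whereas the unsigned char version advances exactly n bytes, which is why a byte offset obtained from offsetof() must be added to a character pointer.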
Risk Assessment
Writing to out-of-range pointers or array subscripts can result in a buffer overflow and the execution of arbitrary code with the permissions of the vulnerable process. Reading from out-of-range pointers or array subscripts can result in unintended information disclosure.
Rule | Severity | Likelihood | Remediation Cost | Priority | Level |
---|---|---|---|---|---|
ARR30-C | High | Likely | High | P9 | L2 |
Automated Detection
Tool | Version | Checker | Description |
---|---|---|---|
Could be configured to catch violations of this rule. One way to catch the noncompliant code example is to first look for code that follows this pattern: for (LPWSTR pwszTemp = pwszPath + 2; *pwszTemp != L'\\'; In particular, the iteration variable is a pointer, it gets incremented, and the loop condition does not set an upper bound on the pointer. Once this case is handled, ROSE can handle cases like the real noncompliant code example, which has effectively the same semantics, just different syntax. | |||
2017.07 | ARRAY_VS_SINGLETON NEGATIVE_RETURNS OVERRUN_STATIC OVERRUN_DYNAMIC | Can detect the access of memory past the end of a memory buffer/array Can detect when the loop bound may become negative Can detect the out-of-bound read/write to array allocated statically or dynamically | |
2024.3 | ABV.ITERATOR SV.TAINTED.LOOP_BOUND | ||
LDRA tool suite | 9.7.1 | 47 S | Partially implemented |
PRQA QA-C | | 3680 3681 3682 3683 3685 (U) 3686 3688 3689 (U) 3690 3692 | Partially implemented |
Related Vulnerabilities
CVE-2008-1517 results from a violation of this rule. Before Mac OS X version 10.5.7, the XNU kernel accessed an array at an unverified user-input index, allowing an attacker to execute arbitrary code by passing an index greater than the length of the array and thereby accessing memory outside the array [xorl 2009].
Search for vulnerabilities resulting from the violation of this rule on the CERT website.
Related Guidelines
ISO/IEC TR 24772:2013 | Arithmetic Wrap-around Error [FIF]; Unchecked Array Indexing [XYZ] |
ISO/IEC TS 17961 | Forming or using out-of-bounds pointers or array subscripts [invptr] |
MITRE CWE | CWE-119, Failure to constrain operations within the bounds of a memory buffer; CWE-121, Stack-based buffer overflow; CWE-122, Heap-based buffer overflow; CWE-129, Unchecked array indexing; CWE-788, Access of memory location after end of buffer; CWE-805, Buffer access with incorrect length value |
Bibliography
[Finlay 2003] | |
[Microsoft 2003] | |
[Pethia 2003] | |
[Seacord 2013] | Chapter 1, "Running with Scissors" |
[Viega 2005] | Section 5.2.13, "Unchecked Array Indexing" |
[xorl 2009] | "CVE-2008-1517: Apple Mac OS X (XNU) Missing Array Index Validation" |