Strings are a fundamental concept in software engineering, but they are not a built-in type in C. Null-terminated byte strings (NTBS) consist of a contiguous sequence of characters terminated by and including the first null character and are supported in C as the format used for string literals. The C programming language supports the following types of strings: single-byte character strings, multibyte character strings, and wide-character strings. Single-byte and multibyte character strings are both described as null-terminated byte strings, which are also called narrow character strings.

A pointer to a null-terminated byte string points to its initial character. The length of the string is the number of bytes preceding the null character, and the value of the string is the sequence of the values of the contained characters, in order.
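The definition of length can be illustrated with a minimal sketch. The helper names `ntbs_length` and `ntbs_storage` are hypothetical and exist only for illustration; in practice the standard `strlen()` function reports the same length.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts the bytes that precede the first null
 * character, which is the length of a null-terminated byte string. */
static size_t ntbs_length(const char *s) {
  assert(s != NULL);
  size_t n = 0;
  while (s[n] != '\0') {
    ++n;
  }
  return n;
}

/* Hypothetical helper: the number of bytes the string occupies. The
 * terminating null character is part of the string, so add one. */
static size_t ntbs_storage(const char *s) {
  return ntbs_length(s) + 1;
}
```

For the literal "abc", `ntbs_length` yields 3 while `ntbs_storage` yields 4, which matches `sizeof("abc")`. Confusing the two values is a common source of off-by-one errors when allocating storage for a copy.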

A wide string is a contiguous sequence of wide characters (of type wchar_t) terminated by and including the first null wide character. A pointer to a wide string points to its initial (lowest addressed) wide character. The length of a wide string is the number of wide characters preceding the null wide character, and the value of a wide string is the sequence of code values of the contained wide characters, in order.
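The same distinction applies to wide strings, except that the length is counted in wide characters rather than bytes. The following is a small sketch; `wide_storage_bytes` is a hypothetical helper name, while `wcslen()` is the standard function declared in `<wchar.h>`.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: bytes of storage needed for a wide string,
 * including the terminating null wide character. wcslen() counts wide
 * characters, not bytes, so the count must be scaled by sizeof(wchar_t). */
static size_t wide_storage_bytes(const wchar_t *ws) {
  assert(ws != NULL);
  return (wcslen(ws) + 1) * sizeof(wchar_t);
}
```

For example, `wcslen(L"abc")` is 3, and `wide_storage_bytes(L"abc")` equals `sizeof(L"abc")`, whose value depends on the size of `wchar_t` on the implementation.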

Null-terminated byte strings are implemented as arrays of characters and are susceptible to the same problems as arrays. As a result, rules and recommendations for arrays should also be applied to null-terminated byte strings.

The C Standard uses the following philosophy for choosing character types, though it is not explicitly stated in one place:

`signed char` and `unsigned char`

Suitable for small integer values

"Plain" `char`

The type of each element of a string literal.
Used for character data from a limited character set (where signedness has little meaning) as opposed to integer data.
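Because the signedness of plain `char` is left to the implementation, code that needs a numeric value should not rely on it. The sketch below uses hypothetical helper names (`plain_char_is_signed`, `char_as_byte`, `check_char_ranges`) and assumes only the macros from `<limits.h>`.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: reports whether plain char is signed on this
 * implementation. */
static bool plain_char_is_signed(void) {
  return CHAR_MIN < 0;
}

/* Hypothetical helper: the value of a character as a nonnegative integer,
 * independent of the signedness of plain char. */
static int char_as_byte(char c) {
  return (unsigned char)c;
}

/* Plain char has the same range as either signed char or unsigned char. */
static void check_char_ranges(void) {
  assert(SCHAR_MIN < 0 && UCHAR_MAX > SCHAR_MAX);
  assert(CHAR_MIN == 0 ? CHAR_MAX == UCHAR_MAX : CHAR_MAX == SCHAR_MAX);
}
```

On an implementation with 8-bit bytes, `char_as_byte('\200')` is 128 whether or not plain `char` is signed, whereas the value of `'\200'` itself may be either 128 or -128.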

`int`

Used for data that can be either EOF (a negative value) or character data interpreted as unsigned char and then converted to int. As a result, it is returned by fgetc(), getc(), getchar(), and ungetc(). It is also accepted by the character-handling functions from <ctype.h> because they might be passed the result of fgetc(), getc(), getchar(), and so on.
The type of a character constant; its value is that of a plain char converted to int.
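The reason the input functions return `int` can be seen in a typical read loop. This is a minimal sketch; `count_bytes` is a hypothetical helper, not part of any library.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: counts the bytes remaining in a stream. */
static long count_bytes(FILE *in) {
  assert(in != NULL);
  long n = 0;
  /* c is an int, not a char: it must be able to hold every unsigned char
   * value and, in addition, the distinct negative value EOF. */
  int c;
  while ((c = fgetc(in)) != EOF) {
    ++n;
  }
  return n;
}
```

If `c` were declared as `char`, a valid input byte such as 0xFF could compare equal to EOF where plain `char` is signed and end the loop early, and the comparison could never succeed where plain `char` is unsigned.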

Note that the two different ways a character is used as an int (as an unsigned char + EOF or as a plain char converted to int) can lead to confusion. For example, isspace('\200') results in undefined behavior when char is signed.
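One common way to avoid this undefined behavior is to convert the character to `unsigned char` before passing it to a `<ctype.h>` function. The following sketch assumes a hypothetical helper named `count_spaces`.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: counts white-space characters in a narrow string. */
static size_t count_spaces(const char *s) {
  assert(s != NULL);
  size_t n = 0;
  for (; *s != '\0'; ++s) {
    /* Convert to unsigned char first so that the argument is always
     * representable as an unsigned char, as <ctype.h> requires. */
    if (isspace((unsigned char)*s)) {
      ++n;
    }
  }
  return n;
}
```

With the conversion in place, a string containing bytes such as `'\200'` can be scanned safely regardless of whether plain `char` is signed.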

`unsigned char`

Used internally for string comparison functions, even though these functions operate on character data. Consequently, the result of a string comparison does not depend on whether plain char is signed.
Used when the object being manipulated might be of any type and it is necessary to access all bits of that object, as with fwrite().
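The comparison behavior can be sketched with a hand-written equivalent of `strcmp()`. The function `compare_bytes` is hypothetical and is shown only to make the `unsigned char` interpretation explicit; it is not how any particular library implements `strcmp()`.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: compares two narrow strings the way the string
 * comparison functions are specified to, by interpreting each character
 * as an unsigned char. Returns <0, 0, or >0. */
static int compare_bytes(const char *a, const char *b) {
  assert(a != NULL && b != NULL);
  const unsigned char *ua = (const unsigned char *)a;
  const unsigned char *ub = (const unsigned char *)b;
  while (*ua != '\0' && *ua == *ub) {
    ++ua;
    ++ub;
  }
  return (*ua > *ub) - (*ua < *ub);
}
```

Under this interpretation, the string "\200" compares greater than "\001" even where plain `char` is signed and `'\200'` has a negative value.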

Unlike other integer types, unsigned char has the unique property that

values stored in . . . objects of type unsigned char shall be represented using a pure binary notation (C Standard, subclause 6.2.6.1 [ISO/IEC 9899:2011])

where a pure binary notation is defined as the following:

A positional representation for integers that uses the binary digits 0 and 1, in which the values represented by successive bits are additive, begin with 1, and are multiplied by successive integral powers of 2, except perhaps the bit with the highest position. A byte contains CHAR_BIT bits, and the values of type unsigned char range from 0 to 2^CHAR_BIT − 1. (subclause 6.2.6, footnote 49)

That is, objects of type unsigned char may have no padding bits and consequently no trap representation. As a result, non-bit-field objects of any type may be copied into an array of unsigned char (for example, via memcpy()) and have their representation examined one byte at a time.
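A short sketch of this technique follows. The helper `copy_representation` is hypothetical; it simply wraps `memcpy()` with a size check.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the object representation of an object into
 * an array of unsigned char. Returns the number of bytes copied, or 0 if
 * the destination array is too small. */
static size_t copy_representation(const void *obj, size_t size,
                                  unsigned char *out, size_t out_size) {
  assert(obj != NULL && out != NULL);
  if (size > out_size) {
    return 0;
  }
  memcpy(out, obj, size);
  return size;
}
```

After the copy, each element of the destination array can be read without risk of a trap representation, and copying the bytes back into an object of the original type restores the original value.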

`wchar_t`

Wide characters are used for natural-language character data

Risk Assessment

Understanding how to represent characters and character strings can eliminate many common programming errors that lead to software vulnerabilities.

Recommendation	Severity	Likelihood	Remediation Cost	Priority	Level
STR00-C	Medium	Probable	Low	P12	L1

Automated Detection

Tool	Version	Checker	Description
Astrée	Astrée_V		Supported indirectly via MISRA C:2004 rule 6.1 and MISRA C:2012 rule 10.1.
CodeSonar	CodeSonar_V	MISC.NEGCHAR	Negative Character Value
LDRA tool suite	LDRA_V	329 S, 432 S	Fully implemented
Parasoft C/C++test	Parasoft_V	CERT_C-STR00-a	The plain char type shall be used only for the storage and use of character values
RuleChecker	RuleChecker_V		Supported indirectly via MISRA C:2004 rule 6.1 and MISRA C:2012 rule 10.1.
SonarQube C/C++ Plugin	SonarQube C/C++ Plugin_V	S810	

Related Vulnerabilities

Search for vulnerabilities resulting from the violation of this rule on the CERT website.

Related Guidelines

SEI CERT C++ Coding Standard	VOID STR00-CPP. Represent characters using an appropriate type

Bibliography

[ISO/IEC 9899:2011]	Subclause 6.2.6, "Representations of Types"
[Seacord 2013]	Chapter 2, "Strings"
