A secure system is invariably subject to stresses, such as those caused by attack, erroneous or malicious inputs, hardware or software faults, unanticipated user behavior, and unexpected environmental changes that are outside the bounds of "normal operation." Yet the system must continue to deliver essential services in a timely manner, safely and securely. To accomplish this, the system must exhibit qualities such as robustness, reliability, error tolerance, fault tolerance, performance, and security. All of these system-quality attributes depend on consistent and comprehensive error handling that supports the goals of the overall system.

ISO/IEC PDTR 24772, Section 6.47, "REU Termination strategy" \[[ISO/IEC PDTR 24772|AA. Bibliography#ISO/IEC PDTR 24772]\], says:

Expectations that a system will be dependable are based on the confidence that the system will operate as expected and not fail in normal use. The dependability of a system and its fault tolerance can be measured through the component part's reliability, [availability|BB. Definitions#availability], safety and security. Reliability is the ability of a system or component to perform its required functions under stated conditions for a specified period of time \[[IEEE Std 610.12 1990|AA. Bibliography#IEEE Std 610.12 1990]\]. Availability is how timely and reliable the system is to its intended users. Both of these factors matter highly in systems used for safety and security. In spite of the best intentions, systems will encounter a failure, either from internally poorly written software or external forces such as power outages/variations, floods, or other natural disasters. The reaction to a fault can affect the performance of a system and in particular, the safety and security of the system and its users.

Effective error handling (which includes error reporting, report aggregation, analysis, response, and recovery) is a central aspect of the design, implementation, maintenance, and operation of systems that exhibit survivability under stress. Survivability is the capability of a system to fulfill its mission, in a timely manner, despite an attack, accident, or other stress that is outside the bounds of normal operation \[[Lipson 2000|AA. Bibliography#Lipson 00]\]. If full services cannot be maintained under a given stress, survivable systems degrade gracefully, continue to deliver essential services, and recover full services as conditions permit.

Error reporting and error handling play a central role in the engineering and operation of survivable systems. Survivability is an emergent property of a system as a whole \[[Fisher 1999|AA. Bibliography#Fisher 99]\] and depends on the behavior of all of the system's components and the interactions among them. From the viewpoint of error handling, every system component, down to the smallest routine, can be considered to be a sensor capable of reporting on some aspect of the health of the system. Any error or anomaly that is ignored or improperly handled can threaten delivery of essential system services and, as a result, put at risk the organizational or business mission that the system supports.

...

An error-handling policy must specify a comprehensive approach to error reporting and response. Components and routines should always generate status indicators, all called routines should have their error returns checked, and all input should be checked for compliance with the formal requirements for such input rather than being blindly trusted. Moreover, never assume, based on specific knowledge about the system or its domain, that the success of a called routine is guaranteed. The failure to report or properly respond to errors or other anomalies from a system perspective can threaten the survivability of the system as a whole.
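
As an illustration of these three points (status indicators, checked error returns, and validated input), the following is a minimal sketch rather than a prescribed implementation. The names parse_status, parse_port, and save_port, and the choice of a TCP port as the validated input, are assumptions made for this example only; they do not come from the standard.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical status indicators for this sketch. */
typedef enum {
    PARSE_OK = 0,
    PARSE_NULL_ARG,
    PARSE_NOT_A_NUMBER,
    PARSE_TRAILING_CHARS,
    PARSE_OUT_OF_RANGE
} parse_status;

/*
 * Validate text against the formal requirement "decimal integer in
 * 1..65535" instead of trusting it. The result is written to *port_out
 * only when PARSE_OK is returned.
 */
parse_status parse_port(const char *text, unsigned short *port_out) {
    char *end = NULL;
    long value;

    if (text == NULL || port_out == NULL) {
        return PARSE_NULL_ARG;
    }
    errno = 0;                      /* strtol reports overflow via errno */
    value = strtol(text, &end, 10);
    if (end == text) {
        return PARSE_NOT_A_NUMBER;  /* no digits were converted */
    }
    if (*end != '\0') {
        return PARSE_TRAILING_CHARS;
    }
    if (errno == ERANGE || value < 1 || value > 65535) {
        return PARSE_OUT_OF_RANGE;
    }
    *port_out = (unsigned short)value;
    return PARSE_OK;
}

/*
 * Write the port to a file. Returns 0 on success and -1 on failure.
 * No library call is assumed to succeed: fopen, fprintf, and fclose
 * are each checked.
 */
int save_port(const char *path, unsigned short port) {
    FILE *fp;

    if (path == NULL) {
        return -1;
    }
    fp = fopen(path, "w");
    if (fp == NULL) {
        return -1;
    }
    if (fprintf(fp, "%hu\n", port) < 0) {
        (void)fclose(fp);           /* already failing; report the first error */
        return -1;
    }
    if (fclose(fp) == EOF) {
        return -1;                  /* buffered data may not have been written */
    }
    return 0;
}
```

A caller would test the result of parse_port before using the port value, and the result of save_port before assuming that the file exists. Checking fclose matters in this sketch because a buffered write can fail only when the stream is flushed.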

ISO/IEC PDTR 24772, Section 6.47, "REU Termination strategy" \[[ISO/IEC PDTR 24772|AA. Bibliography#ISO/IEC PDTR 24772]\], describes the following mitigation strategies:

Software developers can avoid the vulnerability or mitigate its ill effects in the following ways:

  • A strategy for fault handling should be decided. Consistency in fault handling should be the same with respect to critically similar parts.
  • A multitiered approach of fault prevention, fault detection, and fault reaction should be used.
  • System-defined components that assist in uniformity of fault handling should be used when available. For one example, designing a "runtime constraint handler" (as described in ISO/IEC TR 24731-1) permits the application to intercept various erroneous situations and perform one consistent response, such as flushing a previous transaction and restarting at the next one.
  • When there are multiple tasks, a fault-handling policy should be specified whereby a task may
    • halt, and keep its resources available for other tasks (perhaps permitting restarting of the faulting task)
    • halt, and remove its resources (perhaps to allow other tasks to use the resources so freed, or to allow a recreation of the task)
    • halt, and signal the rest of the program to likewise halt
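
The idea of routing every fault through one component, so that similar faults always receive the same reaction, can be sketched in portable C as follows. This is an illustrative sketch only. The runtime constraint handler interface of ISO/IEC TR 24731-1 is optional and not available in every implementation, so the example substitutes a hypothetical application-level registry; the names fault_kind, fault_reaction, set_fault_handler, report_fault, and process_batch are all assumptions made for this example.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical fault categories and reactions for this sketch. */
typedef enum { FAULT_BAD_INPUT, FAULT_IO, FAULT_INTERNAL } fault_kind;
typedef enum {
    REACT_SKIP_TRANSACTION,   /* drop the current transaction, continue */
    REACT_HALT_TASK,          /* stop this task only */
    REACT_HALT_PROGRAM        /* signal the whole program to stop */
} fault_reaction;

typedef fault_reaction (*fault_handler)(fault_kind kind, const char *msg);

/* Default policy: one place decides how each kind of fault is handled. */
static fault_reaction default_handler(fault_kind kind, const char *msg) {
    /* If logging to stderr itself fails, there is nothing further to do. */
    (void)fprintf(stderr, "fault %d: %s\n", (int)kind,
                  msg != NULL ? msg : "(no detail)");
    switch (kind) {
    case FAULT_BAD_INPUT: return REACT_SKIP_TRANSACTION;
    case FAULT_IO:        return REACT_HALT_TASK;
    default:              return REACT_HALT_PROGRAM;
    }
}

static fault_handler current_handler = default_handler;

/* Install a handler (NULL restores the default); returns the previous one. */
fault_handler set_fault_handler(fault_handler handler) {
    fault_handler previous = current_handler;
    current_handler = (handler != NULL) ? handler : default_handler;
    return previous;
}

/* Every component reports faults here instead of reacting on its own. */
fault_reaction report_fault(fault_kind kind, const char *msg) {
    return current_handler(kind, msg);
}

/*
 * Sum non-negative amounts. A negative amount is reported as a fault,
 * and the installed policy decides whether to skip it or stop.
 * Returns the number of transactions applied.
 */
size_t process_batch(const int *amounts, size_t count, long *total) {
    size_t applied = 0;

    if (amounts == NULL || total == NULL) {
        (void)report_fault(FAULT_INTERNAL, "null argument");
        return 0;
    }
    *total = 0;
    for (size_t i = 0; i < count; ++i) {
        if (amounts[i] < 0) {
            if (report_fault(FAULT_BAD_INPUT, "negative amount")
                    != REACT_SKIP_TRANSACTION) {
                break;
            }
            continue;   /* discard this transaction, resume at the next */
        }
        *total += amounts[i];
        ++applied;
    }
    return applied;
}
```

Because the reaction is chosen in one handler rather than at each call site, changing the policy (for example, halting on the first bad input) only requires installing a different handler with set_fault_handler.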

...

Search for vulnerabilities resulting from the violation of this rule on the CERT website.

Related Guidelines

CERT C++ Secure Coding Standard: ERR00-CPP. Adopt and implement a consistent and comprehensive error-handling policy

ISO/IEC 9899:1999 Sections 7.1.4, 7.9.10.4, and 7.11.6.2

ISO/IEC PDTR 24772 "REU Termination strategy" and "NZN Returning error status"

MISRA Rule 16.1

MITRE CWE: CWE-391, "Unchecked Error Condition"

MITRE CWE: CWE-544, "Missing Error Handling Mechanism"

Bibliography

\[[Fisher 1999|AA. Bibliography#Fisher 99]\]
\[[Horton 1990|AA. Bibliography#Horton 90]\] Section 11, p. 168, and Section 14, p. 254
\[[ISO/IEC 9899:1999|AA. Bibliography#ISO/IEC 9899-1999]\] Sections 7.1.4, 7.9.10.4, and 7.11.6.2
\[[ISO/IEC PDTR 24772|AA. Bibliography#ISO/IEC PDTR 24772]\] "REU Termination strategy" and "NZN Returning error status"
\[[Koenig 1989|AA. Bibliography#Koenig 89]\] Section 5.4, p. 73
\[[Lipson 2000|AA. Bibliography#Lipson 00]\]
\[[Lipson 2006|AA. Bibliography#Lipson 06]\]
\[[MISRA 04|AA. Bibliography#MISRA 04]\] Rule 16.1
\[[MITRE 07|AA. Bibliography#MITRE 07]\] [CWE ID 391|http://cwe.mitre.org/data/definitions/391.html], "Unchecked Error Condition," [CWE ID 544|http://cwe.mitre.org/data/definitions/544.html], "Missing Error Handling Mechanism"
\[[Summit 2005|AA. Bibliography#Summit 05]\] C-FAQ Question 20.4

...