Secure systems are invariably subject to stresses (such as those caused by attack, erroneous or malicious inputs, hardware failures or software faults, unanticipated user behavior, or unexpected environmental changes) that are outside the bounds of "normal operation," and yet the system must continue to deliver essential services in a timely manner, safely and securely. To accomplish this, a system must exhibit qualities such as robustness, reliability, error tolerance, fault tolerance, performance, and security. All of these quality attributes depend on a consistent and comprehensive error-handling policy that supports the goals of the overall system.

...

Recognizing the full nature of an adverse event and determining appropriate measures for recovery and response are often not possible in the context of the component or routine in which a related error first manifests itself. Aggregation of multiple error reports and the interpretation of those reports in a higher context may be required both to understand what is happening and to decide on the appropriate action to take. The domain-specific context in which the system operates also plays a major role in determining proper recovery strategies and tactics. For safety-critical systems, simply halting the system (or even just terminating an offending process) in response to an error is rarely the best course of action and may lead to disaster. From a system perspective, error-handling strategies should map directly into survivability strategies, which may include recovery by activating fully redundant backup services or by providing alternate sets of roughly equivalent services that fulfill the mission with sufficient diversity to greatly improve the odds of survival against common-mode failures.
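The idea of deciding on recovery at a higher level than the failing component can be illustrated with a small sketch. It assumes a hypothetical interface in which each service provider reports a status instead of halting, so that the caller can fall back to a redundant provider and count the failures it saw along the way. All names here (svc_status_t, service_fn, call_with_fallback, and the two stand-in providers) are invented for illustration and do not come from any standard or library.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical status type: every provider reports, none terminates. */
typedef enum { SVC_OK = 0, SVC_FAILED } svc_status_t;

/* Hypothetical provider signature: writes *reply only on SVC_OK. */
typedef svc_status_t (*service_fn)(int request, int *reply);

/* Stand-in for a primary service that is currently unavailable. */
svc_status_t primary_unavailable(int request, int *reply) {
    (void)request;
    (void)reply;
    return SVC_FAILED;
}

/* Stand-in for a roughly equivalent backup service (doubles the request). */
svc_status_t backup_double(int request, int *reply) {
    if (reply == NULL || request > INT_MAX / 2 || request < INT_MIN / 2) {
        return SVC_FAILED;  /* refuse rather than overflow */
    }
    *reply = request * 2;
    return SVC_OK;
}

/*
 * Try each provider in order and stop at the first success.
 * *failures receives the number of providers that failed, so the
 * caller can aggregate error reports and decide what to do next.
 */
svc_status_t call_with_fallback(const service_fn *providers, size_t count,
                                int request, int *reply, size_t *failures) {
    size_t i;

    if (providers == NULL || reply == NULL || failures == NULL) {
        return SVC_FAILED;
    }
    *failures = 0;
    for (i = 0; i < count; i++) {
        if (providers[i] != NULL && providers[i](request, reply) == SVC_OK) {
            return SVC_OK;
        }
        (*failures)++;
    }
    return SVC_FAILED;  /* every provider failed; the caller must escalate */
}
```

Called with the provider list { primary_unavailable, backup_double }, the sketch reports success from the backup along with a failure count of one. A real system would record which provider failed and why, rather than only a count, and would choose the escalation path according to its domain.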

An error handling policy must specify a comprehensive approach to error reporting and response. Components and routines should always generate status indicators, all called routines should have their error returns checked, and all input should be checked for compliance with the formal requirements for such input rather than blindly trusting input data. Moreover, never assume, based on specific knowledge about the system or its domain, that the success of a called routine is guaranteed. The failure to report or properly respond to errors or other anomalies from a system perspective can threaten the survivability of the system as a whole.
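As a minimal sketch of these points, the routine below validates untrusted input against an explicit requirement, checks the outcome of the library call it depends on, and always returns a status indicator. The status_t type and the parse_port function are hypothetical names chosen for this example. The range 1 to 65535 is likewise an assumed requirement. Only strtol and errno are standard C.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical status codes; the names are illustrative only. */
typedef enum {
    STATUS_OK = 0,
    STATUS_BAD_INPUT,
    STATUS_OUT_OF_RANGE
} status_t;

/*
 * Parse a decimal port number from untrusted text.
 * *port is written only when STATUS_OK is returned.
 */
status_t parse_port(const char *text, unsigned int *port) {
    char *end = NULL;
    long value;

    if (text == NULL || port == NULL) {
        return STATUS_BAD_INPUT;
    }

    errno = 0;  /* strtol reports overflow only through errno */
    value = strtol(text, &end, 10);

    if (end == text || *end != '\0') {
        return STATUS_BAD_INPUT;    /* no digits, or trailing characters */
    }
    if (errno == ERANGE || value < 1 || value > 65535) {
        return STATUS_OUT_OF_RANGE; /* overflowed long or outside the assumed range */
    }

    *port = (unsigned int)value;
    return STATUS_OK;
}
```

A caller is expected to test the returned status before using the output, for example by comparing the result of parse_port(arg, &port) against STATUS_OK and treating any other value as an error to report or handle.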

ISO/IEC PDTR 24772 Section 6.47, "REU Termination strategy" says:

...