Input Validation and Data Sanitization
By definition, security vulnerabilities are flaws in software that can be exploited by an attacker. To succeed, the attacker must provide malicious inputs to a program or influence the program's environment to trigger the vulnerability. Consequently, preventing the introduction of malicious inputs into a program can eliminate the majority of vulnerabilities; this is the purpose of input validation and data sanitization.
Java language programs can input data from a wide variety of sources, including command line arguments, the console, files, network data, environment variables, and system properties. Both environment variables and system properties provide user-defined mappings between keys and their corresponding values, and can be used to communicate those values from the environment to a process. According to the Java API [API 2006] java.lang.System class documentation:
Environment variables have a more global effect because they are visible to all descendants of the process which defines them, not just the immediate Java subprocess. They can have subtly different semantics, such as case insensitivity, on different operating systems. For these reasons, environment variables are more likely to have unintended side effects. It is best to use system properties where possible. Environment variables should be used when a global effect is desired, or when an external system interface requires an environment variable (such as PATH).
When programs execute in a more trusted domain than their environment, the program must assume that the values of environment variables are untrusted and must sanitize and validate any environment values before use.
The default values of system properties are set by the JVM upon startup and can be considered trusted. However, they may be overridden by properties from untrusted sources, such as a configuration file. Properties from untrusted sources must be sanitized and validated before use.
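As a minimal sketch of the advice above, the following class validates a value read from a system property or an environment variable before use. The class name `ConfigInput`, the property name `app.port`, the variable name `APP_PORT`, the default of 8080, and the allowed range are all hypothetical choices made for illustration; a real program would substitute its own specification.

```java
import java.util.regex.Pattern;

final class ConfigInput {
    // Hypothetical specification: 1 to 5 ASCII digits, value in 1024..65535.
    private static final Pattern DIGITS = Pattern.compile("[0-9]{1,5}");

    private ConfigInput() {}

    /** Validates a raw port string against the specification above. */
    static int parsePort(String raw, int defaultPort) {
        if (raw == null) {
            return defaultPort; // value not supplied; fall back to the caller's default
        }
        if (!DIGITS.matcher(raw).matches()) {
            // The raw value is deliberately left out of the message.
            throw new IllegalArgumentException("port is not a decimal number");
        }
        int port = Integer.parseInt(raw); // cannot overflow: at most 5 digits
        if (port < 1024 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        return port;
    }

    /** Reads the hypothetical "app.port" system property and validates it. */
    static int portFromSystemProperty() {
        return parsePort(System.getProperty("app.port"), 8080);
    }

    /** Reads the hypothetical APP_PORT environment variable and validates it. */
    static int portFromEnvironment() {
        return parsePort(System.getenv("APP_PORT"), 8080);
    }
}
```

Both sources go through the same check, so a property overridden by an untrusted configuration file is treated no differently from an untrusted environment variable.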
The following figure (adapted from [Tutorials 2008]) shows this behavior.
Regardless of programming language, input validation requires several steps. The following steps are paraphrased from Secure Coding in C and C++ [Seacord 2005]:
- Identify all input sources. Input sources include the console, command line arguments, files, network data, environment variables, and system properties.
- Specify and validate data. Data from untrusted sources must be fully specified and the data validated against these specifications. The system implementation must be designed to handle any range or combination of valid data. Valid data, in this sense, is data that is anticipated by the design and implementation of the system and therefore will not result in the system entering an indeterminate state. For example, if a system accepts two integers as input and multiplies those two values, the system must either (a) validate the input to ensure that an overflow or other exceptional condition cannot occur as a result of the operation or (b) be prepared to handle the result of the operation. The specifications must address limits, minimum and maximum values, minimum and maximum lengths, valid content, initialization and reinitialization requirements, and encryption requirements for storage and transmission.
- Ensure that all input meets specification. Input should be validated as soon as possible. Incorrect input is not always malicious; often it is accidental. Reporting the error as soon as possible often helps correct the problem. When an exception occurs deep in the code, it is not always apparent that the cause was an invalid input, or which input was out of bounds. A data dictionary or similar mechanism can be used for specification of all program inputs. Input is usually stored in variables, and some input is eventually stored as persistent data. To validate input, specifications for what is valid input must be developed. A good practice is to define data and variable specifications, not just for all variables that hold user input, but also for all variables that hold data from a persistent store. The need to validate user input is obvious; the need to validate data being read from a persistent store is a defense against the possibility that the persistent store has been tampered with.
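The two-integer multiplication example from the steps above can be sketched in Java as follows. The class name `SafeMultiply` and the operand range of 0 to 10,000 are assumptions made for illustration; `Math.multiplyExact` is the standard library method (available since Java 8) that throws `ArithmeticException` on `int` overflow.

```java
final class SafeMultiply {
    // Hypothetical specification: each operand must lie in [MIN, MAX].
    static final int MIN = 0;
    static final int MAX = 10_000;

    private SafeMultiply() {}

    /** Option (a): validate the inputs so that overflow cannot occur. */
    static int multiplyValidated(int a, int b) {
        requireInRange("a", a);
        requireInRange("b", b);
        return a * b; // at most 100,000,000, which fits in an int
    }

    /** Option (b): accept any int values and handle an overflowing result. */
    static int multiplyOrFail(int a, int b) {
        try {
            return Math.multiplyExact(a, b);
        } catch (ArithmeticException e) {
            // Report the problem at the point of input rather than deep in the code.
            throw new IllegalArgumentException("product does not fit in an int", e);
        }
    }

    private static void requireInRange(String name, int value) {
        if (value < MIN || value > MAX) {
            throw new IllegalArgumentException(
                name + " must be between " + MIN + " and " + MAX + ", got " + value);
        }
    }
}
```

Option (a) rejects bad input early and names the offending operand, which matches the advice to report errors as soon as possible. Option (b) is the looser contract: any operands are accepted, but an exceptional result is detected instead of silently wrapping.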
The Myth of Trust
Software programs often contain multiple components that act as subsystems, where each component operates in one or more trusted domains. For example, one component may have access to the file system but lack access to the network, while another component has access to the network but lacks access to the file system. Distrustful decomposition and privilege separation [Dougherty 2009] are examples of secure design patterns that recommend reducing the amount of code that runs with special privileges by designing the system using mutually untrusting components.
...
Third-party code should operate in its own trusted domain; any code potentially exported to a third party, such as a library, should be deployable in well-defined trusted domains. The public API of the potentially exported code can be considered to be a trust boundary. Data flowing across a trust boundary should be validated when the publisher lacks guarantees of validation. A subscriber or client may omit validation when the data flowing into its trust boundary is appropriate for use as is. In all other cases, inbound data must be validated.
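One way to treat a public API as a trust boundary is sketched below. The class `ReportService`, its `submit` method, and the limits on item count, length, and character set are hypothetical; they stand in for whatever specification a real library would define for its inbound data.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ReportService {
    // Hypothetical limits for data crossing the boundary.
    private static final int MAX_ITEMS = 100;
    private static final int MAX_LENGTH = 64;

    private ReportService() {}

    /** Public entry point: data from callers is untrusted until validated. */
    static List<String> submit(List<String> items) {
        if (items == null) {
            throw new IllegalArgumentException("items must not be null");
        }
        // Copy first so the caller cannot modify the list after it is checked.
        List<String> copy = new ArrayList<>(items);
        if (copy.size() > MAX_ITEMS) {
            throw new IllegalArgumentException("too many items");
        }
        for (String item : copy) {
            if (item == null || item.isEmpty() || item.length() > MAX_LENGTH) {
                throw new IllegalArgumentException("item missing or too long");
            }
            for (int i = 0; i < item.length(); i++) {
                if (!isAllowed(item.charAt(i))) {
                    throw new IllegalArgumentException("item contains a disallowed character");
                }
            }
        }
        return process(Collections.unmodifiableList(copy));
    }

    /** Allow list: ASCII letters, ASCII digits, space, and hyphen. */
    private static boolean isAllowed(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
            || (c >= '0' && c <= '9') || c == ' ' || c == '-';
    }

    /** Internal code inside the boundary may rely on the validation above. */
    private static List<String> process(List<String> validated) {
        return validated;
    }
}
```

Validation happens once, at the public method; the private `process` method sits inside the boundary and can omit it, which mirrors the rule that a client may skip validation only when inbound data is already known to be appropriate.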
Injection Attacks
Data received by a component from a source outside the component's trust boundary may be malicious. Consequently, the program must take steps to ensure that the data are both genuine and appropriate.
...