Software vulnerability reports and reports of software exploitations continue to grow at an alarming rate, and a significant number of these reports result in technical security alerts. To address this growing threat to the government, corporations, educational institutions, and individuals, systems must be developed that are free of software vulnerabilities.
Coding errors cause the majority of software vulnerabilities. For example, 64 percent of the nearly 2,500 vulnerabilities in the National Vulnerability Database in 2004 were caused by programming errors \[[Heffley 2004|AA. Bibliography#Heffley 2004]\].
Java is a relatively secure language: there is no explicit pointer manipulation; array and string bounds are automatically checked; attempts at referencing a null pointer are trapped; the arithmetic operations are well defined and platform independent, as are the type conversions. The built-in bytecode verifier ensures that these checks are always in place.
Moreover, there are comprehensive, fine-grained security mechanisms available in Java that can control access to individual files, sockets, and other sensitive resources. To take advantage of the security mechanisms, the Java Virtual Machine (JVM) must have a security manager in place. This is an ordinary Java object of class {{java.lang.SecurityManager}} (or a subclass) that can be put in place programmatically but is more usually specified via a command line parameter.
There are, however, ways in which Java program safety can be compromised. The remainder of this chapter describes misuse cases under which Java programs might be exploited, and gives examples of guidelines that mitigate these attacks. Not all of the rules apply to all Java language programs; frequently, their applicability depends upon how the software is deployed and your assumptions concerning trust.
h1. Input Validation and Data Sanitization
Java language programs can input data from a wide variety of sources, including command line arguments, the console, files, network data, environment variables, and system properties. Both environment variables and system properties provide user-defined mappings between keys and their corresponding values, and can be used to communicate those values from the environment to a process. According to the Java API \[[API 2006|AA. Bibliography#API 06]\] {{java.lang.System}} class documentation:
{quote}
Environment variables have a more global effect because they are visible to all descendants of the process which defines them, not just the immediate Java subprocess. They can have subtly different semantics, such as case insensitivity, on different operating systems. For these reasons, environment variables are more likely to have unintended side effects. It is best to use system properties where possible. Environment variables should be used when a global effect is desired, or when an external system interface requires an environment variable (such as {{PATH}}).
{quote}
When programs execute in a more trusted domain than their environment, the program must assume that the values of environment variables are untrusted and must sanitize and validate any environment values before use.
The default values of system properties are set by the JVM upon startup and can be considered trusted. However, they may be overridden by properties from untrusted sources, such as a configuration file. Properties from untrusted sources must be sanitized and validated before use.
The following figure (adapted from \[[Tutorials 2008|AA. Bibliography#Tutorials 08]\]) shows this behavior.
!http://java.sun.com/docs/books/tutorial/figures/essential/environment-1loads.jpg|align=center!
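As a minimal sketch of the validation step described in this section, the following class reads a value from a system property, falls back to an environment variable, and accepts the value only if it matches a conservative whitelist. The names {{ConfigValue}}, {{app.logdir}}, and {{APP_LOGDIR}} are hypothetical, and the permitted character set is an assumption that a real deployment would tailor to its own requirements.

```java
import java.util.Optional;
import java.util.regex.Pattern;

final class ConfigValue {
    // Assumed policy: short names made of letters, digits, '_' and '-'.
    private static final Pattern SAFE = Pattern.compile("[A-Za-z0-9_-]{1,64}");

    /** Returns the value only if it passes the whitelist check. */
    static Optional<String> validate(String raw) {
        if (raw == null || !SAFE.matcher(raw).matches()) {
            return Optional.empty();
        }
        return Optional.of(raw);
    }

    /** Prefers the system property, then falls back to the environment variable. */
    static Optional<String> lookup(String propertyKey, String envKey) {
        String raw = System.getProperty(propertyKey);
        if (raw == null) {
            raw = System.getenv(envKey);
        }
        return validate(raw);
    }

    public static void main(String[] args) {
        // Hypothetical keys, used for illustration only.
        System.out.println(lookup("app.logdir", "APP_LOGDIR").orElse("(rejected or unset)"));
    }
}
```

Both sources are treated as untrusted here, which is the safer default when the program cannot tell whether a property was overridden.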
h1. The Myth of Trust
Software programs often contain multiple components that act as subsystems, where each component operates in one or more trusted domains. For example, one component may have access to the file system but lack access to the network, while another component has access to the network but lacks access to the file system. _Distrustful decomposition_ and _privilege separation_ \[[Dougherty 2009|AA. Bibliography#Dougherty 2009]\] are examples of secure design patterns that recommend reducing the amount of code that runs with special privileges by designing the system using mutually untrusting components.
When components with differing degrees of trust share data, the data are said to flow across a trust boundary. Because Java allows components under different trusted domains to communicate with each other, data can be transmitted across a trust boundary. Furthermore, a Java program can contain both internally developed and third-party code. Data that are transmitted to or accepted from third-party code also flow across a trust boundary.
While software components can obey policies that allow them to transmit data across trust boundaries, they cannot specify the level of trust given to any component. The deployer of the application must define the trust boundaries with the help of a system-wide security policy. A security auditor can use that definition to determine whether the software adequately supports the security objectives of the application.
Third-party code should operate in its own trusted domain; any code potentially exported to a third-party --- such as libraries --- should be deployable in well-defined trusted domains. The public API of the potentially-exported code can be considered to be a trust boundary. Data flowing across a trust boundary should be validated when the publisher lacks guarantees of validation. A subscriber or client may omit validation when the data flowing into its trust boundary is appropriate for use as is. In all other cases, inbound data must be validated.
h1. Injection Attacks
Data received by a component from a source outside the component's trust boundary may be malicious. Consequently, the program must take steps to ensure that the data are both genuine and appropriate.
!Injection.jpg!
These steps can include the following:
*Validation*: Validation is the process of ensuring that input data fall within the expected domain of valid program input. For example, not only must method arguments conform to the type and numeric range requirements of a method or subsystem, but also they must contain data that conform to the required input invariants for that method.
*Sanitization*: In many cases, the data may be passed directly to a component in a different trusted domain. Data sanitization is the process of ensuring that data conform to the requirements of the subsystem to which they are passed. Sanitization also involves ensuring that data conform to security-related requirements regarding leaking or exposure of sensitive data when output across a trust boundary. Sanitization may include the elimination of unwanted characters from the input by means of removing, replacing, encoding, or escaping the characters. Sanitization may occur following input (input sanitization) or before the data are passed across a trust boundary (output sanitization). Data sanitization and input validation may coexist and complement each other. Refer to the related guideline [IDS01-J. Sanitize data passed across a trust boundary|IDS00-J. Sanitize untrusted data passed across a trust boundary] for more details on data sanitization.
*Canonicalization* and *Normalization*: Canonicalization is the process of lossless reduction of the input to its equivalent simplest known form. Normalization is the process of lossy conversion of input data to the simplest known (and anticipated) form. Canonicalization and normalization must occur _before_ validation to prevent attackers from exploiting the validation routine to strip away illegal characters and thus constructing a forbidden (and potentially malicious) character sequence. Refer to the guideline [IDS02-J. Normalize strings before validating them|IDS01-J. Normalize strings before validating them] for more details. In addition, ensure that normalization is performed only on fully assembled user input. Never normalize partial input or combine normalized input with non-normalized input.
For example, POSIX file systems provide a syntax for expressing file names on the system using paths. A path is a string which indicates how to find any file by starting at a particular directory (usually the current working directory), and traversing down directories until the file is found. Canonical paths lack both symbolic links and special entries such as '.' or '..', which are handled specially on POSIX systems. Each file accessible from a directory has exactly one canonical path, along with many non-canonical paths.
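The normalize-before-validate ordering described above can be sketched for Unicode strings. In this illustration, the class name {{NormalizeThenValidate}} and the rule "reject angle brackets" are assumptions chosen for brevity; the point is only that the check runs on the normalized form, so compatibility characters that normalize to {{<}} or {{>}} cannot slip past it.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

final class NormalizeThenValidate {
    // Assumed validation rule for this sketch: no angle brackets allowed.
    private static final Pattern ANGLE_BRACKETS = Pattern.compile("[<>]");

    static boolean isAcceptable(String input) {
        // Normalize first (NFKC folds compatibility characters) ...
        String normalized = Normalizer.normalize(input, Normalizer.Form.NFKC);
        // ... then validate the normalized string, never the raw one.
        return !ANGLE_BRACKETS.matcher(normalized).find();
    }

    public static void main(String[] args) {
        // U+FE64 and U+FE65 are the small less-than and greater-than signs.
        System.out.println(isAcceptable("\uFE64script\uFE65"));
        System.out.println(isAcceptable("plain text"));
    }
}
```

Validating the raw string instead would accept the first input, because the forbidden characters appear only after normalization.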
In particular, complex subsystems are often components that accept string data that specifies commands or instructions to the component. String data passed to these components may contain special characters that can trigger commands or actions, resulting in a software [vulnerability|BB. Definitions#vulnerability].
Examples of components which can interpret commands or instructions:
* Operating system command interpreter (see guideline [IDS07-J. Do not pass untrusted, unsanitized data to the Runtime.exec() method])
* A data repository with an SQL-compliant interface
* XML parser
* XPath evaluators
* A SAX (Simple API for XML) or a DOM (Document Object Model) parser
* Lightweight Directory Access Protocol (LDAP) directory service
* Script engines
Many rules address proper filtering of untrusted input, especially when such input is passed to a component that can interpret commands or instructions.
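One way to filter input bound for a command interpreter is sketched below, under stated assumptions: the class {{SafeCommand}}, the {{ls -l}} command, and the permitted character set are all hypothetical. The untrusted value is checked against a whitelist and then kept as a separate argument rather than being concatenated into a command string.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class SafeCommand {
    // Assumed policy: directory names consist of letters, digits, and '_' only.
    private static final Pattern DIR_NAME = Pattern.compile("[A-Za-z0-9_]+");

    /** Builds an argument vector for listing a directory, or rejects the input. */
    static List<String> listingCommand(String dir) {
        if (dir == null || !DIR_NAME.matcher(dir).matches()) {
            throw new IllegalArgumentException("rejected directory name");
        }
        // Each element is one argument; nothing here is parsed by a shell.
        return Arrays.asList("ls", "-l", dir);
    }
}
```

The resulting list could be handed to {{new ProcessBuilder(list)}}; input such as {{tmp; rm -rf /}} is rejected before any process is created.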
When data must be sent to a component in a different trusted domain, the sender must ensure that the data is suitable for the receiver's trust boundary by properly encoding and escaping any data flowing across the trust boundary. For example, if a system is infiltrated by malicious code or data, many attacks are rendered ineffective if the system's output is appropriately escaped and encoded.
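As an illustration of output escaping, the following sketch encodes the five characters that are significant in HTML before a string is written into a page. The class name {{HtmlEscaper}} is hypothetical, and the sketch assumes the receiver is an HTML element context; other receivers (SQL, LDAP, shell) need their own encodings.

```java
final class HtmlEscaper {
    /** Replaces HTML-significant characters with character references. */
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }
}
```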
h1. Capabilities
A capability is a communicable, unforgeable token of authority. It refers to a value that references an object along with an associated set of access rights. A user program on a capability-based operating system must use a capability to access an object \[Wikipedia 2011\].
The term capability was introduced by Dennis and Van Horn \[[Dennis 1966|AA. Bibliography#Dennis 1966]\]. The basic idea is that for a program to access an object it must have a special token. This token designates an object and gives the program the authority to perform a specific set of actions (such as reading or writing) on that object. Such a token is known as a capability.
In an object-capability language, all program state is contained in objects that cannot be read or written without a reference, which serves as an unforgeable capability. All external resources are also represented as objects. Objects encapsulate their internal state, providing reference holders access only through prescribed interfaces \[[Mettler 2010A|AA. Bibliography#Mettler 2010A]\].
Because of Java's {{==}} operator, which tests pointer equality, every object has an unforgeable identity in addition to its contents. Identity tests mean that any object can be used as a token, serving as an unforgeable proof of authorization to perform some action \[[Mettler 2010B|AA. Bibliography#Mettler 2010B]\].
Authority is embodied by object references, which serve as capabilities. Authority refers to any effects that running code can have other than to perform side-effect-free computations. Authority includes not only effects on external resources such as files or network sockets, but also on mutable data structures that are shared with other parts of the program \[[Mettler 2010B|AA. Bibliography#Mettler 2010B]\].
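The idea of an object reference serving as a token can be sketched as follows. The {{Vault}} class is a hypothetical illustration, not part of any library: it releases its secret only to a caller that presents the very same object it was constructed with, as determined by reference comparison.

```java
final class Vault {
    private final Object key;      // the capability: only its identity matters
    private final String secret;

    Vault(Object key, String secret) {
        this.key = key;
        this.secret = secret;
    }

    String read(Object presented) {
        // Reference comparison: a different object never passes,
        // no matter how it was constructed.
        if (presented != key) {
            throw new SecurityException("caller does not hold the capability");
        }
        return secret;
    }
}
```

Code that was never handed the key reference has no way to fabricate one, which is what makes the token unforgeable within the language (reflection and similar mechanisms aside).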
Rules that involve capabilities include:
{contentbylabel:label=+capability,-void|maxResults=99|showLabels=false|showSpace=false|sort=title|space=@self}
h1. Leaking Sensitive Data
A system's security policy determines which information is _sensitive_. Sensitive data may include user information such as social security or credit card numbers, passwords, or private keys.
!filter_output.jpg!
Java software components provide many opportunities to output sensitive information. Rules that address the mitigation of sensitive information disclosure include:
{contentbylabel:label=+sensitive,-void|maxResults=99|showLabels=false|showSpace=false|sort=title|space=@self}
h1. Resource Exhaustion
Denial of service can occur when resource usage is disproportionately large in comparison to the input data that causes the resource usage.
This guideline is of greater concern for persistent, server-type systems than for desktop applications. Checking inputs for excessive resource consumption may be unjustified for client software that expects the user to handle resource-related problems. Even client software, however, should check for inputs that could cause persistent denial of service, such as filling up the file system.
The _Secure Coding Guidelines for the Java Programming Language_ \[[SCG 2009|AA. Bibliography#SCG 09]\] lists some examples of possible attacks:
* Requesting a large image size for vector graphics, for instance, SVG and font files.
* "Zip bombs" whereby a short file is very highly compressed, for instance, ZIPs, GIFs and gzip encoded HTTP content.
* "Billion laughs attack" whereby XML entity expansion causes an XML document to grow dramatically during parsing. Set the XMLConstants.FEATURE_SECURE_PROCESSING feature to enforce reasonable limits.
* Using excessive disc space.
* Inserting many keys with the same hash code into a hash table, consequently triggering worst-case performance (O(n ^2^)) rather than typical-case performance (O\(n)).
* Initiating many connections where the server allocates significant resources for each, for instance, the traditional "SYN flood" attack.
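A defense against the highly compressed input mentioned in the list above is to cap the number of bytes produced during decompression. The sketch below assumes gzip data held in memory; the class {{BoundedInflate}} and the choice of limit are hypothetical, and the {{gzip}} helper exists only to make the example self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class BoundedInflate {
    /** Decompresses gzip data, failing once the output exceeds maxBytes. */
    static byte[] gunzip(byte[] compressed, int maxBytes) throws IOException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int total = 0;
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
                if (total > maxBytes) {
                    throw new IOException("decompressed size exceeds limit of " + maxBytes);
                }
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        }
    }

    /** Helper for the example: compresses raw bytes with gzip. */
    static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(raw);
        }
        return bos.toByteArray();
    }
}
```

The limit is enforced on the decompressed size because the compressed size says little about how much memory or disk the expanded data will need.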
Rules for preventing denial of service attacks resulting from resource exhaustion include:
{contentbylabel:label=+resource-exhaustion,-void|maxResults=99|showLabels=false|showSpace=false|sort=title|space=@self}
h1. Type Safety
Java is believed to be a type-safe language \[LSOD 02, Sec. 5.1\]. For that reason, it should not be possible to compromise a Java program by misusing the type system. To see why type safety is so important, consider the following types:
{code}
public class TowerOfLondon {
  private Treasure theCrownJewels;
  // ...
}

public class GarageSale {
  public Treasure myCostumeJewelry;
  // ...
}
{code}
If these two types could be confused, it would be possible to access the private field {{theCrownJewels}} as if it were the public field {{myCostumeJewelry}}. More generally, a _type confusion attack_ could allow Java security to be compromised by making the internals of the security manager open to abuse. A team of researchers at Princeton University showed that any type confusion in Java could be used to completely overcome Java's security mechanisms (see Securing Java Ch. 5, Sec. 7 \[McGraw 99\]).
Java's type safety means that fields that are declared private or protected or that have default (package) protection should not be globally accessible. However, there are a number of vulnerabilities "built in" to Java that enable this protection to be overcome. These should come as no surprise to the Java expert, as they are well documented, but they may trap the unwary.
h1. Public Fields
A field that is declared public may be directly accessed by any part of a Java program and may be modified from anywhere in a Java program (unless the field is declared final). Clearly, sensitive information must not be stored in a public field, as it could be compromised by anyone who could access the JVM running the program.
h1. Inner Classes
Inner classes have access to all the fields of their surrounding class. There is no bytecode support for inner classes, so they are compiled into ordinary classes with names like {{OuterClass$InnerClass}}. So that the inner class can access the private fields of the outer class, the private access is changed to package access in the bytecode. For that reason, handcrafted bytecode can access these private fields (see "Security Aspects in Java Bytecode Engineering" \[Schönefeld 02\] for an example).
h1. Serialization
Serialization enables the state of a Java program to be captured and written out to a byte stream. This allows for the state to be preserved so that it can be reinstated (by deserialization). Serialization also allows for Java method calls to be transmitted over a network for Remote Method Invocation (RMI). An object (called someObject below) can be serialized as follows:
{code}
ObjectOutputStream oos = new ObjectOutputStream(
    new FileOutputStream("SerialOutput"));
oos.writeObject(someObject);
oos.flush();
{code}
The object can be deserialized as follows:
{code}
ObjectInputStream ois = new ObjectInputStream(
    new FileInputStream("SerialOutput"));
someObject = (SomeClass) ois.readObject();
{code}
Serialization captures all the fields of a class, provided the class implements the {{Serializable}} interface, including the non-public fields that are not normally accessible (unless the field is declared transient). If the byte stream to which the serialized values are written is readable, then the values of the normally inaccessible fields may be read. Moreover, it may be possible to modify or forge the preserved values so that when the class is deserialized, the values become corrupted.
Introducing a security manager does not prevent the normally inaccessible fields from being serialized and deserialized (although permission must be granted to write to and read from the file or network if the byte stream is being stored or transmitted). Network traffic (including RMI) can be protected, however, by using SSL.
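The effect of the {{transient}} modifier mentioned above can be seen in a small round trip. In this sketch, the {{Account}} class and its fields are hypothetical, and an in-memory byte array stands in for the file or network stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class Account implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String user;            // private, yet written to the stream
    private transient String password;    // excluded from the serialized form

    Account(String user, String password) {
        this.user = user;
        this.password = password;
    }

    String user() { return user; }
    String password() { return password; }

    static byte[] toBytes(Account account) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(account);
        }
        return bos.toByteArray();
    }

    static Account fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Account) ois.readObject();
        }
    }
}
```

After a round trip, the private {{user}} field is restored from the stream, whereas {{password}} comes back as {{null}} because it was never written.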
h1. Reflection
Reflection enables a Java program to analyze and modify itself. In particular, a program can find out the values of field variables and change them [Forman 05, Sun 02]. The Java reflection API includes a method call that enables fields that are not normally accessible to be accessed under reflection. The following code prints out the names and values of all fields of an object {{someObject}} of class SomeClass:
{code}
// Requires java.lang.reflect.Field and java.lang.reflect.Modifier
Field[] fields = SomeClass.class.getDeclaredFields();
for (Field fieldsI : fields) {
  if (!Modifier.isPublic(fieldsI.getModifiers())) {
    // Suppress Java language access checks for non-public fields
    fieldsI.setAccessible(true);
  }
  System.out.print("Field: " + fieldsI.getName());
  System.out.println(", value: " + fieldsI.get(someObject));
}
{code}
A field could be set to a new value as follows:
{code}
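// returnValue is assumed to be a helper (not shown) that converts
// the string to an object of the field's type
String newValue = reader.readLine();
fieldsI.set(someObject,
    returnValue(newValue, fieldsI.getType()));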
{code}
Introducing the default security manager does prevent the fields that would not normally be accessible from being accessed under reflection. The default security manager throws {{java.security.AccessControlException}} in these circumstances. However, it is possible to grant a permission to override this default behavior: {{java.lang.reflect.ReflectPermission}} can be granted with action {{suppressAccessChecks}}.
h1. The JVM Tool Interface
Java 5 introduced the JVM Tool Interface (JVMTI), replacing both the JVM Profiler Interface (JVMPI) and the JVM Debug Interface (JVMDI), which are now deprecated.
The JVMTI contains extensive facilities to find out about the internals of a running JVM, including facilities to monitor and modify a running Java program. These facilities are rather low level and require the use of the Java Native Interface (JNI) and C language programming. However, they provide the opportunity to access fields that would not normally be accessible. Also, there are facilities that can change the behavior of a running Java program (for example, threads can be suspended or stopped).
The JVMTI works by using agents that communicate with the running JVM. These agents must be loaded at JVM startup and are usually specified via one of the command line options {{-agentlib:}} or {{-agentpath:}}. However, agents can be specified in environment variables, although this feature can be disabled where security is a concern. The JVMTI is always enabled, and JVMTI agents may run under the default security manager without requiring any permissions to be granted. More work needs to be done to determine under exactly what circumstances the JVMTI can be misused.
h1. Debugging
The Java Platform Debugger Architecture (JPDA) builds on the JVMTI and provides high-level facilities for debugging running Java systems. These include facilities similar to the reflection facilities described above for inspecting and modifying field values. In particular, there are methods to get and set field and array values. Access control is not enforced so, for example, even the values of private fields can be set.
Introducing the default security manager means that various permissions must be granted in order for debugging to take place. The following policy file was used to run the JPDA Trace demonstration under the default security manager:
{code}
grant {
permission java.io.FilePermission "traceoutput.txt", "read,write";
permission java.io.FilePermission "C:/Program Files/Java/jdk1.5.0_04/lib/tools.jar", "read";
permission java.io.FilePermission "C:/Program", "read,execute";
permission java.lang.RuntimePermission "modifyThread";
permission java.lang.RuntimePermission "modifyThreadGroup";
permission java.lang.RuntimePermission "accessClassInPackage.sun.misc";
permission java.lang.RuntimePermission "loadLibrary.dt_shmem";
permission java.util.PropertyPermission "java.home", "read";
permission java.net.SocketPermission "<localhost>", "resolve";
permission com.sun.jdi.JDIPermission "virtualMachineManager";
};
{code}
h1. Monitoring and Management
Java contains extensive facilities for monitoring and managing a JVM. In particular, the Java Management Extensions (JMX) API enables the monitoring and control of class loading, thread state and stack traces, deadlock detection, memory usage, garbage collection, operating system information, and other operations. There are also facilities for logging monitoring and management. A running JVM may be monitored and managed remotely.
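To give a sense of how much a management client can observe, the following sketch queries a few of the platform MXBeans from inside the same JVM. The class {{JvmSnapshot}} and its method names are hypothetical; the {{java.lang.management}} calls are part of the standard API. A remote client with access to the JMX connector could obtain the same information.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.ThreadMXBean;

final class JvmSnapshot {
    /** Number of live threads, including daemon threads. */
    static int liveThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        return threads.getThreadCount();
    }

    /** Bytes of heap currently in use. */
    static long heapUsedBytes() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        return memory.getHeapMemoryUsage().getUsed();
    }

    /** Number of threads deadlocked on object monitors (0 if none). */
    static int monitorDeadlockedThreads() {
        long[] ids = ManagementFactory.getThreadMXBean().findMonitorDeadlockedThreads();
        return ids == null ? 0 : ids.length;
    }

    public static void main(String[] args) {
        System.out.println("threads=" + liveThreads()
            + " heapUsed=" + heapUsedBytes()
            + " deadlocked=" + monitorDeadlockedThreads());
    }
}
```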
For a JVM to be monitored and managed remotely, it must be started with various system properties set (either on the command line or in a configuration file). Also, there are provisions for the monitoring and management to be done securely (by passing the information using SSL, for example) and to require proper authentication of the remote server. However, users may start a JVM with remote monitoring and management enabled with no security for their own purposes, and this would leave the JVM open to compromise from outsiders. Although a user could not easily turn on remote monitoring and management by accident, they might not realize that starting a JVM so enabled, without any security also switched on, could leave their JVM exposed to outside abuse.
h1. Concurrency, Visibility, and Memory
Memory that can be shared between threads is called _shared memory_ or _heap memory_. The term _variable_ as used in this section refers to both fields and array elements \[[JLS 05|AA. Bibliography#JLS 05]\]. Variables that are shared between threads are referred to as shared variables. All instance fields, {{static}} fields, and array elements are shared variables and are stored in heap memory. Local variables, formal method parameters, and exception handler parameters are never shared between threads and are unaffected by the [memory model|BB. Definitions#memory model].
In modern shared-memory multiprocessor architectures, each processor has one or more levels of cache that are periodically reconciled with main memory as shown in the following figure:
!cache.jpg!
The visibility of writes to shared variables can be problematic because the value of a shared variable may be cached; writing its value to main memory may be delayed. Consequently, another thread may read a stale value of the variable.
A further concern is not only that concurrent executions of code are typically interleaved, but also that statements may be reordered by the compiler or runtime system to optimize performance. This results in execution orders that are difficult to discern by examination of the source code. Failure to account for possible reorderings is a common source of [data races|BB. Definitions#data races].
Consider the following example in which {{a}} and {{b}} are (shared) global variables or instance fields, but {{r1}} and {{r2}} are local variables that are inaccessible to other threads.
Initially, let {{a = 0}} and {{b = 0}}.
|| {{Thread 1}} || {{Thread 2}} ||
| {{a = 10;}} | {{b = 20;}} |
| {{r1 = b;}} | {{r2 = a;}} |
In {{Thread 1}}, the two assignments {{a = 10;}} and {{r1 = b;}} are unrelated, so the compiler or runtime system is free to reorder them. The two assignments in {{Thread 2}} may also be freely reordered. Although it may seem counter-intuitive, the Java memory model allows a read to see the value of a write that occurs later in the apparent execution order.
A possible execution order showing actual assignments is:
|| Execution Order (Time) || Thread# || Assignment || Assigned Value || Notes ||
| 1. | _t{_}{~}1~ | {{a = 10;}} | 10 | |
| 2. | _t{_}{~}2~ | {{b = 20;}} | 20 | |
| 3. | _t{_}{~}1~ | {{r1 = b;}} | 0 | Reads initial value of {{b}}, that is 0 |
| 4. | _t{_}{~}2~ | {{r2 = a;}} | 0 | Reads initial value of {{a}}, that is 0 |
In this ordering, {{r1}} and {{r2}} read the original values of the variables {{b}} and {{a}} respectively, even though they are expected to see the updated values, 20 and 10. Another possible execution order showing actual assignments is:
|| Execution Order (Time) || Thread# || Statement || Assigned Value || Notes ||
| 1. | _t{_}{~}1~ | {{r1 = b;}} | 20 | Reads later value (in step 4.) of write, that is 20 |
| 2. | _t{_}{~}2~ | {{r2 = a;}} | 10 | Reads later value (in step 3.) of write, that is 10 |
| 3. | _t{_}{~}1~ | {{a = 10;}} | 10 | |
| 4. | _t{_}{~}2~ | {{b = 20;}} | 20 | |
In this ordering, {{r1}} and {{r2}} read the values of {{b}} and {{a}} written in steps 4 and 3, respectively, even before the statements corresponding to these steps have executed.
Restricting the set of possible reorderings makes it easier to reason about the correctness of the code.
Even when statements execute in the order of their appearance in a thread, caching can prevent the latest values from being reflected in the main memory.
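The two-thread example from this section can be written as a small program. This is a sketch only: the class {{ReorderingDemo}} is hypothetical, the array {{r}} stands in for the local variables {{r1}} and {{r2}} so that the results can be returned, and which outcome a given run produces is not predictable.

```java
final class ReorderingDemo {
    // Shared and deliberately neither volatile nor guarded by a lock.
    static int a = 0;
    static int b = 0;

    /** Runs both threads once and returns {r1, r2}. */
    static int[] runOnce() throws InterruptedException {
        a = 0;
        b = 0;
        final int[] r = new int[2];
        Thread t1 = new Thread(() -> { a = 10; r[0] = b; });
        Thread t2 = new Thread(() -> { b = 20; r[1] = a; });
        t1.start();
        t2.start();
        // join() makes each thread's writes to r visible to the caller.
        t1.join();
        t2.join();
        return r;
    }

    public static void main(String[] args) throws InterruptedException {
        int[] r = runOnce();
        System.out.println("r1=" + r[0] + ", r2=" + r[1]);
    }
}
```

Because the accesses to {{a}} and {{b}} lack a happens-before relationship, {{r1}} may be 0 or 20 and {{r2}} may be 0 or 10, in any combination, including both being 0.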
The Java Language Specification defines the Java Memory Model (JMM), which provides certain guarantees to the Java programmer. The JMM is specified in terms of actions, including variable reads and writes, monitor locks and unlocks, and thread starts and joins. The JMM defines a partial ordering called [happens-before|BB. Definitions#happens-before order] on all actions within the program. To guarantee that a thread executing action B can see the results of action A, for example, there must be a happens-before relationship defined such that A happens-before B.
According to section 17.4.5 "Happens-before Order" of the Java Language Specification \[[JLS 05|AA. Bibliography#JLS 05]\]:
{quote}
# An unlock on a monitor happens-before every subsequent lock on that monitor.
# A write to a volatile field happens-before every subsequent read of that field.
# A call to {{start()}} on a thread happens-before any actions in the started thread.
# All actions in a thread happen-before any other thread successfully returns from a {{join()}} on that thread.
# The default initialization of any object happens-before any other actions (other than default-writes) of a program.
# A thread calling interrupt on another thread happens-before the interrupted thread detects the interrupt
# The end of a constructor for an object happens-before the start of the finalizer for that object
{quote}
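The rules for {{start()}} and {{join()}} quoted above can be illustrated with plain fields and no other synchronization. In this sketch, the class {{StartJoinVisibility}} and its fields are hypothetical.

```java
final class StartJoinVisibility {
    // Plain fields: no volatile, no locks.
    private static int input;
    private static int output;

    static int compute(int value) throws InterruptedException {
        input = value;                    // written before start()
        Thread worker = new Thread(() -> output = input * 2);
        worker.start();                   // start() happens-before the worker's actions
        worker.join();                    // the worker's actions happen-before join() returns
        return output;                    // guaranteed to see the worker's write
    }
}
```

The write to {{input}} is visible to the worker because it precedes {{start()}}, and the write to {{output}} is visible to the caller because the read follows a successful {{join()}}.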
When two operations lack a happens-before relationship, the JVM is free to reorder them. A [data race|BB. Definitions#data race] occurs when a variable is written to by at least one thread and read by at least one other thread, and the reads and writes lack a happens-before relationship. A correctly synchronized program is one that lacks data races. The Java Memory Model (JMM) guarantees _sequential consistency_ for correctly synchronized programs. Sequential consistency means that the result of any execution is the same as if the reads and writes on shared data by all threads were executed in some sequential order, and the operations of each individual thread appear in this sequence in the order specified by its program \[[Tanenbaum 03|AA. Bibliography#Tanenbaum 03]\]. In other words:
# Take the read and write operations performed by each thread and put them in the order in which the thread executes them (thread order).
# Interleave the operations in some way allowed by the happens-before relationships to form an execution order.
# For the execution to be sequentially consistent, each read operation must return the most recently written data in the total [program order|BB. Definitions#program order].
# As a consequence, all threads see the same total ordering of reads and writes of shared variables.
The actual execution order of instructions and memory accesses can vary as long as the actions of the thread appear to that thread _as if_ [program order|BB. Definitions#program order] were followed, and provided all values read are allowed for by the memory model. This allows the programmer to understand the semantics of the programs they write, and allows compiler writers and virtual machine implementors to perform various optimizations \[[JPL 06|AA. Bibliography#JPL 06]\].
There are several concurrency primitives that can help a programmer reason about the semantics of multithreaded programs.
h3. The {{volatile}} Keyword
Declaring shared variables as volatile ensures visibility and limits reordering of accesses. Volatile accesses lack a guarantee of the atomicity of composite operations such as incrementing a variable. Consequently, use of {{volatile}} is insufficient for cases where the atomicity of composite operations must be guaranteed (see [CON02-J. Ensure that compound operations on shared variables are atomic|VNA02-J. Ensure that compound operations on shared variables are atomic] for more information).
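One option for making a compound increment atomic is {{java.util.concurrent.atomic.AtomicInteger}}, a technique not named in the paragraph above and shown here only as a sketch; the {{HitCounter}} class is hypothetical. Declaring the counter as a {{volatile int}} and using {{++}} could lose updates under the same load.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class HitCounter {
    private final AtomicInteger hits = new AtomicInteger();

    void record() {
        hits.incrementAndGet();   // atomic read-modify-write
    }

    int total() {
        return hits.get();
    }

    /** Increments one counter from several threads and returns the final total. */
    static int hammer(int threads, int perThread) throws InterruptedException {
        HitCounter counter = new HitCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.record();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        return counter.total();
    }
}
```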
Declaring variables as volatile establishes a happens-before relationship such that a write to a volatile variable is always seen by threads performing subsequent reads of the same variable. Statements that occur before the write to the volatile field also happen-before any reads of the volatile field.
Consider two threads that are executing some statements:
!happens-before.jpg!
Thread 1 and Thread 2 have a happens-before relationship that is established by the semantics of volatile accesses: the actions of Thread 1 up to and including its volatile write happen-before the actions of Thread 2 that follow its read of that write.
In this example, Statement 3 writes to a volatile variable, and statement 4 (in Thread 2) reads the same volatile variable. The read sees the most recent write (to the same variable {{v}}) from statement 3.
Volatile read and write operations cannot be reordered either with respect to each other or with respect to non-volatile variable accesses. When Thread 2 reads the volatile variable, it sees the results of all the writes occurring before the write to the volatile variable in Thread 1. Because of these relatively strong guarantees, volatile accesses carry a performance cost that can, in some cases, approach that of synchronization.
The previous example lacks a guarantee that statements 1 and 2 will be executed in the order in which they appear in the program. They may be freely reordered by the compiler because of the absence of a happens-before relationship between these two statements.
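The visibility guarantee described for the volatile write and read can be sketched in code. The {{Publication}} class is hypothetical: {{data}} plays the role of an ordinary write made before the volatile write, and {{ready}} plays the role of the volatile variable.

```java
final class Publication {
    private int data;                 // non-volatile payload
    private volatile boolean ready;   // volatile flag

    void publish(int value) {
        data = value;                 // ordinary write ...
        ready = true;                 // ... made visible by the volatile write
    }

    int awaitData() {
        while (!ready) {              // volatile read
            Thread.yield();
        }
        return data;                  // sees the value written before ready = true
    }

    /** Publishes 42 from the calling thread and returns what a reader thread saw. */
    static int demo() throws InterruptedException {
        Publication p = new Publication();
        int[] seen = new int[1];
        Thread reader = new Thread(() -> seen[0] = p.awaitData());
        reader.start();
        p.publish(42);
        reader.join();
        return seen[0];
    }
}
```

If {{ready}} were not volatile, the reader could spin indefinitely or observe a stale value of {{data}}.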
The possible reorderings between volatile and non-volatile variables are summarized in the matrix shown below. Load and store operations are synonymous with read and write operations, respectively. \[[Lea 08|AA. Bibliography#Lea 08]\]
!can_reorder.jpg!
h3. Synchronization
A correctly synchronized program is one whose sequentially consistent executions lack data races. The example shown below uses a non-volatile variable {{x}} and a volatile variable {{y}}. It is incorrectly synchronized.
|| Thread 1 || Thread 2 ||
| x = 1 | r1 = y |
| y = 2 | r2 = x |
Two of the sequentially consistent execution orders of this example are:
|| Step (Time) || Thread# || Statement || Comment ||
| 1. | _t{_}{~}1~ | x = 1 | Write to non-volatile variable |
| 2. | _t{_}{~}1~ | y = 2 | Write to volatile variable |
| 3. | _t{_}{~}2~ | r1 = y | Read of volatile variable |
| 4. | _t{_}{~}2~ | r2 = x | Read of non-volatile variable |
and,
|| Step (Time) || Thread# || Statement || Comment ||
| 1. | _t{_}{~}2~ | r1 = y | Read of volatile variable |
| 2. | _t{_}{~}2~ | r2 = x | Read of non-volatile variable |
| 3. | _t{_}{~}1~ | x = 1 | Write to non-volatile variable |
| 4. | _t{_}{~}1~ | y = 2 | Write to volatile variable |
In the first case, there is a happens-before relationship between actions such that steps 1 and 2 always occur before steps 3 and 4. However, the second case lacks a happens-before relationship between the steps of Thread 1 and those of Thread 2. Consequently, because there is a sequentially consistent execution in which the conflicting accesses to {{x}} are not ordered by a happens-before relationship, this example contains a data race.
Correct visibility guarantees that multiple threads accessing shared data can view each other's results, but fails to establish the order in which each thread reads or writes the data. Correct synchronization both provides correct visibility and also guarantees that threads access data in a proper order. For example, the code shown below ensures that there is only one sequentially consistent execution order that performs all the actions of thread 1 before thread 2.
{code}
class Assign {
  private int x, y;   // Shared variables
  private int r1, r2; // Results of the reads

  public synchronized void doSomething() {
    // Perform Thread 1 actions
    x = 1;
    y = 2;
    // Perform Thread 2 actions
    r1 = y;
    r2 = x;
  }
}
{code}
When using synchronization, it is unnecessary to declare the variable {{y}} as {{volatile}}. Synchronization involves acquiring a lock, performing operations, and then releasing the lock. In the above example, the {{doSomething()}} method acquires the intrinsic lock of the {{Assign}} instance on which it is invoked ({{this}}), not that of the class object. This example can also be written to use block synchronization:
{code}
class Assign {
  private int x, y;   // Shared variables
  private int r1, r2; // Results of the reads

  public void doSomething() {
    synchronized (this) {
      // Perform Thread 1 actions
      x = 1;
      y = 2;
      // Perform Thread 2 actions
      r1 = y;
      r2 = x;
    }
  }
}
{code}
The intrinsic lock used in both examples is the same.
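As a complementary sketch, the two sets of actions can also be placed in separate synchronized methods that are run by separate threads. The class {{SyncAssign}} and its method names are hypothetical and not part of the examples above. Because both methods lock the same instance, the actions of one thread can never interleave with those of the other. Which thread runs first is still not fixed, so the reader observes either both initial values or both written values.

```java
class SyncAssign {
  private int x, y;   // shared variables; neither needs to be volatile
  private int r1, r2; // results of the reads

  // Thread 1 actions, performed while holding the instance's intrinsic lock
  synchronized void writer() {
    x = 1;
    y = 2;
  }

  // Thread 2 actions, performed while holding the same intrinsic lock
  synchronized void reader() {
    r1 = y;
    r2 = x;
  }

  // True if the reader ran entirely before or entirely after the writer
  synchronized boolean consistent() {
    return (r1 == 0 && r2 == 0) || (r1 == 2 && r2 == 1);
  }

  // Runs both methods on separate threads and checks the outcome
  static boolean demo() {
    SyncAssign a = new SyncAssign();
    Thread t1 = new Thread(a::writer);
    Thread t2 = new Thread(a::reader);
    t1.start();
    t2.start();
    try {
      t1.join();
      t2.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return a.consistent();
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

A mixed outcome such as {{r1 == 2}} with {{r2 == 0}} is excluded because the reader cannot acquire the lock while the writer is between its two assignments.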
h3. The {{java.util.concurrent}} Classes
h5. Atomic Classes
Volatile variables are useful for guaranteeing visibility. However, they are insufficient for ensuring atomicity. Synchronization fills this gap but incurs overheads of context switching and frequently causes lock contention. The atomic classes of package {{java.util.concurrent.atomic}} provide a mechanism for reducing contention in most practical environments while at the same time ensuring atomicity. According to Goetz and colleagues \[[Goetz 06|AA. Bibliography#Goetz 06]\]:
{quote}
With low to moderate contention, atomics offer better scalability; with high contention, locks offer better contention avoidance.
{quote}
The atomic classes consist of implementations that exploit the design of modern processors by exposing commonly needed functionality to the programmer. For example, the {{AtomicInteger.incrementAndGet()}} method can be used for atomically incrementing a variable. The _compare-and-swap_ instruction(s) provided by modern processors offer more fine-grained control and can be used directly by invoking high-level methods such as {{java.util.concurrent.atomic.Atomic*.compareAndSet()}} where the asterisk can be, for example, an {{Integer}}, {{Long}} or {{Boolean}}.
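The following sketch illustrates both methods mentioned above. The class {{AtomicCounterDemo}}, its method names, and the thread and iteration counts are assumptions made for illustration. {{incrementAndGet()}} performs the increment atomically, and {{compareAndSet()}} is used in a retry loop to apply an arbitrary update (here, doubling) without a lock.

```java
import java.util.concurrent.atomic.AtomicInteger;

class AtomicCounterDemo {
  private final AtomicInteger counter = new AtomicInteger(0);

  // Atomically adds one and returns the updated value
  int increment() {
    return counter.incrementAndGet();
  }

  // Atomically doubles the value using a compare-and-set retry loop
  int doubleValue() {
    while (true) {
      int current = counter.get();
      int next = current * 2;
      // Succeeds only if no other thread changed the value in the meantime
      if (counter.compareAndSet(current, next)) {
        return next;
      }
    }
  }

  // Four threads each increment the shared counter 1000 times
  static int demo() {
    AtomicCounterDemo c = new AtomicCounterDemo();
    Thread[] workers = new Thread[4];
    for (int i = 0; i < workers.length; i++) {
      workers[i] = new Thread(() -> {
        for (int j = 0; j < 1000; j++) {
          c.increment();
        }
      });
      workers[i].start();
    }
    try {
      for (Thread t : workers) {
        t.join();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return c.counter.get(); // no increments are lost
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

With a plain {{volatile int}} and the {{++}} operator in place of {{incrementAndGet()}}, some increments could be lost because the read, add, and write steps are not performed as a single atomic operation.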
h5. The Executor Framework
The {{java.util.concurrent}} package provides the Executor framework, which offers a mechanism for executing tasks concurrently. A task is a logical unit of work encapsulated by a class that implements {{Runnable}} or {{Callable}}. The Executor framework allows task submission to be decoupled from low-level scheduling and thread management details. It provides the thread pool mechanism that allows a system to degrade gracefully when presented with more requests than the system can handle simultaneously.
The {{Executor}} interface is the core interface of the framework and is extended by the {{ExecutorService}} interface, which provides facilities for thread pool termination and for obtaining the return values of tasks (as {{Future}} objects). The {{ExecutorService}} interface is further extended by the {{ScheduledExecutorService}} interface, which provides a way to run tasks periodically or after some delay. The {{Executors}} class provides several factory and utility methods that return commonly used configurations of {{Executor}}, {{ExecutorService}}, and other related interfaces. For example, the {{Executors.newFixedThreadPool()}} method returns a fixed-size thread pool with an upper limit on the number of concurrently executing tasks, and maintains an unbounded queue for holding tasks while the thread pool is full. The base (actual) implementation of the thread pool is provided by the {{ThreadPoolExecutor}} class. This class can be instantiated to customize the task execution policy.
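A minimal sketch of task submission with a fixed-size pool follows. The class {{ExecutorDemo}}, the method {{sumOfSquares()}}, and the pool size of two are hypothetical choices for illustration. Each task is a {{Callable}} whose result is retrieved through the {{Future}} returned by {{submit()}}.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ExecutorDemo {
  // Computes 1*1 + 2*2 + ... + n*n, one task per term
  static int sumOfSquares(int n) {
    // At most two tasks run at once; the rest wait in the pool's queue
    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      List<Future<Integer>> futures = new ArrayList<>();
      for (int i = 1; i <= n; i++) {
        final int value = i;
        Callable<Integer> task = () -> value * value;
        futures.add(pool.submit(task)); // submission is decoupled from execution
      }
      int sum = 0;
      for (Future<Integer> f : futures) {
        sum += f.get(); // blocks until the corresponding task completes
      }
      return sum;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e.getCause());
    } finally {
      pool.shutdown(); // stop accepting tasks and let the worker threads exit
    }
  }

  public static void main(String[] args) {
    System.out.println(sumOfSquares(4));
  }
}
```

Calling {{shutdown()}} in a {{finally}} block is one way to make sure the pool's threads do not keep the JVM alive after the work is done.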
The {{java.util.concurrent}} utilities are preferred over traditional synchronization primitives such as synchronized methods, synchronized blocks, and volatile variables because the {{java.util.concurrent}} utilities abstract the underlying details, provide a cleaner and less error-prone API, are easier to scale, and can be enforced using policies.
h5. Explicit Locking
The {{java.util.concurrent.locks}} package provides the {{ReentrantLock}} class, which has additional features that are missing from intrinsic locks. For example, the {{ReentrantLock.tryLock()}} method returns immediately, with a value of {{false}}, when another thread is already holding the lock. Acquiring and releasing a {{ReentrantLock}} has the same memory semantics as acquiring and releasing an intrinsic lock.
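The behavior of {{tryLock()}} can be sketched as follows. The class {{TryLockDemo}}, the {{tryDeposit()}} method, and the use of {{CountDownLatch}} to coordinate the two threads are assumptions made for illustration. A helper thread holds the lock while the calling thread attempts an update; the attempt fails immediately rather than blocking, and succeeds once the lock has been released.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

class TryLockDemo {
  private final ReentrantLock lock = new ReentrantLock();
  private int balance = 0;

  // Returns false immediately if another thread holds the lock
  boolean tryDeposit(int amount) {
    if (!lock.tryLock()) {
      return false;
    }
    try {
      balance += amount;
      return true;
    } finally {
      lock.unlock(); // always release in a finally block
    }
  }

  // Returns { result while the lock is held elsewhere, result after release }
  static boolean[] demo() {
    TryLockDemo d = new TryLockDemo();
    CountDownLatch held = new CountDownLatch(1);
    CountDownLatch release = new CountDownLatch(1);
    Thread holder = new Thread(() -> {
      d.lock.lock();
      try {
        held.countDown();  // signal that the lock is now held
        release.await();   // keep holding until told to release
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } finally {
        d.lock.unlock();
      }
    });
    holder.start();
    try {
      held.await();
      boolean whileHeld = d.tryDeposit(10);    // lock is held by the helper
      release.countDown();
      holder.join();
      boolean afterRelease = d.tryDeposit(10); // lock is free again
      return new boolean[] { whileHeld, afterRelease };
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    boolean[] results = demo();
    System.out.println(results[0] + " " + results[1]);
  }
}
```

Unlike an intrinsic lock, a {{ReentrantLock}} is not released automatically at the end of a block, so the {{unlock()}} call belongs in a {{finally}} clause.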