...
Use of String.intern() should be reserved for cases in which the tokenization of strings either yields an important performance enhancement or dramatically simplifies code. Examples include programs engaged in natural language processing and compiler-like tools that tokenize program input. For most other programs, performance and readability are often improved by the use of code that applies the Object.equals() approach and that lacks any dependence on reference equality.
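The difference between the two comparison styles can be illustrated with a minimal sketch. The class and method names below (ComparisonDemo, sameContent, sameInternedReference) are hypothetical and chosen only for illustration.

```java
final class ComparisonDemo {
    // Content comparison: correct for any two non-null strings,
    // whether or not either one has been interned.
    static boolean sameContent(String a, String b) {
        return a.equals(b);
    }

    // Reference comparison after interning: equivalent to content
    // comparison, but it pays the cost of String.intern() on every call.
    static boolean sameInternedReference(String a, String b) {
        return a.intern() == b.intern();
    }

    public static void main(String[] args) {
        // new String(...) forces two distinct objects with equal content.
        String a = new String("token");
        String b = new String("token");

        System.out.println(a == b);                       // false
        System.out.println(sameContent(a, b));            // true
        System.out.println(sameInternedReference(a, b));  // true
    }
}
```

The bare == comparison fails because the two strings are distinct objects. The equals() version gives the expected answer without relying on how or whether the strings were interned.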
Performance issues can arise because the Java Language Specification provides few guarantees about the implementation of String.intern(). For example:
- The cost of String.intern() grows as the number of interned strings grows. Performance should be no worse than n log n, but the Java Language Specification lacks a specific performance guarantee.
- In early JVM implementations, interned strings became immortal: they were exempt from garbage collection. This can be problematic when large numbers of strings are interned. More recent implementations can garbage collect the storage occupied by interned strings that are no longer referenced. However, the Java Language Specification lacks any specification of this behavior.
- In JVM implementations prior to Java 1.7, interned strings are allocated in the permgen storage region, which is typically much smaller than the remainder of the heap. Consequently, interning large numbers of strings can lead to an out-of-memory condition. In many Java 1.7 implementations, interned strings are allocated on the heap, thus relieving this restriction. Once again, the details of allocation are unspecified by the Java Language Specification; consequently, implementations may vary.
When canonicalization of objects is required, it may be wiser to use a custom canonicalizer built on top of ConcurrentHashMap; see Bloch for details.
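One possible shape for such a canonicalizer is sketched below. It is an illustration under stated assumptions, not the implementation given in the cited reference: the class name Canonicalizer and its methods canonicalize and size are hypothetical. The only library call it relies on is ConcurrentMap.putIfAbsent.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical helper: maps each value to a single canonical instance.
final class Canonicalizer<T> {
    // Each canonical instance is stored as both key and value.
    private final ConcurrentMap<T, T> pool = new ConcurrentHashMap<>();

    // Returns the canonical instance equal to value, registering value
    // itself if no equal instance has been seen yet.
    // Throws NullPointerException for null, because ConcurrentHashMap
    // does not permit null keys.
    T canonicalize(T value) {
        T existing = pool.putIfAbsent(value, value);
        return (existing == null) ? value : existing;
    }

    // Number of distinct canonical instances currently held.
    int size() {
        return pool.size();
    }
}
```

Unlike String.intern(), this approach makes the storage and lifetime of canonical instances explicit. The map holds strong references, so entries remain reachable until the canonicalizer itself is discarded; a program that needs entries to be reclaimed earlier would have to add its own eviction policy.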
Applicability
Using reference equality to compare objects can lead to unexpected results.
...