Using locale-dependent methods on locale-dependent data can produce unexpected results when the locale is unspecified. Programming language identifiers, protocol keys, and HTML tags are often specified in a particular locale, usually Locale.ENGLISH. However, a user can run the program in a different locale, which may cause the program to behave erratically. It may even be possible to bypass input filters by changing the default locale, which can alter the behavior of locale-dependent methods. For example, when a string is converted to upper case, it may be declared valid; however, changing the string back to lower case during subsequent execution may result in a blacklisted string.

For this reason, any program that invokes locale-dependent methods on untrusted data, or that inspects data generated by a locale-dependent function, must explicitly specify the locale used with those methods.

Some programs use locale-dependent functions only for presenting output, such as dates. In these cases, the locale-dependent data is not inspected by the program, and the program may safely rely on the default locale.

For example, most European languages treat the letter I as the uppercase version of i. Turkish is an exception: it has a dotted i whose uppercase version is also dotted (İ), and an undotted ı whose uppercase version is the undotted I. Changing capitalization in the Turkish locale therefore produces surprising results for most strings that contain either letter [API 2006].

For example, the following program:

Code Block
public class Example {
  public static void main(String[] args) {
    System.out.println("David".toUpperCase());
  }
}

behaves as expected in an English locale:

Code Block
% java Example
DAVID
% 

but produces different results in the Turkish locale:

Code Block
% java -Duser.language=tr Example
DAVİD
% 
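The same effect can be reproduced without JVM flags by passing a locale to toUpperCase() directly. The following is a minimal sketch; the class name TurkishCaseDemo and the helper upper() are hypothetical names introduced only for illustration.

```java
import java.util.Locale;

public class TurkishCaseDemo {
  // Hypothetical helper: upper-cases text using the case rules of the given locale.
  static String upper(String text, Locale locale) {
    return text.toUpperCase(locale);
  }

  public static void main(String[] args) {
    Locale turkish = Locale.forLanguageTag("tr");

    System.out.println(upper("David", Locale.ENGLISH)); // DAVID
    System.out.println(upper("David", Locale.ROOT));    // DAVID
    System.out.println(upper("David", turkish));        // DAVİD (dotted capital I, U+0130)

    // The reverse direction is affected too: Turkish rules map I to the dotless ı (U+0131).
    System.out.println("DAVID".toLowerCase(turkish));   // davıd
  }
}
```

Whether the non-ASCII letters display correctly depends on the console encoding, but the strings themselves differ regardless of how they are printed.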

Noncompliant Code Example (toUpperCase())

In HTML, tags are case-insensitive and can therefore be specified using uppercase, lowercase, or any mixture of cases. This noncompliant code example uses the locale-dependent String.toUpperCase() method to convert an HTML tag to upper case and check it for further processing. While the English locale would convert "title" to "TITLE", the Turkish locale will convert "title" to "TİTLE", where 'İ' is the Latin capital letter I with a dot above [API 2006], and the check will fail.

Code Block
public static void processTitle(String tag) {
  if (!tag.toUpperCase().equals("TITLE")) {
    return;
  }
  // process title
}

Compliant Solution (Explicit Locale)

This compliant solution explicitly sets the locale to English to avoid unexpected results.

Code Block
// Requires: import java.util.Locale;
public static void processTitle(String tag) {
  if (!tag.toUpperCase(Locale.ENGLISH).equals("TITLE")) {
    return;
  }
  // process title
}

Specifying Locale.ROOT is a suitable alternative under conditions where an English-specific locale would not be appropriate.

Compliant Solution (Default Locale)

This compliant solution sets the default locale to English before proceeding with string operations.

Code Block
public static void processTitle(String tag) {
  Locale.setDefault(Locale.ENGLISH);

  if (!tag.toUpperCase().equals("TITLE")) {
    return;
  }
  // process title
}

Compliant Solution (String.equalsIgnoreCase())

This compliant solution bypasses locales entirely by performing a case-insensitive match. The String.equalsIgnoreCase() method creates temporary canonical forms of both strings. This may render them unreadable, but it performs proper comparison without making them dependent on the current locale [Schindler].

Code Block
public static void processTitle(String tag) {
  if (!tag.equalsIgnoreCase("TITLE")) {
    return;
  }
  // process title
}
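The difference between the two checks can be observed by temporarily switching the JVM default locale. This is a sketch only: the class IgnoreCaseDemo and its two helper methods are hypothetical, and changing the default locale is done here purely to simulate a Turkish environment.

```java
import java.util.Locale;

public class IgnoreCaseDemo {
  // Locale-sensitive check: the result depends on the JVM default locale.
  static boolean matchesWithDefaultLocale(String tag) {
    return tag.toUpperCase().equals("TITLE");
  }

  // Case-insensitive check that does not consult the default locale.
  static boolean matchesIgnoringCase(String tag) {
    return tag.equalsIgnoreCase("TITLE");
  }

  public static void main(String[] args) {
    Locale saved = Locale.getDefault();
    try {
      // Simulate running on a platform configured for Turkish.
      Locale.setDefault(Locale.forLanguageTag("tr"));
      System.out.println(matchesWithDefaultLocale("title")); // false
      System.out.println(matchesIgnoringCase("title"));      // true
    } finally {
      Locale.setDefault(saved); // restore the original default
    }
  }
}
```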

Noncompliant Code Example (FileReader)

Java provides classes for handling input and output that can be based on either bytes or characters. The byte I/O families derive from the InputStream and OutputStream abstract classes and are independent of locale or character encoding. The character I/O families, however, derive from Reader and Writer, and they must convert byte sequences into strings and back. Thus, they rely on a specified character encoding to do their conversion. By default, this encoding is indicated by the file.encoding system property, which is part of the current locale. Consequently, a file encoded with one encoding, such as UTF-8, must not be read by a character input method using a different encoding, such as UTF-16.

Programs that read character data (whether directly using a Reader or indirectly using some method such as constructing a String from a byte array) must be aware of the source of the data. If the encoding of the data is fixed (for example, when the data comes from a file resource that is shipped with the program), then that encoding must be specified by the program. Failure to specify the encoding enables an attacker to change the default encoding and thereby force the program to read the data using the wrong encoding.

This does not apply to programs that read data known to be in the encoding specified by the platform running the program. For example, if the program must open a file provided by the user, it is reasonable to rely on the default encoding, expecting that it will be set correctly.
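For the byte-array case mentioned above, the encoding can be named at the point of conversion. The following is a minimal sketch; the class DecodeDemo, the helper decodeUtf8(), and the sample text are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
  // Hypothetical helper: decodes bytes known to be UTF-8,
  // regardless of the platform's default encoding.
  static String decodeUtf8(byte[] data) {
    return new String(data, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    String original = "Gr\u00fc\u00dfe"; // "Grüße"
    byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

    // Decoding with the encoding that produced the bytes recovers the text.
    System.out.println(decodeUtf8(utf8).equals(original)); // true

    // Decoding the same bytes as UTF-16 yields unrelated characters.
    String wrong = new String(utf8, StandardCharsets.UTF_16);
    System.out.println(wrong.equals(original)); // false
  }
}
```

By contrast, new String(data) without a charset argument uses the platform default, which is the dependency this rule asks programs to avoid for data with a fixed encoding.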

This noncompliant code example reads its own source code and prints it out, prefixing each line with a line number. If the program is run with the argument -Dfile.encoding=UTF16 while its source file is stored as UTF-8, the program will save garbage in the output file.

Code Block
import java.io.*;

public class PrintMyself {
  private static String inputFile = "PrintMyself.java";
  private static String outputFile = "PrintMyself.txt";

  public static void main(String[] args) throws IOException {
    BufferedReader reader = new BufferedReader(new FileReader(inputFile));
    PrintWriter writer = new PrintWriter(new FileWriter(outputFile));
    int line = 0;
    while (reader.ready()) {
      line++;
      writer.println(line + ": " + reader.readLine());
    }
    reader.close();
    writer.close();
  }
}

Compliant Solution (Charset)

In this compliant solution, both the input and output files are explicitly encoded using UTF-8. This program behaves correctly regardless of the default encoding.

Code Block
  // Requires: import java.nio.charset.Charset;
  public static void main(String[] args) throws IOException {
    Charset encoding = Charset.forName("UTF8");
    BufferedReader reader = new BufferedReader(
        new InputStreamReader(new FileInputStream(inputFile), encoding));
    PrintWriter writer = new PrintWriter(
        new OutputStreamWriter(new FileOutputStream(outputFile), encoding));

    int line = 0;

    /* rest of code unchanged */
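A self-contained variant of the same idea is sketched below using java.nio.file.Files and StandardCharsets. This is not the rule's own code: the class NumberLines, the method numberLines(), and the temporary files are hypothetical, and try-with-resources is used so both streams are closed even if an exception occurs.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class NumberLines {
  // Copies input to output, prefixing each line with its number.
  // Both files are treated as UTF-8, independent of file.encoding.
  static void numberLines(Path input, Path output) throws IOException {
    try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8);
         PrintWriter writer = new PrintWriter(
             Files.newBufferedWriter(output, StandardCharsets.UTF_8))) {
      int line = 0;
      String text;
      while ((text = reader.readLine()) != null) {
        line++;
        writer.println(line + ": " + text);
      }
    }
  }

  public static void main(String[] args) throws IOException {
    Path in = Files.createTempFile("numberlines", ".txt");
    Path out = Files.createTempFile("numberlines", ".out");
    Files.write(in, "caf\u00e9\nna\u00efve\n".getBytes(StandardCharsets.UTF_8));

    numberLines(in, out);

    for (String s : Files.readAllLines(out, StandardCharsets.UTF_8)) {
      System.out.println(s);
    }
  }
}
```

Looping on readLine() returning null, rather than on ready(), is a design choice of this sketch: ready() only reports whether a read would block, not whether the end of the file has been reached.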

Noncompliant Code Example (Date)

While the concepts of days and years are universal, the way in which dates are represented varies across cultures and is therefore specific to locales. This noncompliant code example examines a date and prints one of two messages, depending on whether or not the month is June.

Code Block
public static void isJune(Date date) {
  String myString = DateFormat.getDateInstance().format(date);
  System.out.println("The date is " + myString);
  if (myString.startsWith("Jun ")) {
    System.out.println("Enjoy June!");
  } else {
    System.out.println("It's not June.");
  }
}

This program behaves as expected on platforms with an English locale:

Code Block
The date is Jun 20, 2014
Enjoy June!

but fails in other locales. For example, the output for a German locale (specified by -Duser.language=de) is:

Code Block
The date is 20.06.2014
It's not June.

Compliant Solution (Explicit Locale)

This compliant solution forces the date to be printed in an English format, regardless of the current locale.

Code Block
// date is the Date parameter of isJune()
String myString = DateFormat.getDateInstance(DateFormat.MEDIUM, Locale.US).format(date);
/* ...rest of code unchanged...*/

Compliant Solution (Ignore Locale)

This compliant solution checks the date's MONTH attribute without formatting it. While date representations vary by culture, the contents of a Calendar date do not. Consequently, this code works in any locale.

Code Block
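// Wrap the Date parameter of isJune() in a Calendar to read its fields
Calendar rightNow = Calendar.getInstance();
rightNow.setTime(date);
if (rightNow.get(Calendar.MONTH) == Calendar.JUNE) {
/* ...rest of code unchanged...*/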
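Both approaches can be combined into a runnable sketch. The class JuneCheck, the boolean-returning isJune(), and the format() helper are hypothetical names introduced here; the sample date is built at noon only to keep it well away from a day boundary.

```java
import java.text.DateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

public class JuneCheck {
  // Inspects the month field directly, so no formatted text is parsed.
  static boolean isJune(Date date) {
    Calendar calendar = Calendar.getInstance();
    calendar.setTime(date);
    return calendar.get(Calendar.MONTH) == Calendar.JUNE;
  }

  // Formats a date for display using an explicitly chosen locale.
  static String format(Date date, Locale locale) {
    return DateFormat.getDateInstance(DateFormat.MEDIUM, locale).format(date);
  }

  public static void main(String[] args) {
    Date date = new GregorianCalendar(2014, Calendar.JUNE, 20, 12, 0).getTime();

    // The textual form differs by locale...
    System.out.println("The date is " + format(date, Locale.US));
    System.out.println("The date is " + format(date, Locale.GERMANY));

    // ...but the month field does not.
    System.out.println(isJune(date) ? "Enjoy June!" : "It's not June.");
  }
}
```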

Risk Assessment

Failure to specify the appropriate locale when using locale-dependent methods on locale-dependent data may result in unexpected behavior.

Rule     Severity  Likelihood  Remediation Cost  Priority  Level
IDS09-J  medium    probable    medium            P8        L2

Android Implementation Details

A developer can specify locale on Android using java.util.Locale.

Bibliography

[API 2006] Class String

[Schindler] Schindler, Uwe. The Policeman’s Horror: Default Locales, Default Charsets, and Default Timezones, The Generics Policeman Blog

 
