Using locale-dependent methods on locale-dependent data can produce unexpected results when the locale is unspecified. Programming language identifiers, protocol keys, and HTML tags are often specified in a particular locale, usually `Locale.ENGLISH`. Running a program in a different locale may result in unexpected program behavior or even allow an attacker to bypass input filters. For these reasons, any program that inspects data generated by a locale-dependent function must specify the locale used to generate that data.
...
Most languages that use the Latin alphabet treat the letter `I` as the uppercase version of `i`. Turkish is an exception: it has a dotted `i` whose uppercase version is also dotted (`İ`) and an undotted `ı` whose uppercase version is undotted (`I`). Changing capitalization on most strings in the Turkish locale [API 2006] may produce unexpected results:
...
Many programs use locale-dependent methods only for outputting information, such as dates. Provided that the program does not inspect the locale-dependent data, it may safely rely on the default locale.
...
Many web apps, such as forum or blogging software, input HTML and then display it. Displaying untrusted HTML can subject a web app to cross-site scripting (XSS) or HTML injection vulnerabilities. Therefore, it is vital that HTML be sanitized before sending it to a web browser.
...
In HTML, tags are case-insensitive and can therefore be specified using uppercase, lowercase, or any mixture of cases. This noncompliant code example uses the locale-dependent `String.toUpperCase()` method to convert an HTML tag to uppercase before checking it for further processing. The code must ignore `<SCRIPT>` tags, as they indicate code that is to be discarded. Whereas the English locale would convert `"script"` to `"SCRIPT"`, the Turkish locale will convert `"script"` to `"SCRİPT"`, and the check will fail to detect the `<SCRIPT>` tag.
```java
public static void processTag(String tag) {
  // Uppercases with the default locale; fails to match under a Turkish locale
  if (tag.toUpperCase().equals("SCRIPT")) {
    return;
  }
  // Process tag
}
```
Compliant Solution (Explicit Locale)
This compliant solution explicitly sets the locale to English to avoid unexpected results:
```java
import java.util.Locale;

public static void processTag(String tag) {
  // The explicit locale makes the conversion independent of the default locale
  if (tag.toUpperCase(Locale.ENGLISH).equals("SCRIPT")) {
    return;
  }
  // Process tag
}
```
Specifying `Locale.ROOT` is a suitable alternative under conditions where an English-specific locale would not be appropriate.
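The effect of the locale argument can be observed directly. The following is a minimal sketch, not part of the original rule: the class `TurkishCaseDemo` and its helper `upper` are hypothetical names introduced for illustration, and the Turkish locale is obtained with `Locale.forLanguageTag("tr-TR")`.

```java
import java.util.Locale;

final class TurkishCaseDemo {
    // Hypothetical helper: uppercases a tag under an explicitly chosen locale.
    static String upper(String tag, Locale locale) {
        return tag.toUpperCase(locale);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");

        // Under Turkish rules, 'i' maps to the dotted capital U+0130,
        // so the result no longer equals "SCRIPT".
        System.out.println(upper("script", turkish).equals("SCRIPT"));

        // Locale.ENGLISH and Locale.ROOT both yield plain ASCII "SCRIPT".
        System.out.println(upper("script", Locale.ENGLISH).equals("SCRIPT"));
        System.out.println(upper("script", Locale.ROOT).equals("SCRIPT"));
    }
}
```

Passing the locale as a parameter keeps the comparison independent of whatever default locale the JVM was started with.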
...
This compliant solution sets the default locale to English before performing string comparisons:
```java
import java.util.Locale;

public static void processTag(String tag) {
  // Changes the default locale for the whole JVM, not just this call
  Locale.setDefault(Locale.ENGLISH);

  if (tag.toUpperCase().equals("SCRIPT")) {
    return;
  }
  // Process tag
}
```
Compliant Solution (`String.equalsIgnoreCase()`)
This compliant solution bypasses locales entirely by performing a case-insensitive match. The `String.equalsIgnoreCase()` method creates temporary canonical forms of both strings, which may render them unreadable, but it performs a proper comparison without depending on the current locale [Schindler 12].
```java
public static void processTag(String tag) {
  // Case-insensitive comparison that does not consult the default locale
  if (tag.equalsIgnoreCase("SCRIPT")) {
    return;
  }
  // Process tag
}
```
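To illustrate the difference, the following sketch temporarily switches the JVM default locale to Turkish and compares the two checks. The class `ScriptTagCheck` and the method `isScriptTag` are hypothetical names used only for this illustration; the default locale is restored afterward because `Locale.setDefault()` affects the whole JVM.

```java
import java.util.Locale;

final class ScriptTagCheck {
    // Hypothetical helper: locale-independent test for the SCRIPT tag name.
    static boolean isScriptTag(String tag) {
        return tag.equalsIgnoreCase("SCRIPT");
    }

    public static void main(String[] args) {
        Locale original = Locale.getDefault();
        try {
            Locale.setDefault(Locale.forLanguageTag("tr-TR"));

            // Default-locale uppercasing produces a dotted capital I here,
            // so this comparison is false under a Turkish default locale.
            System.out.println("script".toUpperCase().equals("SCRIPT"));

            // equalsIgnoreCase() ignores the default locale and still matches.
            System.out.println(isScriptTag("script"));
        } finally {
            // Restore the JVM-wide default so other code is unaffected.
            Locale.setDefault(original);
        }
    }
}
```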
Noncompliant Code Example (`FileReader`)
...
This noncompliant code example reads its own source code and prints it out, prepending each line with a line number. If the program is run with the argument `-Dfile.encoding=UTF16` while its source file is stored as `UTF8`, the program will save garbage in the output file.
...
```java
public static void main(String[] args) throws IOException {
  // Use one explicit charset for both reading and writing
  Charset encoding = Charset.forName("UTF8");
  BufferedReader reader = new BufferedReader(
      new InputStreamReader(new FileInputStream(inputFile), encoding));
  PrintWriter writer = new PrintWriter(
      new OutputStreamWriter(new FileOutputStream(outputFile), encoding));
  int line = 0;
  /* Rest of code unchanged */
```
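Because the rest of that program is not shown here, the following self-contained sketch illustrates the same idea of fixing the charset on both the reader and the writer. It is an assumption-laden illustration rather than the original code: the class `NumberedCopy`, the method `copyWithLineNumbers`, and the `"N: text"` line format are all hypothetical, and it uses `java.nio.file.Files` with `StandardCharsets.UTF_8` instead of the stream classes above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class NumberedCopy {
    // Hypothetical helper: copies input to output, prefixing each line
    // with its 1-based line number. Both files are treated as UTF-8,
    // regardless of the -Dfile.encoding setting.
    static void copyWithLineNumbers(Path input, Path output) throws IOException {
        try (BufferedReader reader =
                 Files.newBufferedReader(input, StandardCharsets.UTF_8);
             PrintWriter writer = new PrintWriter(
                 Files.newBufferedWriter(output, StandardCharsets.UTF_8))) {
            int line = 0;
            String text;
            while ((text = reader.readLine()) != null) {
                writer.println(++line + ": " + text);
            }
        }
    }
}
```

The try-with-resources statement closes both files even if reading fails partway through.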
Noncompliant Code Example (`Date`)
The concepts of days and years are universal, but the way in which dates are represented varies across cultures and is therefore specific to locales. This noncompliant code example examines the current date and prints one of two messages, depending on whether the month is June.
...
but fails on other locales. For example, the output for a German locale (specified by `-Duser.language=de`) is:
...
This compliant solution forces the date to be printed in an English format, regardless of the current locale:
```java
// Format with an explicit locale instead of the default one
String myString = DateFormat.getDateInstance(DateFormat.MEDIUM, Locale.US)
                            .format(rightNow.getTime());
/* ...rest of code unchanged...*/
```
...
This compliant solution checks the date's `MONTH` attribute without formatting it. Although date representations vary by culture, the contents of a `Calendar` date do not. Consequently, this code works in any locale.
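A minimal sketch of such a check follows. It is not the original solution's code: the class `MonthCheck`, the method `isJune`, and the printed messages are hypothetical. It relies only on `Calendar.get(Calendar.MONTH)` and the `Calendar.JUNE` constant.

```java
import java.util.Calendar;

final class MonthCheck {
    // Hypothetical helper: compares the numeric month field directly,
    // so no locale-dependent formatting or parsing is involved.
    static boolean isJune(Calendar date) {
        return date.get(Calendar.MONTH) == Calendar.JUNE;
    }

    public static void main(String[] args) {
        Calendar rightNow = Calendar.getInstance();
        // Placeholder messages, chosen only for this illustration.
        System.out.println(isJune(rightNow) ? "It is June." : "It is not June.");
    }
}
```

Because the comparison uses an integer field rather than a formatted string, running with `-Duser.language=de` does not change the result.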
...
Rule | Severity | Likelihood | Remediation Cost | Priority | Level |
---|---|---|---|---|---|
STR02-J | Medium | Probable | Medium | P8 | L2 |
Android Implementation Details
...
Bibliography
[API 2006] | Class |
[Seacord 2015] | STR02-J. Specify an appropriate locale when comparing locale-dependent data LiveLesson |
[Schindler 12] | Schindler, Uwe. The Policeman’s Horror: Default Locales, Default Charsets, and Default Timezones, The Generics Policeman Blog |
...