A log injection vulnerability arises when a log entry contains unsanitized user input. A malicious user can insert fake log data and consequently deceive system administrators as to the system's behavior [OWASP 2008]. For example, an attacker might split a legitimate log entry into two log entries by entering a carriage return and line feed (CRLF) sequence to mislead an auditor. Log injection attacks can be prevented by sanitizing and validating any untrusted input sent to a log.

Logging unsanitized user input can also result in leaking sensitive data across a trust boundary. For example, an attacker might inject a script into a log file such that when the file is viewed using a web browser, the browser could provide the attacker with a copy of the administrator's cookie so that the attacker might gain access as the administrator.
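The browser scenario above can also be blunted on the viewing side. The following is a minimal sketch, assuming the log is rendered as HTML by some viewer; the class name LogViewerEscaper and the method escapeHtml are hypothetical and do not come from this rule. A production system would more likely rely on a vetted encoding library than on hand-written escaping.

```java
final class LogViewerEscaper {
  // Hypothetical helper: replaces the characters that let text break out of an HTML text node or attribute.
  static String escapeHtml(String line) {
    StringBuilder out = new StringBuilder(line.length());
    for (int i = 0; i < line.length(); i++) {
      char c = line.charAt(i);
      switch (c) {
        case '&':  out.append("&amp;");  break;
        case '<':  out.append("&lt;");   break;
        case '>':  out.append("&gt;");   break;
        case '"':  out.append("&quot;"); break;
        case '\'': out.append("&#39;");  break;
        default:   out.append(c);
      }
    }
    return out.toString();
  }

  public static void main(String[] args) {
    // A logged username that would run as script if the viewer emitted it verbatim
    String logged = "User login failed for: <script>alert(1)</script>";
    System.out.println(escapeHtml(logged));
  }
}
```

Escaping at display time complements, and does not replace, sanitizing the data before it is logged.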

Noncompliant Code Example

This noncompliant code example logs untrusted data from an unauthenticated user without data sanitization.

if (loginSuccessful) {
  logger.severe("User login succeeded for: " + username);
} else {
  logger.severe("User login failed for: " + username);
}

Without sanitization, a log injection attack is possible. A standard log message when username is guest might look like this:

May 15, 2011 2:19:10 PM java.util.logging.LogManager$RootLogger log
SEVERE: User login failed for: guest 

If the username that is used in a log message is not guest but rather a multiline string like this:

guest 
May 15, 2011 2:25:52 PM java.util.logging.LogManager$RootLogger log
SEVERE: User login succeeded for: administrator

the log would contain the following misleading data:

May 15, 2011 2:19:10 PM java.util.logging.LogManager$RootLogger log
SEVERE: User login failed for: guest 
May 15, 2011 2:25:52 PM java.util.logging.LogManager$RootLogger log
SEVERE: User login succeeded for: administrator
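The forged entry can be reproduced without a real login system. This is a small sketch, assuming the message is built by the same string concatenation as in the noncompliant example; the class LogInjectionDemo and its helper methods are hypothetical names used only for illustration.

```java
final class LogInjectionDemo {
  // Builds the same message that the noncompliant code passes to logger.severe().
  static String failedLoginMessage(String username) {
    return "User login failed for: " + username;
  }

  // Counts physical lines; a value above 1 means the input carried line breaks into the entry.
  static int lineCount(String message) {
    return message.split("\r\n|\r|\n", -1).length;
  }

  public static void main(String[] args) {
    String benign = "guest";
    String hostile = "guest\n"
        + "May 15, 2011 2:25:52 PM java.util.logging.LogManager$RootLogger log\n"
        + "SEVERE: User login succeeded for: administrator";

    System.out.println(lineCount(failedLoginMessage(benign)));  // prints 1
    System.out.println(lineCount(failedLoginMessage(hostile))); // prints 3
  }
}
```

A single call to the logger therefore writes three lines, the last two of which are indistinguishable from a genuine entry.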

Compliant Solution (Sanitized User)

This compliant solution sanitizes the username before logging it, preventing injection attacks.

if (loginSuccessful) {
  logger.severe("User login succeeded for: " + sanitizeUser(username));
} else {
  logger.severe("User login failed for: " + sanitizeUser(username));
}

The sanitization is done by a dedicated method for sanitizing user names:

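// Requires: import java.util.regex.Pattern;
public String sanitizeUser(String username) {
  // Accept only letters, digits, and underscores; anything else is replaced by a fixed placeholder
  return Pattern.matches("[A-Za-z0-9_]+", username)
      ? username : "unauthorized user";
}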
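For reference, a self-contained variant of the same whitelist check might look like the sketch below. The class name UsernameSanitizer, the precompiled pattern, and the null check are assumptions added for illustration; the accepted character set and the placeholder string are taken from the method above.

```java
import java.util.regex.Pattern;

final class UsernameSanitizer {
  // Same whitelist as the compliant solution, compiled once instead of on every call
  private static final Pattern VALID_USERNAME = Pattern.compile("[A-Za-z0-9_]+");

  static String sanitizeUser(String username) {
    // matches() must consume the whole string, so any CR, LF, or other
    // character outside the whitelist causes the placeholder to be logged instead.
    // Treating null as invalid is an assumption of this sketch.
    return username != null && VALID_USERNAME.matcher(username).matches()
        ? username : "unauthorized user";
  }

  public static void main(String[] args) {
    System.out.println(sanitizeUser("guest"));
    System.out.println(sanitizeUser("guest\nSEVERE: User login succeeded for: administrator"));
  }
}
```

Because the check rejects rather than repairs the input, the attacker-controlled text never reaches the log at all.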

Compliant Solution (Sanitized Logger)

This compliant solution uses a text logger that automatically sanitizes its input. A sanitized logger saves the developer from having to worry about unsanitized log messages.


Logger sanLogger = new SanitizedTextLogger(logger);

if (loginSuccessful) {
  sanLogger.severe("User login succeeded for: " + username);
} else {
  sanLogger.severe("User login failed for: " + username);
}

The sanitized text logger wraps an actual logger as its delegate. We assume that the delegate outputs text log messages to a file, the network, or the console, and that genuine log messages contain no indented lines. The sanitized text logger sanitizes all text to be logged by indenting every line except the first by two spaces. A malicious user can indent her text further, but she cannot create a fake log entry, because every line she supplies after a line break is indented and only the first line of a real entry starts at the left margin.

import java.util.logging.Logger;
import java.util.regex.Pattern;

class SanitizedTextLogger extends Logger {
  // Compiled once; matches each line feed in a message
  private static final Pattern NEWLINE = Pattern.compile("\n");

  private final Logger delegate;

  public SanitizedTextLogger(Logger delegate) {
    super(delegate.getName(), delegate.getResourceBundleName());
    this.delegate = delegate;
  }

  // Indent every continuation line by two spaces so it cannot pass for a new entry
  public String sanitize(String msg) {
    return NEWLINE.matcher(msg).replaceAll("\n  ");
  }

  @Override
  public void severe(String msg) {
    delegate.severe(sanitize(msg));
  }

  // ... Other Logger methods, which must also sanitize their log messages
}
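The sanitize() method above rewrites only the line feed character. If the log might be read by tools that also treat a lone carriage return or a Unicode line separator as a line break, the same indentation idea can be extended. The following sketch is an assumption-laden variant, not part of the compliant solution: the class name IndentingSanitizer is hypothetical, and it relies on the `\R` linebreak matcher available in java.util.regex since Java 8.

```java
import java.util.regex.Pattern;

final class IndentingSanitizer {
  // \R matches \r\n as one unit, as well as \n, \r, and the Unicode line separators
  private static final Pattern ANY_LINE_BREAK = Pattern.compile("\\R");

  // Normalizes every line break to a line feed followed by two spaces of indentation.
  static String sanitize(String msg) {
    return ANY_LINE_BREAK.matcher(msg).replaceAll("\n  ");
  }

  public static void main(String[] args) {
    String hostile = "User login failed for: guest\r\n"
        + "SEVERE: User login succeeded for: administrator";
    System.out.println(sanitize(hostile));
  }
}
```

With this input, the forged SEVERE line is printed with a two-space indent, so it no longer starts at the left margin like a genuine entry.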

Risk Assessment

Allowing unvalidated user input to be logged can result in forging of log entries, leaking secure information, or storing sensitive data in a manner that violates a local law or regulation.

| Rule | Severity | Likelihood | Remediation Cost | Priority | Level |
|---|---|---|---|---|---|
| IDS03-J | Medium | Probable | Medium | P8 | L2 |

Automated Detection

| Tool | Version | Checker | Description |
|---|---|---|---|
| The Checker Framework | 2.1.3 | Tainting Checker | Trust and security errors (see Chapter 8) |
| CodeSonar | 8.1p0 | JAVA.IO.TAINT.LOG | Tainted Log (Java) |
| Fortify | | Log_Forging | Implemented |
| Klocwork | 2024.4 | SVLOG_FORGING | Implemented |
| Parasoft Jtest | 2024.1 | CERT.IDS03.TDLOG | Protect against log forging |

Related Guidelines

ISO/IEC TR 24772:2013

Injection [RST]

MITRE CWE

CWE-144, Improper neutralization of line delimiters
CWE-150, Improper neutralization of escape, meta, or control sequences
CWE-117, Improper Output Neutralization for Logs 

MITRE CAPEC

CAPEC-93, Log Injection-Tampering-Forging

Bibliography



13 Comments

  1. I don't see any rule about having a consistent sanitization policy.  There are two main ways to ensure sanitization:

    (1) Sanitize at the point of receiving untrusted input.

    (2) Sanitize at the point of using untrusted input.

    Some projects will specify to use both.

    (1) is usually a lot more performant as usually you only capture input once but use it multiple times.

    For example, registering a new user captures the username once, but the username is then stored in a DB, queried on login, displayed in HTML, logged on login failure, and so on. Having to sanitize the username when storing, querying, generating HTML, and logging is a high development burden and a potential performance issue, and someone would need to weigh those costs against any additional security benefit.

    What is the security benefit of doing (2) everywhere instead of (1) as most of these sanitisation rules seem to specify?

    1. Well, I would argue that IDS00-J. Sanitize untrusted data passed across a trust boundary indicates that both should be used. That is, sanitize when you receive untrusted input (e.g., data crosses a trust boundary), and sanitize again when you send data to an untrusted output sink (e.g., data crosses a trust boundary).

      I disagree that these rules prefer output sanitization over input sanitization. Clearly, sanitizing input is faster than sanitizing output. But sometimes it is more difficult to do correctly, especially if you don't know where the input is going. For instance, text received from the user requires different sanitization depending on whether it is going to an SQL database, a web browser, or a log file. In that case, you may choose to sanitize the text only when it gets output to whichever sink is chosen.

  2. This sentence is really a mess:

    For example, a user might split a legitimate log entry into two log entries by entering a carriage return and line feed (CRLF) sequence, resulting in two log entries either (or both) of which might be misleading.

    It is redundant, and also sort of ends badly.  Fred, you touched this last... can you repair it?

  3. In my opinion the proposed solution here has two flaws:

    • it violates the separation of concerns principle
    • it introduces leaky abstraction

    The point of having logging APIs is to abstract you from the complexities of properly sanitizing, serializing, and storing log entries. Because of this, your logging framework should be configured to handle and encode any data passed to the loggers in a secure manner. Escaping characters and so on depends on the log storage format you configure under the logging APIs. File encoders / appenders should escape newline characters, JSON encoders / appenders on the other hand should escape the " character, and so on.

    Besides the above, not only your code uses those logging APIs but also third-party libraries. JSON demarshallers and HTTP servlet containers can log user data without your knowledge and before you even get a chance to validate it.

    Therefore, the code that sanitizes the logging data should not be mixed in with the business logic; it belongs in the code under the logging API, which knows the destination log format and knows how to encode and sanitize the data securely.

    This is the so called Output encoding strategy mentioned in: https://cwe.mitre.org/data/definitions/117.html

     

    1. I added CWE-117 to the list of associated CWEs, thanks.

      As to the problems you cite with the CS, I do feel your pain. We do recommend that systems that accept string output provide mechanisms for allowing callers to sanitize their output. In this case, that means the logger's class should provide a sanitize() method to prevent log injections.

      Unfortunately in this world, many such systems fail to provide any sanitization. Java's Logger package provides none. Furthermore, while it is trivial to sanitize data going to an XML-based log file, or JSON-based log file, how would you sanitize a plain text format such as what java.util.logging.Logger uses? Eliminating newlines renders the file almost unreadable. I've seen plenty of log messages with multiple lines; usually they contain a Java exception. I suspect there are makeshift solutions you could implement (such as enforcing indentation on all user-supplied newlines), but I have not seen any standard solutions (outside of XML or some similar format).

      1. Well, because of their limitations, plain text log files are becoming a thing of the past in larger, more mature systems. Especially in web systems that scale horizontally, it's not practical to have text files on each server. If used at all, text log files are only a last-resort mechanism for when a single server instance cannot connect to Logstash or another centralized log repository.

        I agree with you that you should not trust the default configuration of your logging framework. But you can cleanly enable sanitization in the logger's configuration via custom encoders, formatters, adapters, etc. You don't have to mix it up in business code like in the example given here.

        Relying only on input validation in the application's code still leaves a potential security hole: the logging done by third-party libraries that process and log data before it even gets to the application code. This is why I think that relying solely on the output encoding strategy when it comes to logging is more secure and provides cleaner code.

      2. Could we collaborate on an output-encoding-based example as another compliant solution?

        1. Hello, Adam.

          I'm happy to work with you to help fix this rule. (I'm not sure if a 2nd compliant solution is the answer, will think about it over the weekend.)

          The more I think about this the more I'm convinced that if a logger accepts data that can create a fake log message, the logger is faulty by design. Ideally your code should do nothing differently than today, but the logger itself automatically sanitizes log data.

          However, that would only prevent an attacker from creating a fake log message. It would not prevent other bad usernames such as "your mom". Sanitizing something that should be a valid username should definitely not be done in the Logger; it should definitely be done at least in the same class as the compliant solution. In that context, the current CS has sound business logic.

          That is independent of the general sanitization to prevent fake log messages.

          1. I have added a new compliant solution that implements my suggestion of indenting subsequent log lines to distinguish them from 'real' new log entries.

  4. Is the replaceAll in the second compliant example correct? I believe replaceAll takes two parameters, and it appears the second example passes only one.

      1. Perfect, thanks for the clarification. (I had initially read it as String#replaceAll)
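The output encoding strategy discussed in comment 3 can be sketched with the standard java.util.logging API by attaching a custom Formatter to a Handler, so that application code and third-party libraries log as usual and the escaping happens underneath. This is a minimal illustration under stated assumptions, not a vetted implementation: the class name EscapingFormatter and the one-line output layout are hypothetical, and stack traces from getThrown() are deliberately left out to keep the sketch short.

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Formatter;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

final class EscapingFormatter extends Formatter {
  @Override
  public String format(LogRecord record) {
    // formatMessage() performs localization and parameter substitution first;
    // backslashes are escaped before CR and LF so the encoding stays unambiguous.
    String message = formatMessage(record)
        .replace("\\", "\\\\")
        .replace("\r", "\\r")
        .replace("\n", "\\n");
    return record.getLevel() + ": " + message + System.lineSeparator();
  }

  public static void main(String[] args) {
    Logger logger = Logger.getLogger("demo");
    logger.setUseParentHandlers(false);          // avoid the default, unescaped console output
    ConsoleHandler handler = new ConsoleHandler();
    handler.setFormatter(new EscapingFormatter());
    logger.addHandler(handler);

    // The injected line break is written as the two characters \n, on a single line
    logger.severe("User login failed for: guest\nSEVERE: User login succeeded for: administrator");
  }
}
```

Each record then occupies exactly one physical line, at the cost of multi-line messages being harder to read in the raw file.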