IDS04-J. Safely extract files from ZipInputStream

Be careful when extracting entries from java.util.zip.ZipInputStream. Two particular issues to avoid are entry filenames that canonicalize to a path outside the target directory of the extraction, and entries that cause consumption of excessive system resources. In the former case, an attacker can write arbitrary data from the zip file into any directory accessible to the user. In the latter case, denial of service can occur when resource usage is disproportionately large in comparison to the input data that causes it. The nature of the zip algorithm permits the existence of zip bombs, in which a small compressed input, such as a ZIP file, a GIF, or gzip-encoded HTTP content, consumes excessive resources when uncompressed because of extreme compression.

The zip algorithm is capable of producing very large compression ratios [Mahmoud 2002]. For example, a file consisting of alternating lines of a characters and b characters can achieve a compression ratio of more than 200 to 1. Even higher compression ratios can be obtained by using input data that is targeted to the compression algorithm, by using more (untargeted) input data, or by using other compression methods.
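The ratio described above can be observed with a short experiment. The following is a minimal sketch, not a benchmark: the class and method names are hypothetical, and the line length (80 characters) and input size (about 1 MB) are assumptions chosen for illustration. It compresses the input with java.util.zip.Deflater, the DEFLATE implementation that zip entries normally use, and prints the measured ratio, which varies with these parameters.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

final class CompressionRatioDemo {

  // Builds at least totalBytes of text made of alternating lines of 'a' and 'b'.
  // The line length is an assumption; the source does not specify one.
  static byte[] alternatingLines(int totalBytes, int lineLength) {
    StringBuilder sb = new StringBuilder(totalBytes + lineLength + 1);
    char c = 'a';
    while (sb.length() < totalBytes) {
      for (int i = 0; i < lineLength; i++) {
        sb.append(c);
      }
      sb.append('\n');
      c = (c == 'a') ? 'b' : 'a';
    }
    return sb.toString().getBytes(StandardCharsets.US_ASCII);
  }

  // Returns the number of bytes DEFLATE produces for the given input.
  static int deflatedSize(byte[] input) {
    Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
    try {
      deflater.setInput(input);
      deflater.finish();
      byte[] buf = new byte[8192];
      int total = 0;
      while (!deflater.finished()) {
        total += deflater.deflate(buf);
      }
      return total;
    } finally {
      deflater.end(); // release native resources
    }
  }

  public static void main(String[] args) {
    byte[] input = alternatingLines(1 << 20, 80);
    int compressed = deflatedSize(input);
    System.out.printf("%d bytes -> %d bytes (about %d to 1)%n",
        input.length, compressed, input.length / compressed);
  }
}
```

Because the attacker chooses the input, the ratio measured here should be read as a floor for what a hostile archive can achieve, not as a limit.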

Any entry targeting a file outside the directory intended by the client program (after filename canonicalization, as per IDS02-J. Canonicalize path names before validating them) must not be extracted or must be extracted to a safe location. Any entry in a zip file whose uncompressed size exceeds a certain limit must not be uncompressed. The actual limit depends on the capabilities of the platform.
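The containment requirement can be sketched with java.nio.file.Path, whose startsWith method compares whole path components rather than characters. The class and method names below are hypothetical. Note the assumption: normalize() works lexically and does not resolve symbolic links, so this is weaker than the canonicalization that IDS02-J calls for and is shown only to illustrate the check itself.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;

final class EntryPathCheck {

  // Hypothetical helper: resolves an entry name against the target directory
  // and rejects any result that falls outside it.
  static Path resolveInside(Path targetDir, String entryName) throws IOException {
    Path base = targetDir.toAbsolutePath().normalize();
    // An absolute entry name replaces base entirely, so it fails the check below.
    Path resolved = base.resolve(entryName).normalize();
    // Component-wise comparison: "out" does not match "outside".
    if (!resolved.startsWith(base)) {
      throw new IOException("Entry is outside the target directory: " + entryName);
    }
    return resolved;
  }

  public static void main(String[] args) throws IOException {
    Path target = Paths.get("out");
    System.out.println(resolveInside(target, "docs/readme.txt"));
    try {
      resolveInside(target, "../evil.txt");
    } catch (IOException e) {
      System.out.println("Rejected: " + e.getMessage());
    }
  }
}
```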

Noncompliant Code Example

This noncompliant code example fails to validate the name of the file that is being unzipped and passes the name directly to the constructor of FileOutputStream. It also fails to check the resource consumption of the file that is being unzipped, so the operation runs to completion or until local resources are exhausted.

static final int BUFFER = 512;
// ...

public final void unzip(String filename) throws java.io.IOException{
  FileInputStream fis = new FileInputStream(filename);
  ZipInputStream zis = new ZipInputStream(new BufferedInputStream(fis));
  ZipEntry entry;
  while ((entry = zis.getNextEntry()) != null) {
    System.out.println("Extracting: " + entry);
    int count;
    byte data[] = new byte[BUFFER];
    // write the files to the disk
    FileOutputStream fos = new FileOutputStream(entry.getName());
    BufferedOutputStream dest = new BufferedOutputStream(fos, BUFFER);
    while ((count = zis.read(data, 0, BUFFER)) != -1) {
      dest.write(data, 0, count);
    }
    dest.flush();
    dest.close();
    zis.closeEntry();
  }
  zis.close();
}
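To see why the unvalidated name in the code above matters, note that an entry name is an arbitrary string stored in the archive. The following sketch builds an in-memory archive whose single entry has a name containing parent-directory components, then reads it back. ZipInputStream reports the name as stored, so validation is left entirely to the caller. The class name, method names, and entry name are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class EntryNameDemo {

  // Builds an in-memory archive with a single entry of the given name.
  static byte[] archiveWithEntry(String entryName) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
      zos.putNextEntry(new ZipEntry(entryName));
      zos.write("payload".getBytes(StandardCharsets.US_ASCII));
      zos.closeEntry();
    }
    return bytes.toByteArray();
  }

  // Lists entry names exactly as ZipInputStream reports them.
  static List<String> entryNames(byte[] archive) throws IOException {
    List<String> names = new ArrayList<>();
    try (ZipInputStream zis =
        new ZipInputStream(new ByteArrayInputStream(archive))) {
      ZipEntry entry;
      while ((entry = zis.getNextEntry()) != null) {
        names.add(entry.getName());
      }
    }
    return names;
  }

  public static void main(String[] args) throws IOException {
    byte[] archive = archiveWithEntry("../../tmp/evil.txt");
    System.out.println(entryNames(archive));
  }
}
```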

Compliant Solution

In this compliant solution, the code validates the name of each entry before extracting the entry. If the name is invalid, the entire extraction is aborted. However, a compliant solution could also choose to skip just that entry and continue the extraction process, or even to extract the entry to some safe location.

Furthermore, the code inside the while loop tracks the uncompressed file size of each entry in a zip archive while extracting the entry. It throws an exception if the entry being extracted is too large (about 100MB in this case). We do not use the ZipEntry.getSize() method because the value it reports is not reliable: it is read from the archive itself, so an attacker can forge it, and it is -1 when the size is not known.

static final int BUFFER = 512;
static final int TOOBIG = 0x6400000; // 100MB
// ...

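private String validateFilename(String filename, String intendedDir)
    throws java.io.IOException {
  File f = new File(filename);
  String canonicalPath = f.getCanonicalPath();

  File iD = new File(intendedDir);
  String canonicalID = iD.getCanonicalPath();

  // Compare against the directory plus a separator so that a sibling such as
  // "/tmp/outside" is not accepted when the intended directory is "/tmp/out"
  String prefix = canonicalID.endsWith(File.separator)
      ? canonicalID : canonicalID + File.separator;
  if (canonicalPath.startsWith(prefix)) {
    return canonicalPath;
  } else {
    throw new IllegalStateException("File is outside extraction target directory.");
  }
}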

public final void unzip(String filename) throws java.io.IOException{
  FileInputStream fis = new FileInputStream(filename);
  ZipInputStream zis = new ZipInputStream(new BufferedInputStream(fis));
  ZipEntry entry;
  try {
    while ((entry = zis.getNextEntry()) != null) {
      System.out.println("Extracting: " + entry);
      int count;
      byte data[] = new byte[BUFFER];
      // Write the files to the disk, but ensure that the file is not insanely big
      int total = 0;
      String name = validateFilename(entry.getName(), ".");
      FileOutputStream fos = new FileOutputStream(name);
      BufferedOutputStream dest = new BufferedOutputStream(fos, BUFFER);
      while (total <= TOOBIG && (count = zis.read(data, 0, BUFFER)) != -1) {
        dest.write(data, 0, count);
        total += count;
      }
      dest.flush();
      dest.close();
      zis.closeEntry();

      if (total > TOOBIG) {
        throw new IllegalStateException("File being unzipped is huge.");
      }
    }
  } finally {
    zis.close();
  }
}
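The size check in the loop above can also be factored into a small helper. The following is a sketch under stated assumptions, not part of the compliant solution: the class and method names are hypothetical, the byte count is kept in a long so that it cannot overflow, and the helper throws IOException instead of IllegalStateException. Unlike the loop above, it checks the running total before writing, so nothing beyond the limit reaches the output.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class BoundedCopy {

  // Hypothetical helper: copies in to out and fails as soon as more than
  // limit bytes have been read. The chunk that crosses the limit is not written.
  static long copyWithLimit(InputStream in, OutputStream out, long limit)
      throws IOException {
    byte[] buffer = new byte[512];
    long total = 0;
    int count;
    while ((count = in.read(buffer)) != -1) {
      total += count;
      if (total > limit) {
        throw new IOException("Entry exceeds limit of " + limit + " bytes");
      }
      out.write(buffer, 0, count);
    }
    return total;
  }

  public static void main(String[] args) throws IOException {
    byte[] data = new byte[1000];
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    long copied = copyWithLimit(new ByteArrayInputStream(data), sink, 4096);
    System.out.println("Copied " + copied + " bytes");
  }
}
```

In the extraction loop, the call would take the place of the inner while loop, for example copyWithLimit(zis, dest, TOOBIG), because ZipInputStream reports end of stream at the end of the current entry.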

Risk Assessment

Rule	Severity	Likelihood	Remediation Cost	Priority	Level
IDS04-J	low	probable	high	P2	L3

Related Guidelines

MITRE CWE	CWE-409. Improper handling of highly compressed data (data amplification)
Secure Coding Guidelines for the Java Programming Language, Version 3.0	Guideline 2-5. Check that inputs do not cause excessive resource consumption

Bibliography

[Mahmoud 2002]	Compressing and Decompressing Data Using Java APIs
