The dataset "RC_Data" was created by Lori Flynn, Ebonie McNeil, and Matt Sisk.
It can be downloaded here: RC_Data_v3.zip (First publication date: April 2, 2020. Publication dates of later versions are listed at the bottom.)
This data was generated as part of a series of research projects led by Lori Flynn into automated classification of static analysis alerts (warnings) and meta-alerts (alerts mapped to code flaws, a.k.a. conditions).
We are publishing this data to enable others to test algorithms and tools we developed, and also to support external research on automated classification.
The RC_Data file can be downloaded and then reconstituted into a MongoDB database. The database contains data for two test suites: the Juliet Java Test Suite and the Juliet C/C++ Test Suite. The Juliet test suites are open-source (created by NSA CAS, hosted on the NIST SARD website) and were created for testing the quality of static analysis flaw-finding tools. We use them differently from their original design: to help generate data for creating and testing automated classification tools. The RC_Data dataset includes structured data about flaw-finding static analysis alerts from open-source tools, information about the conditions (CWEs) those alerts are mapped to, verdicts (true/false/unknown) determined using test suite metadata, and code metrics from open-source code metrics tools.
In the future, we will publish augmented versions of this dataset here. They will include open-source data from more codebases (test suites and others), data from more tools, and more features.
How to use the downloaded file:
Unzip the .zip file using your favorite tool. That leaves you with the license file license.txt and the database file RC_Data_v2.gz.
To restore to a mongo database, the following instructions work in a bash terminal in Linux:
1. Extract the compressed file: gunzip RC_Data_v2.gz
2. Restore the archive: mongorestore --host localhost:27017 --archive=<YOUR_FILEPATH_HERE>/RC_Data_v2
In the above command, replace <YOUR_FILEPATH_HERE> with the path to the directory that contains RC_Data_v2 on your own machine.
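The two steps above can be wrapped in a small bash sketch. The function names archive_path and restore_rc_data are illustrative and not part of the dataset; the directory you pass in is whatever location you unzipped the download to, and the host defaults to the localhost:27017 value used above.

```shell
#!/usr/bin/env bash
# Sketch only: wraps the gunzip + mongorestore steps from this README.
# Assumptions: gunzip and mongorestore are on PATH, and a MongoDB
# server is listening on the given host (default localhost:27017).

# Build the path to the extracted archive inside a directory,
# tolerating a trailing slash on the directory name.
archive_path() {
  printf '%s/RC_Data_v2' "${1%/}"
}

# Extract the archive and restore it into MongoDB.
#   $1: directory containing RC_Data_v2.gz
#   $2: optional host:port (defaults to localhost:27017)
restore_rc_data() {
  local dir="${1%/}"
  local host="${2:-localhost:27017}"

  if ! command -v mongorestore >/dev/null 2>&1; then
    echo "mongorestore not found on PATH" >&2
    return 1
  fi

  # Step 1: extract (gunzip replaces RC_Data_v2.gz with RC_Data_v2).
  gunzip "$dir/RC_Data_v2.gz" || return 1

  # Step 2: restore from the extracted archive file.
  mongorestore --host "$host" --archive="$(archive_path "$dir")"
}

# Example call (uncomment and replace the placeholder with your own path):
# restore_rc_data "<YOUR_FILEPATH_HERE>"
```

Defining the steps as functions keeps the file free of side effects until restore_rc_data is called explicitly.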
Then, you can inspect the database (e.g., using the mongo application from a bash terminal).
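As a starting point for inspection, the following bash sketch lists the databases on the server and the collections in one of them. This README does not state the name of the restored database, so it is passed as an argument; the helper names are illustrative, and the script assumes either the newer mongosh or the legacy mongo shell is installed and that the server runs on localhost:27017.

```shell
#!/usr/bin/env bash
# Sketch only: non-interactive inspection of a restored database.
# Assumption: the server is reachable at localhost:27017 unless
# MONGO_HOST is set in the environment.
MONGO_HOST="${MONGO_HOST:-localhost:27017}"

# Prefer the newer mongosh if installed; otherwise fall back to
# the legacy mongo shell.
pick_shell() {
  if command -v mongosh >/dev/null 2>&1; then
    echo mongosh
  else
    echo mongo
  fi
}

# Build the JavaScript snippet that lists the collections of one database.
collections_js() {
  printf 'printjson(db.getSiblingDB("%s").getCollectionNames())' "$1"
}

# Print the databases known to the server.
list_databases() {
  "$(pick_shell)" --quiet --host "$MONGO_HOST" \
    --eval 'printjson(db.adminCommand({ listDatabases: 1 }))'
}

# Print the collection names of the database given as $1.
list_collections() {
  "$(pick_shell)" --quiet --host "$MONGO_HOST" --eval "$(collections_js "$1")"
}

# Example calls (uncomment to run; the database name is a placeholder):
# list_databases
# list_collections "<DATABASE_NAME>"
```

Running list_databases first shows which database name the restore created, which can then be passed to list_collections.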
Publication dates, notes:
- RC_Data (version 1): April 2, 2020. Note: first publication.
- RC_Data_v2: April 27, 2020. Note: updated checker mappings.
- RC_Data_v3: March 11, 2021. Note: updated README, per Schiela.