The dataset "RC_Data" was created by Lori Flynn, Aubrie Woods, Ebonie McNeil, and Matt Sisk.
It can be downloaded here: RC_Data_v3.zip (First publication date: April 2, 2020. Publication dates of later versions listed at bottom.)
This data was generated as part of a series of research projects led by Lori Flynn into automated classification of static analysis alerts (warnings) and meta-alerts (alerts mapped to code flaws, a.k.a. conditions).
...
The RC_Data file can be downloaded and then reconstituted into a mongo database. The database contains data for two test suites: the Juliet Java Test Suite and the Juliet C/C++ Test Suite. The Juliet test suites are open-source (created by NSA CAS, hosted by the NIST SARD website) and were created for testing the quality of static analysis flaw-finding tools. We use them differently from their original design: we use the test suites to help generate data for creating and testing automated classification tools. The RC_Data dataset includes structured data about the flaw-finding static analysis alerts from open-source tools, information about the conditions (CWEs) those alerts are mapped to, verdicts (true/false/unknown) determined using test suite meta-data, and code metrics from open-source code metrics tools.
...
How to use the downloaded file (in the example below, the version number is "v3"):
Unzip the .zip file using your favorite tool. That leaves you with the license file license.txt and the database file RC_Data_v3.gz
To restore to a mongo database, the following instructions work in a bash terminal in Linux:
1. Extract the compressed file: gunzip RC_Data_v3.gz
2. Restore the archive: mongorestore --host localhost:27017 --archive=<YOUR_FILEPATH_HERE>/RC_Data_v3
In the above command, replace <YOUR_FILEPATH_HERE> with the filepath on your own machine.
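The unzip, gunzip, and mongorestore steps above can be sketched as one bash function. This is only an illustration: the names run, restore_rc_data, and DRY_RUN are hypothetical helpers, not part of the dataset, and the sketch assumes the .zip places RC_Data_<version>.gz directly in the target directory (the first published version may not follow that naming pattern).

```shell
#!/usr/bin/env bash
# Sketch of the unzip / gunzip / mongorestore steps for one RC_Data version.
# All function and variable names here are illustrative assumptions.

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# restore_rc_data <directory containing the .zip> [version] [host:port]
restore_rc_data() {
  local dir="$1"
  local version="${2:-v3}"
  local host="${3:-localhost:27017}"
  local base="RC_Data_${version}"

  # Each step runs only if the previous one succeeded.
  run unzip "${dir}/${base}.zip" -d "${dir}" &&
    run gunzip "${dir}/${base}.gz" &&
    run mongorestore --host "${host}" --archive="${dir}/${base}"
}
```

Setting DRY_RUN=1 and calling restore_rc_data <YOUR_FILEPATH_HERE> v3 prints the three commands without running them, which is a cheap way to check the paths before touching a running mongo server.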
Then, you can inspect the database (e.g., using the mongo application from a bash terminal).
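As a starting point for inspection, the following sketch lists the databases on the server and then the collections inside one of them. The name of the restored database is not stated here, so the sketch discovers it rather than assuming it. The function names list_databases and list_collections and the MONGO_SHELL variable are hypothetical. Newer MongoDB installations ship the shell as mongosh instead of mongo; the sketch assumes the same options work there.

```shell
#!/usr/bin/env bash
# Sketch: look around a restored database from a bash terminal.
# MONGO_SHELL, list_databases, and list_collections are illustrative names.

# Shell binary to use: "mongo" as in the text above; override for "mongosh".
MONGO_SHELL="${MONGO_SHELL:-mongo}"

# list_databases [host:port]
# Prints one database name per line.
list_databases() {
  local host="${1:-localhost:27017}"
  "${MONGO_SHELL}" --host "${host}" --quiet \
    --eval 'db.adminCommand({ listDatabases: 1 }).databases.forEach(function (d) { print(d.name); })'
}

# list_collections <database name> [host:port]
# Prints one collection name per line for the given database.
list_collections() {
  local dbname="$1"
  local host="${2:-localhost:27017}"
  "${MONGO_SHELL}" --host "${host}" --quiet \
    --eval "db.getSiblingDB('${dbname}').getCollectionNames().forEach(function (c) { print(c); })"
}
```

Run list_databases first, pick the database that appeared after the restore, and pass its name to list_collections.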
Publication dates, notes:
- RC_Data (version 1): April 2, 2020. Note: first publication.
- RC_Data_v2: April 27, 2020. Note: updated checker mappings.
- RC_Data_v3: March 11, 2021. Note: updated README, per Schiela.