Sample

Feature | Description |
---|---|
id | Sample ID |
label | 1: the defect reported by Infer is a true positive according to the differential analysis. 0: the reported defect is a false positive. |
label_source | Entity that produced and labeled the sample. “auto_labeler”: samples are generated and labeled based on the differential analysis using the static analyzer; see Sec. III.C of the D2A paper for details. “after_fix_extractor”: given a defect whose label is 1 in the before-fix version, the corresponding snippets are extracted from the after-fix version; see Sec. III.D of the D2A paper for details. |
bug_type | Bug type identified by the static analyzer |
project | Open-source project mined |
bug_info | Bug details provided by static analyzers such as Infer. This field is null for “after_fix_extractor” samples because such samples are not based on static analyzer reports. |
bug_info->qualifier | Further description of the violation, depending on the bug_type |
bug_info->file | File in the project where the bug was found |
bug_info->procedure | Containing function |
bug_info->line | Line number in the file |
bug_info->column | Column number of the bug in the line |
bug_info->url | URL of the bug location on GitHub |
adjusted_bug_loc | When not null, points to the actual location of the bug described in “bug_info”. This field exists because Infer may report a location (the file, line, and column specified in the “bug_info” section) that does not exactly match the bug description. |
adjusted_bug_loc->file | File in the project where the bug was found |
adjusted_bug_loc->line | Line number of the bug in the file |
adjusted_bug_loc->column | Column number of the bug in the line |
adjusted_bug_loc->url | URL of the bug location on GitHub |
bug_loc_trace_index | When “adjusted_bug_loc” is not null, the index of the corresponding step in the trace (the list in the “trace” field). When “adjusted_bug_loc” is null, this field is null too. |
versions | Related Git versions of the project |
versions->before | Git hash of the version before the commit, which contains the vulnerability if label = 1 |
versions->after | Git hash of the version after the commit, in which the vulnerability is fixed if label = 1 |
sample_type | Which Git version the functions (in the “functions” field) are extracted from. “before-fix”: the bug, trace, and related functions are extracted from the before-commit version. “after-fix”: the bug info and related functions are extracted from the after-commit version. |
trace | Array of steps that describe the path to the candidate sample |
trace->idx | Index of the entry in the array |
trace->level | Depth of the call stack |
trace->description | Text description of the step |
trace->func_removed | Whether the containing function is removed in the after-commit version. This field is null for “before-fix” samples. |
trace->file_removed | Whether the containing file is removed in the after-commit version. This field is null for “before-fix” samples. |
trace->file | Fully qualified file name in the project structure |
trace->loc | Relevant location in the file, as line number:column number |
trace->func_name | Function name |
trace->func_key | Indexing key for the function (contains the range in the code where the function can be found). The function body can be found in the dictionary in the “functions” field using this key. |
trace->is_func_definition | Whether the specified location is inside a function definition (i.e., not in a declaration without a function body) |
trace->url | GitHub URL that highlights the range of the containing function |
functions | Dictionary of functions identified in the trace |
functions-> | Function key, as given in the func_key field of a trace step |
functions->file | Fully qualified file name in the project structure |
functions->loc | Range of the function in the file |
functions->name | Function name |
functions->touched_by_commit | Boolean: True if the file was changed in the after-commit version |
functions->code | Complete function code |
commit | Commit associated with this sample |
commit->url | GitHub URL associated with this commit |
commit->changes | Array of changes |
commit->changes->before | File name before the change |
commit->changes->after | File name after the change |
commit->changes->changes | A list of line ranges changed by the commit. Each range item has the format “L_1,T_1^^L_2,T_2”, where “L_1,T_1” is the range in the file before the change and “L_2,T_2” is the range in the file after the change. L_i is the starting line and T_i is the total number of lines in the range. |
compiler_args | Compiler flags used to build this commit, per file |
compiler_args-> | Key-value pairs associated with each file name |
zipped_bug_report | Base64-encoded, gzipped Infer output for the reported issue. It is null for “after_fix” samples because they do not come from Infer static analysis results. |
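
The range format described for commit->changes->changes (“L_1,T_1^^L_2,T_2”) can be split mechanically. The following is a minimal sketch, assuming each item is a plain string in exactly that format; the names `LineRange` and `parse_change_range` and the example value are illustrative and not part of the dataset or its tooling.

```python
from typing import NamedTuple, Tuple


class LineRange(NamedTuple):
    start: int  # L_i: starting line of the range
    total: int  # T_i: total number of lines in the range


def parse_change_range(item: str) -> Tuple[LineRange, LineRange]:
    """Split one "L_1,T_1^^L_2,T_2" item into (before, after) ranges."""
    before_text, after_text = item.split("^^")
    before = LineRange(*(int(part) for part in before_text.split(",")))
    after = LineRange(*(int(part) for part in after_text.split(",")))
    return before, after


# Made-up example item: 5 lines starting at line 120 before the change,
# 7 lines starting at line 120 after it.
before, after = parse_change_range("120,5^^120,7")
print(before)  # LineRange(start=120, total=5)
print(after)   # LineRange(start=120, total=7)
```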
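
Because zipped_bug_report is Base64-encoded and gzipped, reading it means reversing those two steps in order. The sketch below uses only the Python standard library; the function name `decode_bug_report` is hypothetical, and treating the decompressed bytes as UTF-8 text is an assumption, since the table does not state the encoding or internal format of the Infer output.

```python
import base64
import gzip


def decode_bug_report(zipped_bug_report):
    """Return the Infer output as text, or None when the field is null."""
    if zipped_bug_report is None:
        return None
    compressed = base64.b64decode(zipped_bug_report)
    # Assumption: the decompressed payload is UTF-8 text.
    return gzip.decompress(compressed).decode("utf-8")


# Round trip on a made-up payload (not real Infer output).
payload = "hypothetical Infer output"
encoded = base64.b64encode(gzip.compress(payload.encode("utf-8"))).decode("ascii")
print(decode_bug_report(encoded))
```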
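
The relationship between bug_loc_trace_index, trace, and functions can be followed in a few lines. This is a sketch under stated assumptions: the sample is already loaded as a Python dictionary, bug_loc_trace_index is a position in the trace list, and trace->loc is a “line:column” string. The helper name `bug_step_function`, the keys “key_a” and “key_b”, and all values in the example sample are made up for illustration.

```python
def bug_step_function(sample):
    """Return (trace step, function entry) for the adjusted bug location.

    Returns None when bug_loc_trace_index is null.
    """
    index = sample.get("bug_loc_trace_index")
    if index is None:
        return None
    step = sample["trace"][index]
    # The function body is looked up in "functions" using the step's func_key.
    function = sample["functions"].get(step["func_key"])
    return step, function


# Hypothetical, heavily trimmed sample; real samples carry many more fields.
sample = {
    "bug_loc_trace_index": 1,
    "trace": [
        {"idx": 0, "loc": "10:5", "func_name": "caller", "func_key": "key_a"},
        {"idx": 1, "loc": "42:9", "func_name": "callee", "func_key": "key_b"},
    ],
    "functions": {
        "key_a": {"name": "caller", "code": "void caller(void) { callee(); }"},
        "key_b": {"name": "callee", "code": "void callee(void) { }"},
    },
}

step, function = bug_step_function(sample)
line, column = (int(part) for part in step["loc"].split(":"))
print(function["name"], line, column)  # callee 42 9
```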