Requirements:
-) Hardware: The build process of Sherlock uses 5 CPU cores. In our experiments we provided 100 GB of RAM for each docker container. Even though smaller memory size might affect running times - especially for bugs with a long time-to-bug, we expect that qualitatively the results (i.e. the speedups over the baselines) stay the same.
-) Software: Docker version >= 24.0.7
Install docker and use the Dockerfile to build Sherlock and
the analyzers CBMC, Clam, CPAchecker, ESBMC, KLEE, MOPSA, UltimateAutomizer and Symbiotic.
If you execute from the directory in which Dockerfile is located
docker build -t sherlock .
the build process will take > 1h because all tested analyzers are built.
You can however build only one analyzer by passing option --build-arg="<analyzer>_VERSION=<git_commit_id>" where
<analyzer> is one of CBMC, CLAM, ESBMC, KLEE or MOPSA and <git_commit_id> is a git commit id of the
analyzer's git repository. E.g. to build Sherlock containing CBMC at a version that Sherlock found buggy, execute
docker build --build-arg="CBMC_VERSION=95b7ac50d79384f108c1bd33bd6710b189b80e27" -t sherlock .
Once the build processes finishes, the docker image is built and the analyzers are installed inside Sherlock. Now you can start a container with
docker run -it sherlock
You can let Sherlock start a testing campaign of an analyzer by invoking
./sherlock.sh -t <analyzer>
where <analyzer> is one of cbmc, clam, cpa, esbmc, klee, mopsa, ultimate or symbiotic. Sherlock will by default pick seed files from the directory testfiles.
To test with a specific seed file run
./sherlock.sh -t <analyzer> <path>
If <path> is a C-file, Sherlock tests <analyzer> on this C-file only. If <path> is a directory, Sherlock tests <analyzer> with each C-File in <path> and its subdirectories one by one in an infinite loop.
Specify option -b to provide a budget, so a number of testing iterations. (If omitted Sherlock tests with a specific seed file until it finds a bug)
./sherlock.sh -t <analyzer> -b <budget>
In order to see reproducible results (up to timing), specify the initial integer seed for the fuzzing process of Sherlock by using option -s:
./sherlock.sh -t <analyzer> -s <seed>
To get more detailed information about Sherlock's options invoke:
./sherlock.sh -h
Finally, you can have look at Sherlock's log file out.csv that contains detailed information about the fuzzing process.
Once Sherlock finds a bug, it saves information about it in a directory in bug_files/<id>.
When minimizing a bug we use C-Reduce. Sherlock has C-Reduce installed, so there are no additional installation steps necessary.
To do the matching of assertions/assumptions during the minimization, we require the initial and transformed file to have the same number of LOCs. This is already the case for all transformations except strengthen, because here we add assertions and assumptions. In order to align the file initial.c of a strengthening to the same length as the file strengthen.c, use the script in creduce/equalizer.py. It writes a newline whenever the transformed file has a new assertion/assumption:
python3 creduce/equalizer.py bug_files/1/initial.c bug_files/1/strengthen.c
When this command finishes, the files bug_files/1/initial.c and bug_files/1/strengthen.c should have equal number of LOCs. In order to minimize now the bug, execute
cp bug_files/1/initial.c .
creduce --debug --n 1 creduce/<analyzer>_interesting.sh initial.c
where <analyzer> is the analyzer that contains the bug. Note that in case the analyzer exhibits buggy
behavior only if it is invoked with special parameters, specify these parameters by defining variable
analyzer_opts in <analyzer>_interesting.sh (line 3).
E.g. to minimize the bug of CBMC bug1 (produced using integer seed 359450, which results in the fastest run with 15 min time-to-bug), execute the following in the docker container that revealed the bug (here no special options are required for CBMC):
cp bug_files/1/initial.c .
creduce --debug --n 1 creduce/cbmc_interesting.sh initial.c
While the files in bug_files/1 have 116 LOCs, C-Reduce will print the minimized initial file with 11 LOCs.
Note that, even though C-Reduce needs about 10 min to complete the minimization, you can always find intermediate reduction of the initial and transformed
file in bug_files/0. These files are already quite small after a minute.
Sherlock has already detected many unique bugs. Here is a list of issues we filed on public bug tracking systems together with their classification for various program analyzers. For fixed bugs which we evaluated in our ASE 2024 paper you can see the commit ID of the buggy and fixed analyzer version.
| Bug ID | Analyzer name | Bug Type | Link | buggy commit ID | fixed commit ID |
|---|---|---|---|---|---|
| 1 | CBMC | Soundness | diffblue/cbmc#7953 | 95b7ac50d79384f108c1bd33bd6710b189b80e27 | 439bb2c0ccee3dc0e7fb89a6ecceb9e413cbff6b |
| 2 | CBMC | Soundness | diffblue/cbmc#7997 | ||
| 3 | CBMC | Soundness | diffblue/cbmc#8200 | ||
| 4 | CBMC | Precision | diffblue/cbmc#8223 | ||
| 5 | CBMC | Other | diffblue/cbmc#8257 | ||
| 6 | Clam | Soundness | seahorn/clam#87 | 75b5fb95fa6320f59a7b9b07241f780352ba5ab2 | 7a7880a2a5af5465a1c6c3a2a247e67e9abf2876 |
| 7 | Clam | Precision | seahorn/clam#89 | a4420033655e7f285c55803ce53ce9ca1e94b2a6 | 3fde3046e3a4d10690d1a95d3697ba591b534d69 |
| 8 | Clam | Precision | seahorn/clam#91 | 3fde3046e3a4d10690d1a95d3697ba591b534d69 | 3a64ddbc7729aa942eab25c0e2fe7071109a83b8 |
| 9 | Clam | Soundness | seahorn/clam#93 | 3a64ddbc7729aa942eab25c0e2fe7071109a83b8 | 942b86cbacac9536b733ae4284fedb49c75e98c1 |
| 10 | Clam | Soundness | seahorn/clam#94 | ||
| 11 | CPAchecker | Precision | https://gitlab.com/sosy-lab/software/cpachecker/-/issues/1140 | ||
| 12 | ESBMC | Precision | esbmc/esbmc#1528 | ||
| 13 | ESBMC | Soundness | esbmc/esbmc#1537 | 61cdf7f22ac91922482944c66fa17f155798e66c | 79c621ff20dfd9856540a314eb26e537e0b04083 |
| 14 | ESBMC | Soundness | esbmc/esbmc#1620 | ||
| 15 | ESBMC | Soundness | esbmc/esbmc#1655 | db5ae0a66e178cd666f9574c0e04d3268c93aaeb | db4c241b975597aa24b940f98856431b4a51cf47 |
| 16 | ESBMC | Soundness | esbmc/esbmc#1650 | ||
| 17 | KLEE | Soundness | klee/klee#1666 | ||
| 18 | KLEE | Soundness | klee/klee#1668 | cb5e898561f9b8769d8838bc1bdca17a6f4f5d20 | f813c88c8cb868fc9c0be78fbf92a94d72ac02b0 |
| 19 | KLEE | Crash | klee/klee#1654 | ||
| 20 | MOPSA | Soundness | https://gitlab.com/mopsa/mopsa-analyzer/-/issues/172 | 5e4d8376716bc9ceae81be1f9e00b83a9615f1ab | 4ce28ae045eede90087a73bbf650c07d380c87d2 |
| 21 | MOPSA | Precision | https://gitlab.com/mopsa/mopsa-analyzer/-/issues/170 | ||
| 22 | MOPSA | Soundness | https://gitlab.com/mopsa/mopsa-analyzer/-/issues/173 | 41f73f5c5afef2aeab94eef849461e77c2cfc01e | d4c73a5e7311180e953c69713aad91907d8adb62 |
| 23 | UltimateAutomizer | Soundness | ultimate-pa/ultimate#661 | ||
| 24 | Symbiotic | Soundness | staticafi/symbiotic#248 |
Reproducing bugs used in the evaluation section of our ASE 2024 paper "Interrogation Testing of Program Analyzers for Soundness and Precision Issues"
Follow these step to reproduce results for Table 4 and Table 5:
To refind a specific bug, first you need to build Sherlock with a specific version of the tested analyzers as the analyzers are dockerized inside Sherlock. Use the Dockerfile's options CBMC_VERSION, CLAM_VERSION, ESBMC_VERSION, KLEE_VERSION and MOPSA_VERSION. In order to reproduce the results for a specific bug, provide the commit ID of a buggy version of the analyzer which you can find in the table above. E.g. in order to reproduce the results for bug1, execute
docker build --build-arg="CBMC_VERSION=95b7ac50d79384f108c1bd33bd6710b189b80e27" -t sherlock:buggy_cbmc .
docker run -it sherlock:buggy_cbmc
root@<buggy_container_id>:/home/sherlock# ./sherlock.sh -s <seed> -t cbmc -o "<analyzer_options>" testfiles/testcomp22/nla-digbench-scaling/hard-u_valuebound100.c
Here you need to replace <seed> with an integer seed and <analyzer_options> with the analyzer options that were used to find the bug, here --unwind 1 --round-to-minus-inf --reachability-slice-fb --drop-unused-functions --max-field-sensitivity-array-size 1 --partial-loops --string-printable --arrays-uf-always.
You can find this information in the bug ID folders in sherlock/reproduce/ .
To find the full list of parameters of the sherlock command of a specific bug, please follow the instructions within each bug ID folder in the folder sherlock/reproduce/
If you want to reproduce the results for the baselines, build the Dockerfile in the directories sherlock-MT or sherlock-MT+VR,
otherwise build the Dockerfile in sherlock. To provide a specific time-to-live value, invoke Sherlock with option
-l <ttl>.
In order to confirm that the respective bug and not another bug was found, install the fixed version of the analyzer or let the Dockerfile build Sherlock with the fixed analyzer inside by providing the commit ID of a fixed version of the analyzer. E.g. in order to confirm that the first bug found by Sherlock when trying to reproduce bug1, is indeed fixed by the commit fixing bug1, execute
docker cp <buggy_conatiner_id>:/home/sherlock/bug_files .
docker build --build-arg="CBMC_VERSION=439bb2c0ccee3dc0e7fb89a6ecceb9e413cbff6b" -t sherlock:fixed_cbmc .
fixed_container_id=$(docker run -dt sherlock:fixed_cbmc)
docker cp bug_files $fixed_container_id:/home/sherlock
docker exec $fixed_container_id ./tools/metamorphic_checker.sh -o "<analyzer_options>" <analyzer> bug_files/1
echo $?
Here <buggy_conatiner_id> refers to the ID of the container created by the docker run command above,
<analyzer> is the name of the buggy analyzer (in this case cbmc) and <analyzer_options> are the options that the fuzzer generated for the analyzer, which you can find in the Sherlock command in the bug ID folders.
The directory bug_files contains the information necessary to confirm the bug: Initial and transformed file. The script tools/metamorphic_checker.sh
verifies that the bug violates the oracle and exits with exit code 0 only if the oracle is violated. Therefore, we expect that if the fixed
version of the tool is installed, a different exit code than 0 is printed.
Notice that when reproducing Clam and ESBMC bugs, the first bugs reported by Sherlock, are usually different from the fixed bugs: For Clam,
Sherlock finds cases of precision loss (e.g. due to widening) and for ESBMC Sherlock warns about other reported, but not fixed bugs).
To this end we need to invoke Sherlock with option -c. This option makes Sherlock continue testing when it finds a bug.
Sherlock will then produce different bug reports in the directory bug_files.
In order to verify that Sherlock found the 'right' bug after some time, execute the above commands with several bug reports, i.e.
bug_files/1, bug_files/2, etc.