diff --git a/README.md b/README.md index b182a91f..a02f4868 100644 --- a/README.md +++ b/README.md @@ -33,6 +33,8 @@ about suspicious code: let-binding of the form `let x = ... in x` (the binding is useless); optional argument in an argument's type: `val f: ... -> (... -> ?_:_ -> ...) -> ...` +For more information, see [the documentation](docs/USER_DOC.md) + ## Requirements diff --git a/docs/USAGE.md b/docs/USAGE.md new file mode 100644 index 00000000..3d33a863 --- /dev/null +++ b/docs/USAGE.md @@ -0,0 +1,384 @@ +# Table of contents + ++ [Usage](#usage) ++ [Main reporting options](#main-reporting-options) + + [Global options](#global-options) + + [Default reports](#default-reports) + + [Optional arguments](#optional-arguments) + + [Stylistic issues](#stylistic-issues) ++ [Advanced usage](#advanced-usage) + + [Path-related options](#path-related-options) + + [`--exclude `](#--exclude-path) + + [`--references `](#--references-path) + + [Value-related options](#value-related-options) + + [`--underscore`](#--underscore) + + [`--internal`](#--internal) + + [Missing or invalid reports](#missing-or-invalid-reports) + + [`--verbose`](#--verbose) + + [Thresholds](#thresholds) + + [Thresholds on default reports](#thresholds-on-default-reports) + + [Thresholds on optional arguments](#thresholds-on-optional-arguments) + +# Usage + +Calling `dead_code_analyzer --help` provides the following output, describing +the main command line aspect, different options available and their effects. + +The `` argument is any number of directory, `.cmt` and `.cmi` files. +These files can be produced using the compiler flags `-keep-locs` (on by default) +for `.cmi` and `-bin-annot` for `.cmt`. +The directories are traversed looking for such files. +> [!TIP] +> If you are using `dune` for your project, the files can be obtained via the +> [`@check` alias](https://dune.readthedocs.io/en/stable/reference/aliases/check.html) + +> [!IMPORTANT] +> The order of the command line arguments matter.\ +> Using `dead_code_analyzer ` ignores the `` when +> analyzing ``.\ +> Similarly, some options may override the effects of +> others. E.g. `dead_code_analyzer --nothing --all ` is the same as +> `dead_code_analyzer --all `. + +``` +Usage: dead_code_analyzer +Options are: + --exclude Exclude given path from research. + --references Consider given path to collect references. + --underscore Show names starting with an underscore + --verbose Verbose mode (ie., show scanned files) + -v See --verbose + --internal Keep internal uses as exported values uses when the interface is given. This is the default behaviour when only the implementation is found + --nothing Disable all warnings + -a See --nothing + --all Enable all warnings + -A See --all + -E Enable/Disable unused exported values warnings. + can be: + all + nothing + "threshold:": report elements used up to the given integer + "calls:": like threshold + show call sites + -M Enable/Disable unused methods warnings. + See option -E for the syntax of + -Oa Enable/Disable optional arguments always used warnings. + can be: + all + nothing + + "calls:" like + show call sites + can be: + "both:,": both the number max of exceptions (given through the integer) and the percent of valid cases (given as a float) mu +st be respected for the element to be reported + "percent:": percent of valid cases to be reported + -On Enable/Disable optional arguments never used warnings. + See option -Oa for the syntax of + -S Enable/Disable coding style warnings. + Delimiters '+' and '-' determine if the following option is to enable or disable. + Options (can be used together): + bind: useless binding + opt: optional arg in arg + seq: use sequence + unit: unit pattern + all: bind & opt & seq & unit + -T Enable/Disable unused constructors/records fields warnings. + See option -E for the syntax of + -help Display this list of options + --help Display this list of options +``` + +# Main reporting options + +## Global options + +**options :** `--nothing`, `--all` + +To ease the usage of the tool to only focus on some kinds of reports, 2 options +are available : +- `--nothing` deactivates all the reports. One can then selectively activate the + ones they are interested in using the corresponding options (see the sections below). + This is equivalent to `-E nothing -M nothing -T nothing -On nothing -Oa nothing -S -all`. +- `--all` activates all the reports. One can then selectively deactivate the + ones they are not interested in using the corresponding options (see the sections below). + This is equivalent to `-E all -M all -T all -On all -Oa all -S +all`. + +Activating or deactivating a report section actually activates or deactivates +some parts of the analysis. Therefore, it impacts the cost (memory and time) +of the analysis. + +The `dead_code_analyzer`'s command line arguments are processed and applied +in the order they appear. +Consequently, those 2 options are generally expected to appear before all the +other report-related options. Using them after would simply override the effects +of the others. + + +**Examples :** +1. `dead_code_analyzer --nothing ` deactivates all the reports + and then apply the `` before analyzing ``. +1. `dead_code_analyzer --nothing ` apply the `` and + then decativates all the reports before analyzing ``, which is not very + effective. + +## Default reports + +**options :** `-E `, `-M `, `-T ` + +By default the `dead_code_analyzer` only looks for unused exported values, +unused methods, and unused constructors and record fields. They can respectively +be deactivated using `-E nothing`, `-M nothing`, or `-T nothing`. + +**Example :** `dead_code_analyzer -M nothing -T nothing` will only analyze and +report unused exported values. + +Similarly to `--nothing` and `--all`, one can selectivey activate the reports +they are interested in by using respectively `-E all`, `-M all`, or `-T all`. + +**Example :** `dead_code_analyzer --nothing -E all` will analyze and report +unused exported values. + +> [!Note] +> `dead_code_analyzer ` is equivalent to +> `dead_code_analyzer -E all -M all -T all `. + +> [!Warning] +> Analyzing unused methods can be memory intensive. + +## Optional arguments + +**options :** `-On `, `-Oa ` + +As discussed in [the introduction](USER_DOC.md#introduction), another important +focus of the `dead_code_analyzer` is on the optional arguments never or always used. +Analyzing and reporting on those can be turned on using respectively `-On all`, or +`-Oa all`. Similarly with the previous options, turning these reports off is +done using the `nothing` argument instead of `all`: respectively `-On nothing`, +or `-Oa nothing`. Using `--nothing` or `--all` also affects these sections. + +**Example :** `dead_code_analyzer --nothing -On all` will enable the analysis of +optional arguments and only report those that are never used. + +> [!Note] +> Reporting for either the optional arguments never or always used actually +> relies on the same analysis which can be memory intensive. + +## Stylistic issues + +**option :** `-S ` + +Finally, as a bonus, one can activate some stylistic reports by using `-S +all`. +Notice the extra `+` before `all`. The `-S` option expects slightly different +arguments from the previous options. Rather than expecting `all` or `nothing`, +it expects a description of the desired stylistic issues to report, with `+` +indicating a category to activate and `-` one to deactivate. +Using `--nothing` or `--all` also affects this section. + +**Example :** `dead_code_analyzer --nothing -S +all-bind` only reports stylistic issues, +and reports all of them but the "useless binding" ones + +> [!Note] +> `-S +all` is equivalent to `-S +bind+opt+seq+unit`, and `-S -all` to +> `-S -bind-opt-seq-unit` + +# Advanced usage + +## Path-related options + +Imagine a file tree like the one below, with `foo.*` and `lib/*` the actual code +of the product, and `debug/*` code intended for debugging purpose only during +the development. +``` +src +├── debug +│   ├── debug.cmi +│   ├── debug.cmt +│   └── debug.ml +├── foo.cmi +├── foo.cmt +├── foo.ml +├── foo.mli +└── lib + ├── lib.cmi + ├── lib.cmt + ├── lib.ml + └── lib.mli +``` + +### `--exclude ` + +`dead_code_analyzer src` will analyze and find unused elements of code in all +of `src`. Some reports may point to locations in `src/debug/debug.ml`, adding +noise because this module is actually used occasionnaly but no reference to it +should subsist in production code. + +To ignore that module, both for declarations and uses, one can use the +`--exclude ` option. + +`dead_code_analyzer --exclude src/debug src` only reports locations outside of +`src/debug`. + +### `--references ` + +Using the previous command line, some reported elements (e.g. located in +`src/lib/lib.mli`) could be used by the `Debug` module. In this situation. +In this case, completely excluding `src/debug` from the analysis leads to false +positives (FP): invalid reports. + +To fix the situation, one can use the `--references ` option. This option +includes the `` when observing uses. + +As a result, `dead_code_analyzer --exclude src/debug --references src/debug src` +only reports elements outside of `src/debug` and does not report elements used by +code in `src/debug`. + +## Value-related options + +### `--underscore` + +The compiler ignores unused values when their names are prefixed with an +underscore (e.g. `let _x = ...`). The `dead_code_analyzer` imitates that +behavior. One can enable the analysis and reports on such names using the +`--underscore` option. + +> [!Note] +> This behavior (ignoring names starting with underscore) is not only limited +> to unused exported values but applied to all the report kinds. Thus, they are +> all affected by the `--underscore` option. + +### `--internal` + +The compiler already warns the user about unused values that are not exported. +That is, values that are not exposed in the signature of a module. In case no +interface is available, all the toplevel declaration are exported. Consequently, +none of them can be reported by the compiler. In order to complement the +compiler well, and fit some coding habits, the `dead_code_analyzer` keeps track +of internal uses in implementations (`.ml`) without corresponding interfaces (`.mli`). + +Using the `--internal` option activates the analysis of internal use for +implementations that have a corresponding interface file, similarly to +implementations without interface as described above. This can be useful to +reduce the volume of reports. It also expands the notion of unused exported +values from unused externally (can be removed from the `.mli`) to not used at +all (can be removed from the codebase). + +**Examples :** +1. `f` is exported and used internally. Neither the compiler nor the + `dead_code_analyzer` will report it. + ```ocaml + (* foo.ml *) + (* foo.mli does not exist *) + let f x = x + let () = f () + ``` +2. `f` is exported and not used internally. Assuming it is not used externally + either, the `dead_code_analyzer` will report it, while the compiler will not. + ```ocaml + (* foo.ml *) + (* foo.mli does not exist *) + let f x = x + ``` +3. `f` is exported via `foo.mli` and used internally. Assuming it is not used + externally, the `dead_code_analyzer` will report it by default. + ```ocaml + (* foo.ml *) + let f x = x + let () = f () + ``` + ```ocaml + (* foo.mli *) + val f : 'a -> 'a + ``` + By using the `--internal` option, the `dead_code_analyzer` will not report `f` + as unused anymore. + +## Missing or invalid reports + +### `--verbose` + +When using the `dead_code_analyzer`, one may encounter false negatives (FN): +missing reports. Users may also encounter false positives (FP): invlaid reports. +In both cases, chances are that some files were not analyzed. +An easy way to see the list of analyzed files is by using the `--verbose` option. +This will print out `Scanning ` for every file it tries to read. +In case there is an issue reading the file, then it prints out a second line +indicating the `` and the issue. In this case, the analyzer ignores +the file and moves on. + +- If a file is ignored and it is not obvious why, then opening an issue is +welcome. +- If no file is ignored, check that no file is missing. There should be a `.cmi` + and a `.cmt` file for each expected ``. +- If no `.cmi` or `.cmt` file is missing then the false negatives must be due + to limitations of the tool and opening an issue is welcome. + +## Thresholds + +### Thresholds on default reports + +**options :** `-E `, `-M `, `-T ` + +In addition to activating/deactivating some reports, one can extend the details +of the reports using thresholds. The idea with thresholds is to not only report +unused elements of code but also report almost unused ones, up to a desired +maximum amount: the threshold. + +For the 3 default reports (exported values, methods, and constructors and record +fields), the threshold argument's format is `threshold:`, with `` a +positive integer. Using this argument will not only enable the corresponding +report but also some subreports for values used exactly `n` times with +`0 < n <= `. + +**Example :** `dead_code_analyzer --nothing -E threshold:2` will only analyze +exported values and report those never used, used exactly once, and used exactly +twice. + +> [!Note] +> The use of an element of code is defined by its syntactic use. Consequently, +> a function could actually be called mutliple times dynamically but always at +> the same location in code and therefore be reported as used exactly once. + +The threshold argument has an ever more detailed variant : `calls:`. This +time, the subreports will not only indicate the elements used exactly `n` times +but also display the locations of their use. + +### Thresholds on optional arguments + +**options :** `-On `, `-Oa ` + +The options for optional arguments always/never used also accept a `threshold` +argument. However it looks a bit different from the one for the default reports. +Instead of using `threshold:`, there are 2 different modes available: +`percent:` and `both:,` with `0.0 <= <= 1.0` and +`0 <= `. + +Using the `percent:` argument will activate the reports for optional arguments +that are used/unused at least the provided percent of the time. + +**Examples :** +1. `dead_code_analyzer --nothing -Oa percent:0.9` will report optional arguments + always used, and those used at least 90% of the time. +2. `dead_code_analyzer --nothing -On percent:0.9` will report optional arguments + never used, and those discarded at least 90% of the time. + +Using the `both:,` argument will activate the reports for optional +arguments that are always/never used, with at most `` number of exceptions +and at least the provided percent of the time. + +**Example :** `dead_code_analyzer --nothing -Oa both:1,0.9` will report optional +arguments always used, and those unused at most once but used at least 90% of +the time. + +> [!Tip] +> Using `0` for `` provides the same effect as the `threshold:` +> argument for the default reports. +> +> **Example :** `dead_code_analyzer --nothing -Oa both:1,0` will report optional +> arguments always used, and those unused at most once. + +Similarly to the default reports, it is possible to get the callsites of the +exceptions. This is done by prefixing the threshold argument with `calls:`. + +**Example :** `dead_code_analyzer --nothing -Oa calls:both:1,0` will report optional +arguments always used, and those unused at most once with the locations where +they are discarded. + diff --git a/docs/USER_DOC.md b/docs/USER_DOC.md new file mode 100644 index 00000000..48c03ae0 --- /dev/null +++ b/docs/USER_DOC.md @@ -0,0 +1,58 @@ +# dead_code_analyzer + +## Table of contents + +- [Introduction](#introduction) +- [Topics](#topics) +- [Footnotes](#footnotes) + +## Introduction + +The `dead_code_analyzer`'s main goal is to help developers maintain their OCaml +codebase by identifying globally unused elements of code. Such elements can then +be dropped, either from the interface or from the implementation, depending on +the context. It is a static analyzer and an addition to the compiler's already +existing warnings. + +An element of code is considered unused if there is no explicit use of +it. This means that it should be removable from the codebase without changing +the semantics of the program nor breaking compilation[^breaking_compilation]. +This "_unused_" property is not considered transitively[^on_transitivity]. +It implies that each reported element can be taken care of independently. +It also implies that removing an element reported may lead to the detection of +new unused elements. + +The unused elements tracked are: +- Exported values +- Methods +- Constructors and record fields + +In addition to those unused elements, the `dead_code_analyzer` also tracks the +use of optional arguments. More specifically, it focuses on those: +- Always used +- Never used (or only to be discarded with `None`) + +Finally, as a bonus, it also identifies some stylistic issues. Although this is +not related to the _use_ of elements of code. + +[^breaking_compilation]: Actually, the compilation may break in some cases. + E.g. removing a constructor from a variant type would trigger compilation + errors on patterns matching that constructor. + +[^on_transitivity]: Elements of code used by an unused element of code are + still considered used. + Consequently, an element only used by unused elements is still considered + used. It would only be reported as unused after all its users have been + removed from the codebase. + \ + E.g. A function `f` called by another function `g` is always considered used, + no matter the status of `g`. + +## Topics + +This documentation is split accross different topics: + +- [Usage](USAGE.md) describes the usage of the `dead_code_analyzer` and its options. + + +## Footnotes