In an effort to potentially address #4687, I created a GitHub Action that facilitates performance testing of the incoming changes introduced via pull requests to data.table. My motivation was to help ensure that data.table maintains its code efficiency and high-performance standards (core values of the project!) as PRs keep coming in and getting merged frequently, which means they need to be monitored for performance regressions, and an automated way to do that would be ideal, no?
Onto more details:
- My action utilizes `atime` to run predefined tests (as specified in `inst/atime/tests.R`) on different versions of `data.table`. These tests are based on historical regressions that have been documented/discussed via issues and PRs here (the sketch after this list shows the format they follow).
- It uses `cml` to publish information in a comment on the pull request thread via the GitHub bot (which operates using the `GITHUB_TOKEN` I provide to authenticate itself and interact with the GitHub API, within the scope of the step I defined in my GHA workflow).
- Contents of the comment include: a plot consisting of subplots for each test case depicting the time and memory trends across different `data.table` versions; the commit (SHA) that generated it; a link to download the artifact containing all of the `atime`-generated results for that run; the time elapsed from the job's start to the completion of the installation steps; and the time it took to run all the tests.
- To avoid clutter, only one comment will exist in the PR thread at all times (after the initial comment is generated from the first commit on the PR branch, subsequent pushes will simply update that existing comment's body).
- The different versions of `data.table` (as defined within `atime`) include:
  - HEAD (PR source)
  - base (PR target)
  - merge-base (common ancestor between base and HEAD)
  - CRAN (latest version on the platform)
  - Before (a commit predating the onset of the performance regression)
  - Regression (a commit that is affected by the regression)
  - Fixed (a commit where the regression has been fixed; performance should be the same as or better than 'Before')
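For a concrete sense of the test format, here is a minimal sketch of what an entry in `inst/atime/tests.R` might look like, assuming the `test.list` structure that `atime` reads; the test name, code, and commit SHAs below are all placeholders, not one of the actual data.table tests:

```r
# Minimal sketch of an inst/atime/tests.R entry, assuming the test.list
# structure that atime reads. Everything here (name, sizes, code, SHAs)
# is a placeholder, not one of data.table's actual tests.
test.list <- list(
  "Test hypothetical regression" = list(
    N = 10^seq(1, 7),                  # problem sizes to benchmark over
    setup = quote({                    # evaluated for each N, before timing
      DT <- data.table::data.table(x = rnorm(N))
    }),
    # expr references data.table::: explicitly so that atime can substitute
    # the version-specific package it installs for each commit:
    expr = quote(data.table:::`[.data.table`(DT, , .(m = mean(x)), by = floor(x))),
    Before     = "<SHA of a commit before the regression>",
    Regression = "<SHA of a commit affected by the regression>",
    Fixed      = "<SHA of the commit that fixed it>"
  )
)
```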
To demonstrate, I've created two examples on my fork of data.table that can be compared against each other:
- Using my GitHub Action to observe the performance regression introduced by PR #4491 and fixed by PR #5463
- Using my GitHub Action to observe the performance regression fixed by PR #4440
Both of them replicate PRs associated with historical data.table regressions (one that introduced a regression, one that fixed one) and showcase the expected results in their respective plots:
- For the test case titled 'Test regression fixed in #5463': in the first PR (which replicates the regression-introducing PR), base, merge-base, CRAN, Fixed, and Before are fast while Regression and HEAD are slow; in the second PR, which should not be affected, all commits are fast with the expected exception of Regression.
- For 'Test regression fixed in #4440', the first PR illustrates the reverse case: all commits are fast except the one associated with Regression (as expected, since that PR should not affect this test case); in the second PR (which replicates the regression-fixing PR), Fixed, HEAD, CRAN, and Before are fast while Regression, merge-base, and base are slow, as we would expect.
Note that the first PR's GHA workflow uses my action from the Marketplace, while the second one runs a self-contained version of my workflow, i.e., with the entire logic in the workflow file under .github/workflows/ on the target branch. (I made them different on purpose to illustrate the two ways!)
So a question that I have here is: Which among these two approaches would be preferable if I make a PR to integrate my workflow?
The second approach allows for more direct customization, while the first requires comparatively less maintenance (as I update things in my action, only the version number referenced in the workflow needs to be changed here).
Another question I have is regarding the duration.
The total time taken to run the CI job can be roughly seen as an aggregation of two fairly time-consuming tasks:
- The time it takes to set up everything that is required. A core chunk of this is spent installing `atime` and its dependencies: roughly 12 minutes on average.
- The time it takes to run the tests, which varies based on their number and complexity. The two tests used in the examples above take roughly 3 minutes on average.
Is this acceptable in terms of how long the workflow takes to run? (I presume we can also skip running the action in some cases; e.g., for changes that only deal with documentation.)
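For reference on where the test-running time goes, here is a rough sketch of how one such test could be reproduced locally with `atime::atime_versions()`, assuming a local clone of data.table at the placeholder path below (the SHAs are placeholders too); installing and benchmarking each named version is what dominates the runtime:

```r
# Rough local-reproduction sketch; path and SHAs are placeholders.
# (data.table may also need a pkg.edit.fun to adjust its build files when
# installed under version-specific names; omitted here for brevity.)
library(atime)
atime.list <- atime_versions(
  pkg.path = "~/data.table",  # placeholder: path to a local clone
  N = 10^seq(1, 7),           # problem sizes
  setup = {
    DT <- data.table::data.table(x = rnorm(N))
  },
  expr = data.table:::`[.data.table`(DT, , .(m = mean(x)), by = floor(x)),
  Before     = "<placeholder SHA>",
  Regression = "<placeholder SHA>",
  Fixed      = "<placeholder SHA>"
)
plot(atime.list)  # time/memory trends across the named versions
```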
Finally, I want to mention that I made this action specifically for data.table (again, in hopes that it helps detect regressions and maintain the quality of code contributions performance-wise!), but it also works on any other GitHub repository for an R package. For example, last month I made a PR between branches on my fork of binsegRcpp that introduces a performance regression (an intentionally placed `Sys.sleep()`) in one of the R files, and it was depicted in the commented plot; the sketch below shows the idea. (Note that it was run using an older version of my action's code.)
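In spirit, that intentional regression looked something like the following (the function name and body here are made up for illustration, not binsegRcpp's actual code):

```r
# Hypothetical illustration of an intentionally introduced regression:
# a fixed delay added inside one of the package's R functions.
fit_segments <- function(data.vec) {
  Sys.sleep(1)      # artificial delay: shows up as a constant slowdown in the plots
  mean(data.vec)    # stand-in for the real computation
}
```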
Ready for feedback!