In an effort to potentially address #4687, I created a GitHub Action that facilitates performance testing of the incoming changes introduced via pull requests to data.table. My motivation was to help ensure that data.table maintains its code efficiency and high-performance standards (core values of the project!) as PRs keep coming in and getting merged frequently, which means they need to be monitored for performance regressions, and an automated way to do that would be ideal, no?
Onto more details:
- My action utilizes `atime` to run predefined tests (as specified in `inst/atime/tests.R`) on different versions of `data.table`. These tests are based on historical regressions that have been documented/discussed via issues and PRs here (the sketch after this list shows the format they follow).
- It uses `cml` to publish information in a comment on the pull request thread via the GitHub bot (which operates using the `GITHUB_TOKEN` I provide to authenticate itself and interact with the GitHub API, within the scope of the step I defined in my GHA workflow).
- Contents of the comment include: a plot consisting of subplots for each test case depicting the time and memory trends across different `data.table` versions; the commit (SHA) that generated it; a link to download the artifact containing all of the `atime`-generated results for that run; the time elapsed from the job's start to the completion of the installation steps; and the time it took to run all the tests.
- To avoid clutter, only one comment will exist in the PR thread at all times (after the initial comment is generated from the first commit on the PR branch, subsequent pushes will simply update that existing comment's body).
- The different versions of `data.table` (as defined within `atime`) include:
  - HEAD (PR source)
  - base (PR target)
  - merge-base (common ancestor between base and HEAD)
  - CRAN (latest version on the platform)
  - Before (a commit predating the onset of the performance regression)
  - Regression (a commit that is affected by the regression)
  - Fixed (a commit where the regression has been fixed; performance should be the same as or better than 'Before')
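For a concrete sense of the test format, here is a minimal sketch of what an entry in `inst/atime/tests.R` might look like, assuming the `test.list` structure that `atime` reads; the test name, code, and commit SHAs below are all placeholders, not one of the actual data.table tests:

```r
# Minimal sketch of an inst/atime/tests.R entry, assuming the test.list
# structure that atime reads. Everything here (name, sizes, code, SHAs)
# is a placeholder, not one of data.table's actual tests.
test.list <- list(
  "Test hypothetical regression" = list(
    N = 10^seq(1, 7),                  # problem sizes to benchmark over
    setup = quote({                    # evaluated for each N, before timing
      DT <- data.table::data.table(x = rnorm(N))
    }),
    # expr references data.table::: explicitly so that atime can substitute
    # the version-specific package it installs for each commit:
    expr = quote(data.table:::`[.data.table`(DT, , .(m = mean(x)), by = floor(x))),
    Before     = "<SHA of a commit before the regression>",
    Regression = "<SHA of a commit affected by the regression>",
    Fixed      = "<SHA of the commit that fixed it>"
  )
)
```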
To demonstrate, I've created two examples on my fork of data.table that can be compared against each other:
- Using my GitHub Action to observe the performance regression introduced by PR #4491 and fixed by PR #5463
- Using my GitHub Action to observe the performance regression fixed by PR #4440
Both of them replicate PRs associated with historical data.table regressions (one that introduced a regression, one that fixed one) and showcase the expected results in their respective plots:
- For the test case titled 'Test regression fixed in #5463': in the first PR (which replicates the regression-introducing PR), base, merge-base, CRAN, Fixed, and Before are fast while Regression and HEAD are slow; in the second PR, which should not be affected, all commits are fast with the expected exception of Regression.
- For 'Test regression fixed in #4440', the first PR illustrates the reverse case: all commits are fast except the one associated with Regression (as expected, since that PR should not affect this test case); in the second PR (which replicates the regression-fixing PR), Fixed, HEAD, CRAN, and Before are fast while Regression, merge-base, and base are slow, as we would expect.
Note that the first PR's GHA workflow uses my action from the Marketplace, while the second one runs a self-contained version of my workflow, i.e., with the entire logic in the workflow file under .github/workflows/ on the target branch. (I made them different on purpose to illustrate the two ways!)
So a question that I have here is: Which among these two approaches would be preferable if I make a PR to integrate my workflow?
The second approach allows for more direct customization, while the first requires comparatively less maintenance (as I update things in my action, only the version number referenced in the workflow needs to be changed here).
Another question I have is regarding the duration.
The total time taken to run the CI job can be roughly seen as an aggregation of two fairly time-consuming tasks:
- The time it takes to set up everything that is required. A core chunk of this is spent installing `atime` and its dependencies: roughly 12 minutes on average.
- The time it takes to run the tests, which varies based on their number and complexity. The two tests used in the examples above take roughly 3 minutes on average.
Is this acceptable in terms of how long the workflow takes to run? (I presume we can also skip running the action in some cases; e.g., for changes that only deal with documentation.)
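For reference on where the test-running time goes, here is a rough sketch of how one such test could be reproduced locally with `atime::atime_versions()`, assuming a local clone of data.table at the placeholder path below (the SHAs are placeholders too); installing and benchmarking each named version is what dominates the runtime:

```r
# Rough local-reproduction sketch; path and SHAs are placeholders.
# (data.table may also need a pkg.edit.fun to adjust its build files when
# installed under version-specific names; omitted here for brevity.)
library(atime)
atime.list <- atime_versions(
  pkg.path = "~/data.table",  # placeholder: path to a local clone
  N = 10^seq(1, 7),           # problem sizes
  setup = {
    DT <- data.table::data.table(x = rnorm(N))
  },
  expr = data.table:::`[.data.table`(DT, , .(m = mean(x)), by = floor(x)),
  Before     = "<placeholder SHA>",
  Regression = "<placeholder SHA>",
  Fixed      = "<placeholder SHA>"
)
plot(atime.list)  # time/memory trends across the named versions
```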
Finally, I want to mention that I made this action specifically for data.table (again, in hopes that it helps detect regressions and maintain the quality of code contributions performance-wise!), but it also works on any other GitHub repository for an R package. For example, last month I made a PR between branches on my fork of binsegRcpp that introduces a performance regression (an intentionally placed `Sys.sleep()`) in one of the R files, and it was depicted in the commented plot; the sketch below shows the idea. (Note that it was run using an older version of my action's code.)
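In spirit, that intentional regression looked something like the following (the function name and body here are made up for illustration, not binsegRcpp's actual code):

```r
# Hypothetical illustration of an intentionally introduced regression:
# a fixed delay added inside one of the package's R functions.
fit_segments <- function(data.vec) {
  Sys.sleep(1)      # artificial delay: shows up as a constant slowdown in the plots
  mean(data.vec)    # stand-in for the real computation
}
```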
Ready for feedback!