frollmax2: code reorg, docs, tests #5890

Closed

jangorecki wants to merge 16 commits into master from frollmax2

NAMESPACE

@@ -51,6 +51,7 @@ S3method(cube, data.table)
     S3method(rollup, data.table)
     export(frollmean)
     export(frollsum)
+    export(frollmax)
     export(frollapply)
     export(nafill)
     export(setnafill)

NEWS.md

@@ -296,6 +296,41 @@
 . New function `%notin%` provides a convenient alternative to `!(x %in% y)`, [#4152](https://github.com/Rdatatable/data.table/issues/4152). Thanks to Jan Gorecki for suggesting and Michael Czekanski for the PR. `%notin%` uses half the memory because it computes the result directly as opposed to `!` which allocates a new vector to hold the negated result. If `x` is long enough to occupy more than half the remaining free memory, this can make the difference between the operation working, or failing with an out-of-memory error.
+. Multiple improvements have been added to rolling functions. The request came from @gpierard, who needed a left-aligned, adaptive rolling max, [#5438](https://github.com/Rdatatable/data.table/issues/5438). There was no `frollmax` function yet, adaptive rolling functions did not support `align="left"`, and `frollapply` did not support `adaptive=TRUE`. The available alternatives were base R `mapply` or a self-join using `max` and grouping `by=.EACHI`. As a follow-up to his request, the following features have been added:
+    - new function `frollmax`, applies `max` over a rolling window.
+    - support for `align="left"` for adaptive rolling functions.
+    - support for `adaptive=TRUE` in `frollapply`.
+    - better support for non-double data types in `frollapply`.
+    - better support for `Inf` and `-Inf` in the `algo="fast"` implementation.
+    - `partial` argument to trim window width to available observations rather than returning `NA` whenever the window is not complete.
+    For a comprehensive description of all available features see the `?froll` manual.
+    Adaptive `frollmax` has been observed to be up to 50 times faster than the second-fastest solution (data.table self-join + `max` + `by=.EACHI`).
+    ```r
+    set.seed(108)
+    setDTthreads(8)
+    x = data.table(
+      value = cumsum(rnorm(1e6, 0.1)),
+      end_window = 1:1e6 + sample(50:500, 1e6, TRUE),
+      row = 1:1e6
+    )[, "end_window" := pmin(end_window, .N)
+      ][, "len_window" := end_window-row+1L]
+    baser = function(x) x[, mapply(function(from, to) max(value[from:to]), row, end_window)]
+    sj = function(x) x[x, max(value), on=.(row >= row, row <= end_window), by=.EACHI]$V1
+    fmax = function(x) x[, frollmax(value, len_window, adaptive=TRUE, align="left", hasNA=FALSE)]
+    microbenchmark::microbenchmark(
+      baser(x), sj(x), fmax(x),
+      times=10, check="identical"
+    )
+    #Unit: milliseconds
+    #     expr        min         lq       mean     median         uq      max neval
+    # baser(x) 4290.98557 4529.82841 4573.94115 4604.85827 4654.39342 4883.991    10
+    #    sj(x) 3600.42771 3752.19359 4118.21755 4235.45856 4329.08728 4884.080    10
+    #  fmax(x)   64.48627   73.07978   88.84932   76.64569   82.56115  198.438    10
+    ```
     ## BUG FIXES
 . `by=.EACHI` when `i` is keyed but `on=` different columns than `i`'s key could create an invalidly keyed result, [#4603](https://github.com/Rdatatable/data.table/issues/4603) [#4911](https://github.com/Rdatatable/data.table/issues/4911). Thanks to @myoung3 and @adamaltmejd for reporting, and @ColeMiller1 for the PR. An invalid key is where a `data.table` is marked as sorted by the key columns but the data is not sorted by those columns, leading to incorrect results from subsequent queries.
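
For reference, the left-aligned adaptive rolling max described in the NEWS entry above can be sketched in base R. This is an illustrative reference only, not the implementation in this PR: `ref_left_adaptive_max` and its `partial` flag are hypothetical names, and the `partial` behaviour shown is an assumption based on the NEWS bullet (trim the window to the available observations instead of returning the fill value).

```r
# Hypothetical base R reference for a left-aligned, adaptive rolling max.
# For row i the window is x[i:(i + n[i] - 1)], so the width varies per row.
ref_left_adaptive_max <- function(x, n, fill = NA_real_, partial = FALSE) {
  stopifnot(length(x) == length(n))
  vapply(seq_along(x), function(i) {
    to <- i + n[i] - 1L
    if (to > length(x)) {
      # Incomplete window: either give up (fill) or trim it (assumed 'partial' semantics)
      if (!partial) return(fill)
      to <- length(x)
    }
    max(x[i:to])
  }, numeric(1))
}

x <- c(3, 1, 4, 1, 5, 9, 2, 6)
n <- c(2L, 3L, 1L, 4L, 2L, 3L, 3L, 1L)  # window length for each row

res_strict  <- ref_left_adaptive_max(x, n)
res_partial <- ref_left_adaptive_max(x, n, partial = TRUE)
res_strict   # 3 4 4 9 9 9 NA 6  (row 7 needs 3 values, only 2 remain)
res_partial  # 3 4 4 9 9 9 6 6   (row 7 window trimmed to x[7:8])
```

A helper like this is only practical for small inputs; it is the kind of per-row loop that the `baser` function in the benchmark performs with `mapply`.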

R/froll.R

@@ Expand Up @@
       stopifnot(!missing(fun), is.character(fun), length(fun)==1L, !is.na(fun))
       algo = match.arg(algo)
       align = match.arg(align)
+      leftadaptive = isTRUE(adaptive) && align=="left"  ## support for left added in #5441
+      if (leftadaptive) {
+        rev2 = function(x) if (is.list(x)) sapply(x, rev, simplify=FALSE) else rev(x)
+        verbose = getOption("datatable.verbose")
+        if (verbose)
+          cat("froll: adaptive=TRUE && align='left' pre-processing for align='right'\n")
+        x = rev2(x)
+        n = rev2(n)
+        align = "right"
+      }
       ans = .Call(CfrollfunR, fun, x, n, fill, algo, align, na.rm, hasNA, adaptive)
-      ans
+      if (!leftadaptive)
+        ans
+      else {
+        if (verbose)
+          cat("froll: adaptive=TRUE && align='left' post-processing from align='right'\n")
+        rev2(ans)
+      }
     }
     frollmean = function(x, n, fill=NA, algo=c("fast", "exact"), align=c("right", "left", "center"), na.rm=FALSE, hasNA=NA, adaptive=FALSE) {
@@ Expand All @@
     frollsum = function(x, n, fill=NA, algo=c("fast","exact"), align=c("right", "left", "center"), na.rm=FALSE, hasNA=NA, adaptive=FALSE) {
       froll(fun="sum", x=x, n=n, fill=fill, algo=algo, align=align, na.rm=na.rm, hasNA=hasNA, adaptive=adaptive)
     }
-    frollapply = function(x, n, FUN, ..., fill=NA, align=c("right", "left", "center")) {
+    frollmax = function(x, n, fill=NA, algo=c("fast", "exact"), align=c("right", "left", "center"), na.rm=FALSE, hasNA=NA, adaptive=FALSE) {
+      froll(fun="max", x=x, n=n, fill=fill, algo=algo, align=align, na.rm=na.rm, hasNA=hasNA, adaptive=adaptive)
+    }
+    frollapply = function(x, n, FUN, ..., fill=NA, align=c("right", "left", "center"), adaptive) {
       FUN = match.fun(FUN)
       align = match.arg(align)
+      if (!missing(adaptive))
+        stopf("frollapply does not support 'adaptive' argument")
       rho = new.env()
       ans = .Call(CfrollapplyR, FUN, x, n, fill, align, rho)
       ans
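
The `leftadaptive` branch in `froll` above relies on a reversal identity: a left-aligned adaptive window over `x` and `n` covers the same elements as a right-aligned adaptive window over `rev(x)` and `rev(n)`, once the result is reversed back. The following base R sketch checks that identity on a small vector. The helper names (`right_adaptive_max`, `left_via_reverse`, `left_direct`) are hypothetical and written only for this illustration; they do not call the C routine used by the PR.

```r
# Right-aligned adaptive rolling max: window for row i is x[(i - n[i] + 1):i]
right_adaptive_max <- function(x, n, fill = NA_real_) {
  vapply(seq_along(x), function(i) {
    from <- i - n[i] + 1L
    if (from < 1L) fill else max(x[from:i])
  }, numeric(1))
}

# Left-aligned result obtained the way the diff does it:
# reverse the inputs, compute right-aligned, reverse the answer
left_via_reverse <- function(x, n, fill = NA_real_) {
  rev(right_adaptive_max(rev(x), rev(n), fill))
}

# Left-aligned result computed directly: window for row i is x[i:(i + n[i] - 1)]
left_direct <- function(x, n, fill = NA_real_) {
  vapply(seq_along(x), function(i) {
    to <- i + n[i] - 1L
    if (to > length(x)) fill else max(x[i:to])
  }, numeric(1))
}

x <- c(3, 1, 4, 1, 5, 9, 2, 6)
n <- c(2L, 3L, 1L, 4L, 2L, 3L, 3L, 1L)

via_rev <- left_via_reverse(x, n)
direct  <- left_direct(x, n)
stopifnot(identical(via_rev, direct))  # both approaches agree
```

For list input the diff applies the same idea element-wise through `rev2`, which reverses each list element rather than the list itself.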

jangorecki Jan 9, 2024 (edited)