1 change: 1 addition & 0 deletions NAMESPACE
@@ -51,6 +51,7 @@ S3method(cube, data.table)
 S3method(rollup, data.table)
 export(frollmean)
 export(frollsum)
+export(frollmax)
 export(frollapply)
 export(nafill)
 export(setnafill)
35 changes: 35 additions & 0 deletions NEWS.md
@@ -296,6 +296,41 @@

41. New function `%notin%` provides a convenient alternative to `!(x %in% y)`, [#4152](https://github.com/Rdatatable/data.table/issues/4152). Thanks to Jan Gorecki for suggesting and Michael Czekanski for the PR. `%notin%` uses half the memory because it computes the result directly as opposed to `!` which allocates a new vector to hold the negated result. If `x` is long enough to occupy more than half the remaining free memory, this can make the difference between the operation working, or failing with an out-of-memory error.

42. Multiple improvements have been added to the rolling functions. The request came from @gpierard, who needed a left-aligned, adaptive rolling max, [#5438](https://github.com/Rdatatable/data.table/issues/5438). At the time there was no `frollmax` function, adaptive rolling functions did not support `align="left"`, and `frollapply` did not support `adaptive=TRUE`. The available alternatives were base R `mapply` or a self-join using `max` and grouping `by=.EACHI`. As a follow-up to this request, the following features have been added:
- new function `frollmax`, which applies `max` over a rolling window.
- support for `align="left"` in adaptive rolling functions.
- support for `adaptive=TRUE` in `frollapply`.
- better support for non-double data types in `frollapply`.
- better support for `Inf` and `-Inf` in the `algo="fast"` implementation.
- `partial` argument to trim the window width to the available observations rather than returning `NA` whenever the window is incomplete.

Review comment from @jangorecki (Member, Author), Jan 9, 2024:
Some points here refer to future PRs

For a comprehensive description of all available features, see the `?froll` manual page.
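
As a rough illustration of what a left-aligned adaptive rolling max computes, the sketch below uses base R only. The helper name `adaptive_left_max` is hypothetical and is not part of data.table; it is a reference definition, not the package's implementation. Each window starts at element `i` and spans `n[i]` observations. Returning `fill` when a window runs past the end of the vector is an assumption made here to mirror the default `fill=NA`.

```r
# Reference semantics in base R (hypothetical helper, not data.table code).
adaptive_left_max = function(x, n, fill = NA_real_) {
  stopifnot(length(x) == length(n))
  vapply(seq_along(x), function(i) {
    last = i + n[i] - 1L                     # last index covered by window i
    if (last > length(x)) fill else max(x[i:last])
  }, numeric(1))
}

x = c(1, 5, 2, 4, 3)
n = c(2L, 3L, 1L, 2L, 2L)                    # one window width per observation
adaptive_left_max(x, n)
# 5 5 2 4 NA
```

The last element is `NA` because its window of width 2 would need an observation beyond the end of `x`.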

Adaptive `frollmax` has been observed to be up to 50 times faster than the second-fastest solution (data.table self-join + `max` + `by=.EACHI`).
```r
library(data.table)
set.seed(108)
setDTthreads(8)
# 1e6 observations; each row's window runs from `row` to `end_window`
x = data.table(
  value = cumsum(rnorm(1e6, 0.1)),
  end_window = 1:1e6 + sample(50:500, 1e6, TRUE),
  row = 1:1e6
)[, "end_window" := pmin(end_window, .N)      # clamp windows to the last row
][, "len_window" := end_window-row+1L]        # adaptive window width per row

baser = function(x) x[, mapply(function(from, to) max(value[from:to]), row, end_window)]
sj = function(x) x[x, max(value), on=.(row >= row, row <= end_window), by=.EACHI]$V1
fmax = function(x) x[, frollmax(value, len_window, adaptive=TRUE, align="left", hasNA=FALSE)]
microbenchmark::microbenchmark(
  baser(x), sj(x), fmax(x),
  times=10, check="identical"
)
#Unit: milliseconds
#     expr        min         lq       mean     median         uq      max neval
# baser(x) 4290.98557 4529.82841 4573.94115 4604.85827 4654.39342 4883.991    10
#    sj(x) 3600.42771 3752.19359 4118.21755 4235.45856 4329.08728 4884.080    10
#  fmax(x)   64.48627   73.07978   88.84932   76.64569   82.56115  198.438    10
```

## BUG FIXES

1. `by=.EACHI` when `i` is keyed but `on=` different columns than `i`'s key could create an invalidly keyed result, [#4603](https://github.com/Rdatatable/data.table/issues/4603) [#4911](https://github.com/Rdatatable/data.table/issues/4911). Thanks to @myoung3 and @adamaltmejd for reporting, and @ColeMiller1 for the PR. An invalid key is where a `data.table` is marked as sorted by the key columns but the data is not sorted by those columns, leading to incorrect results from subsequent queries.
25 changes: 23 additions & 2 deletions R/froll.R
@@ -2,8 +2,24 @@ froll = function(fun, x, n, fill=NA, algo=c("fast", "exact"), align=c("right", "
   stopifnot(!missing(fun), is.character(fun), length(fun)==1L, !is.na(fun))
   algo = match.arg(algo)
   align = match.arg(align)
+  leftadaptive = isTRUE(adaptive) && align=="left" ## support for left added in #5441
+  if (leftadaptive) {
+    rev2 = function(x) if (is.list(x)) sapply(x, rev, simplify=FALSE) else rev(x)
+    verbose = getOption("datatable.verbose")
+    if (verbose)
+      cat("froll: adaptive=TRUE && align='left' pre-processing for align='right'\n")
+    x = rev2(x)
+    n = rev2(n)
+    align = "right"
+  }
   ans = .Call(CfrollfunR, fun, x, n, fill, algo, align, na.rm, hasNA, adaptive)
-  ans
+  if (!leftadaptive)
+    ans
+  else {
+    if (verbose)
+      cat("froll: adaptive=TRUE && align='left' post-processing from align='right'\n")
+    rev2(ans)
+  }
 }

 frollmean = function(x, n, fill=NA, algo=c("fast", "exact"), align=c("right", "left", "center"), na.rm=FALSE, hasNA=NA, adaptive=FALSE) {
@@ -12,9 +28,14 @@ frollmean = function(x, n, fill=NA, algo=c("fast", "exact"), align=c("right", "l
 frollsum = function(x, n, fill=NA, algo=c("fast","exact"), align=c("right", "left", "center"), na.rm=FALSE, hasNA=NA, adaptive=FALSE) {
   froll(fun="sum", x=x, n=n, fill=fill, algo=algo, align=align, na.rm=na.rm, hasNA=hasNA, adaptive=adaptive)
 }
-frollapply = function(x, n, FUN, ..., fill=NA, align=c("right", "left", "center")) {
+frollmax = function(x, n, fill=NA, algo=c("fast", "exact"), align=c("right", "left", "center"), na.rm=FALSE, hasNA=NA, adaptive=FALSE) {
+  froll(fun="max", x=x, n=n, fill=fill, algo=algo, align=align, na.rm=na.rm, hasNA=hasNA, adaptive=adaptive)
+}
+frollapply = function(x, n, FUN, ..., fill=NA, align=c("right", "left", "center"), adaptive) {
   FUN = match.fun(FUN)
   align = match.arg(align)
+  if (!missing(adaptive))
+    stopf("frollapply does not support 'adaptive' argument")
   rho = new.env()
   ans = .Call(CfrollapplyR, FUN, x, n, fill, align, rho)
   ans
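
The change to `froll` above handles `adaptive=TRUE` with `align="left"` by reversing `x` and `n`, running the existing right-aligned routine, and reversing the result. The base R sketch below checks that identity on random data. All three helper names are hypothetical stand-ins written for this illustration, and `adaptive_right_max` is only a simple substitute for the compiled routine, assuming that incomplete windows return `fill`.

```r
# Right-aligned adaptive max: window i ends at i and spans n[i] observations.
# Hypothetical base R stand-in for the compiled right-aligned routine.
adaptive_right_max = function(x, n, fill = NA_real_) {
  vapply(seq_along(x), function(i) {
    first = i - n[i] + 1L
    if (first < 1L) fill else max(x[first:i])
  }, numeric(1))
}

# Left alignment via the reversal trick: reverse inputs, roll right, reverse back.
adaptive_left_max_rev = function(x, n, fill = NA_real_) {
  rev(adaptive_right_max(rev(x), rev(n), fill))
}

# Direct left-aligned definition, used only to check the trick.
adaptive_left_max_direct = function(x, n, fill = NA_real_) {
  vapply(seq_along(x), function(i) {
    last = i + n[i] - 1L
    if (last > length(x)) fill else max(x[i:last])
  }, numeric(1))
}

set.seed(1)
x = rnorm(20)
n = sample(1:5, 20, TRUE)
stopifnot(identical(adaptive_left_max_rev(x, n), adaptive_left_max_direct(x, n)))
```

The identity holds because position `i` in the original vector maps to position `length(x) + 1 - i` after reversal, so a window extending forward from `i` becomes a window extending backward from the mapped position.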