Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #4585 +/- ##
==========================================
- Coverage 99.61% 99.58% -0.03%
==========================================
Files 73 73
Lines 14119 14120 +1
==========================================
- Hits 14064 14061 -3
- Misses 55 59 +4 ☔ View full report in Codecov by Sentry. |
| tt_isub = substitute(i) | ||
| tt_jsub = substitute(j) | ||
| if (!is.null(names(sys.call())) && # not relying on nargs() as it considers DT[,] to have 3 arguments, #3163 | ||
| tryCatch(!is.symbol(tt_isub), error=function(e)TRUE) && # a symbol that inherits missingness from caller isn't missing for our purpose; test 1974 |
There was a problem hiding this comment.
what errors are being caught here?
There was a problem hiding this comment.
Here's the test. In this case, cols is missing I believe.
# no error when j is supplied but inherits missingness from caller
DT = data.table(a=1:3, b=4:6)
f = function(cols) DT[,cols]
test(1974.1, f(), output="a.*b.*3:.*6")edit: I did try removing this branch but it produced errors. It's a real head scratcher but I just kept it. It's only been moved.
| # #932 related so that !(v1 == 1) becomes v1 == 1 instead of (v1 == 1) after removing "!" | ||
| if (isub %iscall% "(" && !is.name(isub[[2L]])) | ||
| if (isub %iscall% "eval") { # TO DO: or ..() | ||
| isub = eval(.massagei(isub[[2L]]), list(.N = nrow(x)), parent.frame()) |
There was a problem hiding this comment.
can we add .SD=x to envir arg here & get .SD to work in i just like that?
There was a problem hiding this comment.
I just compiled with adding .SD and success!
Note, previously .N was assigned to the parent.frame() and then restoring it if necessary. Because of that, all 4 eval calls related to processing i were largely the same.
While skipping that approach is faster, we now have to deal with associating each of the 4 eval calls with .N or whatever special variable(s) we want to use so there's a little more accounting. In theory we could have also used the previous approach to also assign .SD to the parent.frame.
There was a problem hiding this comment.
ha, came across this comment again 🙃
| if ((elem < 1 && elem != NA_INTEGER) || elem > max) stop = true; | ||
| } | ||
| } else { | ||
| #pragma omp parallel for num_threads(nth) |
There was a problem hiding this comment.
The OpenMP loop is what is missing in coverage. I am unsure - I foolishly included Rprintf("OpenMP_Loop") within the loop and during at least one of the tests, my console was full of "OpenMP_Loop" statements. That would suggest that the code coverage bot only has 1 thread, but I would have expected similar issues in #4558 as I incorporated the approach from that PR.
There was a problem hiding this comment.
The openmp is only triggered on subsetting large tables with > 10k rows.
|
This change is pretty big -- is it possible to break this PR up into smaller chunks? |
Sure - there are 3 things. The way |
sure! it looks pretty small, should be straightforward to review on its own |
|
Hey @ColeMiller1, thanks again for the PR! would you like someone else to take over splitting this up? |
Towards #3735 (maybe closes...?)
Closes this comment in the code:
dt[i, ]is around twice as fast than before.For
dt[!lgl]we see a lot of memory savings with some speed savings:CconvertNegAndZeroIdxis also faster and also includesbreakwhen threads are now 1. Also, avoiding the OpenMP when threads are set to 1 also improves performance on at least Windows.Note - there probably could be follow-up PRs related to the default number of threads (for me on only 2T, somewhere between 1E5 and 1E6 is where the break even point is). Secondly,
c(0, seq_len(1025L))is somehow faster thanseq_len(1025L)within the function with this early break. It just seems surprising that somehow removing a zero is faster than returning theindsas is.