Skip to content

Comments

fcase support scalar condition, vectorize default and lazy-eval default#4264

Merged
MichaelChirico merged 52 commits intomasterfrom
fix4258
Jul 29, 2024
Merged

fcase support scalar condition, vectorize default and lazy-eval default#4264
MichaelChirico merged 52 commits intomasterfrom
fix4258

Conversation

@shrektan
Copy link
Member

@shrektan shrektan commented Feb 27, 2020

Closes #4258

Three features:

  1. Scalar condition
  2. Vectorized default value
  3. Lazy-eval default value

In short, the default should be regarded as just a special condition-value pair argument. So it should be vectorized and lazily evaluated.

I believe with the three features the fcase() interface will have a closer-natural-fits to what our users expect.

@jangorecki
Copy link
Member

would be nice to measure overhead as I see some of the constant values declaration to be moved into loops.

@codecov
Copy link

codecov bot commented Feb 27, 2020

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.53%. Comparing base (566bff0) to head (b2c9f2e).
Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4264      +/-   ##
==========================================
- Coverage   97.53%   97.53%   -0.01%     
==========================================
  Files          80       80              
  Lines       14915    14909       -6     
==========================================
- Hits        14547    14541       -6     
  Misses        368      368              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@shrektan
Copy link
Member Author

shrektan commented Feb 27, 2020

The impact should be minimal when comparing to the scalar default.

library(data.table)
set.seed(100)
n <- 1e5
cond1 <- sample(c(T, F, NA), n, replace = TRUE)
cond2 <- sample(c(T, F, NA), n, replace = TRUE)
val1 <- rnorm(n)
val2 <- rnorm(n)
default <- rnorm(n)
val1_str <- as.character(val1)
val2_str <- as.character(val2)
val1_list <- as.list(val1)
val2_list <- as.list(val2)
microbenchmark::microbenchmark(
  fcase(cond1, val1, cond2, val2, default = 0.0),
  fcase(cond1, val1_str, cond2, val2_str, default = 'abc'),
  fcase(cond1, val1_list, cond2, val2_list, default = list(1:2)),
  times = 1e3
)
Results (updated)

Before the PR

Unit: microseconds
                                                           expr      min        lq     mean   median
                   fcase(cond1, val1, cond2, val2, default = 0)  594.076  954.0095 1525.554 1072.528
       fcase(cond1, val1_str, cond2, val2_str, default = "abc") 1475.925 1896.3675 2938.430 2008.738
 fcase(cond1, val1_list, cond2, val2_list, default = list(1:2)) 1298.675 1691.7195 2428.085 1793.802
       uq       max neval
 1157.206  31175.67  1000
 2150.023 233463.10  1000
 1903.865 115293.37  1000

After the PR

Unit: microseconds
                                                           expr      min       lq     mean   median
                   fcase(cond1, val1, cond2, val2, default = 0)  745.148 1157.414 1506.076 1210.990
       fcase(cond1, val1_str, cond2, val2_str, default = "abc") 1759.637 2191.229 3077.457 2290.721
 fcase(cond1, val1_list, cond2, val2_list, default = list(1:2)) 1266.973 1635.731 2191.297 1706.582
       uq       max neval
 1307.318  34389.14  1000
 2468.971 240898.93  1000
 1825.561 168851.25  1000

@2005m

This comment has been minimized.

@MichaelChirico

This comment has been minimized.

@shrektan

This comment has been minimized.

@shrektan
Copy link
Member Author

Looks good for recycling default!

But the issue is a bit more general -- this still fails:

fcase(
  iris$Sepal.Length > 5, ">5",
  iris$Sepal.Length < 4, "<4",
  TRUE, as.character(iris$Sepal.Length)
)

Not sure if it's desired since we've already had the default argument. But for the dplyr-syntax-familiar users, having this feature seems good.

@2005m
Copy link
Contributor

2005m commented Feb 27, 2020

@shrektan , yes to your first question. I can send you the code tonight or commit something if you want. I just need to write it but it is easy. Regarding your second question, yes it is for long type. Look at line 43 how it is done for fifelse.

@2005m

This comment has been minimized.

@shrektan

This comment has been minimized.

@jangorecki

This comment has been minimized.

@shrektan
Copy link
Member Author

shrektan commented Feb 28, 2020

Those new changes seems to invalidate benchmark. If there is any difference, does this benchmark address the most pessimistic scenario?

Yes. There's an overhead of always copying the whole vector of the default arg.

In addition, after adding the supports for the scalar condition, I just realized the default argument may not be really necessary (I'm pointing to the na argument in C level. We can still keep the R signature).

I think we should just convert the R function's default into a condition = TRUE, value = default. The benefits are:

  • The default will also be lazy evaluated
    At present, it's always evaluated. It means fcase(TRUE, 1, default = stop("unexpected")) will throw errors which is unexpected for me.
  • We can avoid the cost of evaluation and copying of the default vector
  • Maybe simpler the C's implementation
  • fcase(c(T,F), 1, c(T,T), NA) will become valid
    At present, we have to use fcase(c(T,F), 1, c(T,T), NA_real_)

How do you think?


UPDATE: I think it makes sense so I'm starting to implement this.

@shrektan
Copy link
Member Author

DONE. The one thing I'm not satisfied with is kind of the lines below. I have to declare a pointer w/o initialized it. The code is OK but if there's a better way to avoid this, I'm happy to make the change.

const int *restrict pouts;
if (!naout) pouts = LOGICAL(outs); // the content is not useful if out is NA_LOGICAL scalar

@shrektan shrektan changed the title fcase support vector default value fcase support vector default value and scalar condition Feb 28, 2020
@shrektan shrektan changed the title fcase support vector default value and scalar condition fcase support scalar condition, vectorize default and lazy-eval default Feb 28, 2020
@2005m
Copy link
Contributor

2005m commented Feb 28, 2020

I'll be honest I don't really like this scalar TRUE condition. I find it confusing and not clear. I prefer default. I am not sure how close we want to be to dplyr. Is dyplr dictating what the standard is?

@shrektan
Copy link
Member Author

shrektan commented Feb 28, 2020

But I can’t see any drawback for providing this, either. We can still use default. Anyway, my point is less about the scalar condition, but by providing this, we can naturally have the lazy evaluated default, which I believe is a nice feature to have.

@github-actions
Copy link

github-actions bot commented Jul 11, 2024

Comparison Plot

Generated via commit a653106

Download link for the artifact containing the test results: ↓ atime-results.zip

Time taken to finish the standard R installation steps: 11 minutes and 48 seconds

Time taken to run atime::atime_pkg on the tests: 3 minutes and 49 seconds

@MichaelChirico MichaelChirico added this to the 1.16.0 milestone Jul 15, 2024
@MichaelChirico
Copy link
Member

@ben-schwen think you can take a pass at review here?

Co-authored-by: Benjamin Schwendinger <52290390+ben-schwen@users.noreply.github.com>
@ben-schwen
Copy link
Member

ben-schwen commented Jul 29, 2024

Basically LGTM. However, there are many things I would want to rename/remodel for clarity purposes. Right now, I find this piece of our codebase quite hard to grasp in comparison to its complexity.

@MichaelChirico
Copy link
Member

Right now, I find this piece of our codebase quite hard to grasp in comparison to its complexity.

I think the same can be said of a lot of our codebase TBH 😆

I think adding a lot of performance regression tests is the most important piece here, I always worry about accidentally introducing a slowdown when refactoring, even if our great test suite is passing.

Filed #6322.

@MichaelChirico
Copy link
Member

Thanks @shrektan!!

@MichaelChirico MichaelChirico merged commit 3c19d4d into master Jul 29, 2024
@MichaelChirico MichaelChirico deleted the fix4258 branch July 29, 2024 16:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Shouldn't fcase() recycle?

8 participants