Q - incorporate faster (though inexact due to leap centuries) datetime functions?

I have the following functions in my package that we might consider adding to `data.table`:  `quick_year`, `quick_yday`, `quick_mday` (code [here](https://github.com/MichaelChirico/funchir/blob/master/R/utils.R#L129-L151)).

Till now I haven't filed these as a PR because they're not _perfectly_ correct -- any user that deals with dates before March 1st, 1900 or after February 28, 2100 will get incorrect results due to [leap centuries](https://en.wikipedia.org/wiki/Century_leap_year). 

Only now I considered the possibility of adding these methods to `data.table`, but including an argument (say, `exact = TRUE`, ?activated by default?) that would default to the exact methods for users dealing with distant dates; users dealing with (currently) more typical dates would get the benefit of these faster methods.

To get a sense for the improvement, here's a benchmark:

```
NN = 1e6
# sample between Jan 1, 1901 and Dec 31, 2099
sample_dates = as.IDate(sample(-25202:47481, NN, replace = TRUE),
                        origin = '1970-01-01')

library(microbenchmark)
library(data.table)
library(funchir) #must be installed from my GitHub for now
microbenchmark(times = 100L,
               year_exact_TRUE = year(sample_dates),
               year_exact_FALSE = quick_year(sample_dates),
               yday_exact_TRUE = yday(sample_dates),
               yday_exact_FALSE = quick_yday(sample_dates),
               mday_exact_TRUE = mday(sample_dates),
               mday_exact_FALSE = quick_mday(sample_dates))
# Unit: milliseconds
#              expr      min       lq     mean   median       uq       max
#   year_exact_TRUE 20.64397 57.99979 62.94209 68.10336 70.69258  87.86857
#  year_exact_FALSE 37.25068 38.42864 40.07562 39.44084 40.68865  86.23708
#   yday_exact_TRUE 20.15585 56.33888 61.68985 64.37842 69.01061 135.93603
#  yday_exact_FALSE 29.93036 31.09082 33.88315 31.85358 32.78074  80.44249
#   mday_exact_TRUE 19.05430 54.83927 57.80232 65.01923 67.61623  82.47341
#  mday_exact_FALSE 16.10283 16.88485 18.81903 17.38729 17.90674  65.08335
```

So a 43% improvement, 52% improvement, and a 74% improvement over current methods.

And to confirm:

```
identical(year(sample_dates), quick_year(sample_dates))
# [1] TRUE
identical(yday(sample_dates), quick_yday(sample_dates))
# [1] TRUE
identical(mday(sample_dates), quick_mday(sample_dates))
# [1] TRUE
```

Any thoughts? I would also lean towards setting the default to use the faster method, but this is a potentially breaking change. Could also consider checking whether the underlying integer is inside the correct boundary and then branching from there, or consider expanding this to just work for all dates forever...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Q - incorporate faster (though inexact due to leap centuries) datetime functions? #2463

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Q - incorporate faster (though inexact due to leap centuries) datetime functions? #2463

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions