GH-35633: [R] R builds failing with error 'Invalid: Timestamps already have a timezone: 'UTC'. Cannot localize to 'UTC''#35671
Conversation
|
Benchmark runs are scheduled for baseline = dd33cab and contender = 9039ee2. 9039ee2 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
|
['Python', 'R'] benchmarks have high level of regressions. |
| # timezone-aware timestamps, not for timezone-naive ones. | ||
| # strptime in Acero will return a timezone-aware timestamp if %z is | ||
| # part of the format string. | ||
| if (!is.null(tz) && !grepl("%z", format, fixed = TRUE)) { |
There was a problem hiding this comment.
I don't understand how timezones work in R (and with POSIXlt and POSIXct), but with the above change, the tz keyword will just be ignored when the string includes an offset?
That doesn't match fully what base R does:
> strptime("2020-05-01T00:00+0100", format="%Y-%m-%dT%H:%M%z")
[1] "2020-05-01 01:00:00"
> strptime("2020-05-01T00:00+0100", format="%Y-%m-%dT%H:%M%z", tz="US/Eastern")
[1] "2020-04-30 19:00:00"
(but again, I don't really know how to interpret those returned values, and whether they are tz "aware" or "native" or even if those concepts even exist in R. I looked at the $zone and $gmtoff attributes of the above return values, but can't make sense of it)
There was a problem hiding this comment.
We don't have a naive datetime type in R...everything is timezone aware, and tz = "" means "your local timezone" (but everything is UTC internally). What you've encountered here is a fun type called the POSIXlt which is sort of like a data frame that stores components in separate vectors (the POSIXct is a more normal version of it, which is seconds from the unix epoch as a double vector).
It's a good point that this doesn't match what base R does...I think we'd want to cast() if grepl("%z") and assume_timezone() otherwise?
There was a problem hiding this comment.
I don't know if this helps with understanding dates/times in R, but the tz argument won't affect the underlying point in time (just the attribute of the output and, to great confusion of many, how it is printed):
unclass(as.POSIXct(strptime("2020-05-01T00:00+0100", format="%Y-%m-%dT%H:%M%z")))
#> [1] 1588287600
#> attr(,"tzone")
#> [1] ""
unclass(as.POSIXct(strptime("2020-05-01T00:00+0100", tz = "UTC", format="%Y-%m-%dT%H:%M%z")))
#> [1] 1588287600
#> attr(,"tzone")
#> [1] "UTC"
unclass(as.POSIXct(strptime("2020-05-01T00:00+0100", tz = "America/Halifax", format="%Y-%m-%dT%H:%M%z")))
#> [1] 1588287600
#> attr(,"tzone")
#> [1] "America/Halifax"Created on 2023-05-24 with reprex v2.0.2
There was a problem hiding this comment.
It's a good point that this doesn't match what base R does...I think we'd want to
cast()ifgrepl("%z")andassume_timezone()otherwise?
Yes, that sounds correct
Alternative approach to #35634.