ARROW-14231: [C++] Support casting timestamp with timezone to string#11328
ARROW-14231: [C++] Support casting timestamp with timezone to string#11328lidavidm wants to merge 4 commits intoapache:masterfrom
Conversation
There was a problem hiding this comment.
Shouldn't we keep some indication of that it has a time zone (was an instant) in the case of UTC as well? (otherwise a roundtrip to csv would give a native timestamp without timezone for UTC, but a timestamp with timezone (UTC) for non-UTC timezones)
There was a problem hiding this comment.
Ah, yeah, you're right. I'll fix that.
There was a problem hiding this comment.
Do we actually want to append the timezone and localize the timestamp? Pretty-printing currently doesn't.
I don't know what the user expects here (we may actually need to make this customizable at some point :-)).
There was a problem hiding this comment.
At least, Pandas does:
>>> print(pd.DataFrame({"a": [pd.Timestamp("1968-11-30 13:30:45.123456789", tz="America/Phoenix")]}).to_csv())
,a
0,1968-11-30 13:30:45.123456789-07:00
There was a problem hiding this comment.
Ah, I see. I wonder if those files are roundtrippable (is our CSV reader able to read them?).
There was a problem hiding this comment.
Nope! Good thing you pointed that out. I'll dig a bit.
/home/lidavidm/Code/upstream/arrow-14231/cpp/src/arrow/csv/converter_test.cc:68: Failure
Failed
'_error_or_value12.status()' failed with Invalid: CSV conversion error to timestamp[ns, tz=UTC]: invalid value '1970-01-01 00:00:00+0000'
There was a problem hiding this comment.
Pretty-printing currently doesn't.
Personally, I find this a "bug" in our current pretty printing (I find it rather confusing that it doesn't show at all that you have timezone-aware timestamps)
There was a problem hiding this comment.
And we have https://issues.apache.org/jira/browse/ARROW-13625 (and the linked issues) about the parsing side not yet supporting timezones
There was a problem hiding this comment.
Cool, we can tackle the reading side separately then :-)
There was a problem hiding this comment.
Thanks for digging those up!
|
Rebased on top of the recent temporal kernel refactoring. |
|
This and ARROW-12820/#11358 now pass CI and should handle both reading/writing of timestamps with timezones. |
There was a problem hiding this comment.
Nit, but static const std::string perhaps?
There was a problem hiding this comment.
Shouldn't it be "Z" rather than "+0000" for UTC timestamps?
See Wikipedia: https://en.wikipedia.org/wiki/ISO_8601#Coordinated_Universal_Time_(UTC)
There was a problem hiding this comment.
Ah good point. I've fixed this + added a UTC timestamp to the CSV writer test as well.
| for (auto string_type : {utf8(), large_utf8()}) { | ||
| ASSERT_RAISES(NotImplemented, Cast(ArrayFromJSON(timestamp(TimeUnit::SECOND, "UTC"), | ||
| "[-34226955, 1456767743]"), | ||
| CastOptions::Safe(string_type))); |
There was a problem hiding this comment.
UTC could probably be made to work on Windows, can you perhaps create a followup JIRA for that?
|
Benchmark runs are scheduled for baseline = ed8c76e and contender = f3f4423. f3f4423 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
No description provided.