GH-40891: [JS] Store Dates as TimestampMillisecond#40892
GH-40891: [JS] Store Dates as TimestampMillisecond#40892domoritz merged 4 commits intoapache:mainfrom
Conversation
js/src/type.ts
Outdated
| interface Timestamp_<T extends Timestamps = Timestamps> extends DataType<T> { | ||
| TArray: Int32Array; | ||
| TValue: number; | ||
| TValue: number | Date; |
There was a problem hiding this comment.
Can't the TValue type be overridden just for the TimestampMillisecond class below?
There was a problem hiding this comment.
Reverted to number again.
trxcllnt
left a comment
There was a problem hiding this comment.
Is there a reason to change the type from number to Date? IIRC it's on users to convert numbers to the corresponding timezone, so returning number | Date makes the type-safe way to do that more difficult for users.
If we always return numbers, users can convert to Dates (if necessary) themselves. IIRC constructing date instances is relatively expensive esp. if using custom locales, so users should opt-in to that behavior if they need it?
|
We do construct dates for DateMillisecond already so I figured it would make sense to do the same for TimestampMillisecond.
I did not measure it but wit surprises me a bit since I expected it to just be a lightweight wrapper around timestamps. I'd be okay with just retuning numbers to be fast but also to introduce an iterator for vectors that returns convenient types in JS that may be slower but easier to work with (in a follow up PR). Creating a million Dates takes about 5ms in bun on my machine (M1 Mac). I node, 40ms! Getting 1M Dates from a TimestampMillisecond vector instead of timestamps takes 26 instead of 15ms (in bun on my machine). In node 54ms instead of 14ms. So yeah, much slower in node. |
|
After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 2caec86. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. |
Merge after apache#40892. This pull request also changes Dates to return timestamps instead of Date instances (similar to Timestamps and for the same reason. * GitHub Issue: apache#40959
Fixes apache#40891 Tested with ```ts const date = new Date("2023-03-29T12:34:56Z"); console.log("original", date) console.log("=> vec") const vec = arrow.vectorFromArray([date]) console.log(vec.toArray()) console.log(vec.toJSON()) console.log(vec.type) console.log(vec.get(0)) console.log("=> vec2") const vec2 = arrow.vectorFromArray([date], new arrow.DateMillisecond) console.log(vec2.toArray()) console.log(vec.toJSON()) console.log(vec2.type) console.log(vec2.get(0)) console.log("=> table") const table = arrow.tableFromJSON([{ date }]) console.log(table.toArray()) console.log(table.schema.fields[0].type) console.log(table.getChildAt(0)?.get(0)) console.log("=> table2") const table2 = arrow.tableFromIPC(arrow.tableToIPC(table)); console.log(table2.toArray()) console.log(table2.schema.fields[0].type) console.log(table2.getChildAt(0)?.get(0)) console.log("=> table3") const table3 = new arrow.Table({ dates: vec2 }) console.log(table3.toArray()) console.log(table3.schema.fields[0].type) console.log(table3.getChildAt(0)?.get(0)) ``` ``` => table [ {"date": Wed Mar 29 2023 08:34:56 GMT-0400 (Eastern Daylight Time)} ] TimestampMillisecond { typeId: 10, unit: 1, timezone: undefined, toString: [Function: toString], ArrayType: [class Int32Array], [Symbol(Symbol.toStringTag)]: "Timestamp", children: null, OffsetArrayType: [class Int32Array], } 2023-03-29T12:34:56.000Z => table2 [ {"date": Wed Mar 29 2023 08:34:56 GMT-0400 (Eastern Daylight Time)} ] Timestamp_ { typeId: 10, unit: 1, timezone: null, toString: [Function: toString], ArrayType: [class Int32Array], children: null, OffsetArrayType: [class Int32Array], } 2023-03-29T12:34:56.000Z => table3 [ {"dates": Wed Mar 29 2023 08:34:56 GMT-0400 (Eastern Daylight Time)} ] DateMillisecond { typeId: 8, unit: 1, toString: [Function: toString], ArrayType: [class Int32Array], [Symbol(Symbol.toStringTag)]: "Date", children: null, OffsetArrayType: [class Int32Array], } 2023-03-29T12:34:56.000Z ``` * GitHub Issue: apache#40891
Merge after apache#40892. This pull request also changes Dates to return timestamps instead of Date instances (similar to Timestamps and for the same reason. * GitHub Issue: apache#40959
Fixes #40891
Tested with