GH-34386: [C++] Add a PathFromUriOrPath method#34420
Conversation
|
|
|
Oh neat, thank you. I'll try to take a look at this soon |
|
Closing because it has been untouched for a while, in case it's still relevant feel free to reopen and move it forward 👍 |
|
Not sure why this was unceremoniously closed. Weston, do you think you would still have time to look at this? It would be useful in #35034 |
|
Yes, I will try and look at this. I think it still needs some changes. |
79dc526 to
9100358
Compare
|
I unified the various paths using a helper method in util_internal.cc. This highlighted one small difference between the various implementations. The s3 and gcs filesystems remove any trailing slash in the path while the hdfs and local filesystems do not. I have opened #35399 for any discussion on that difference but, for the moment, I have changed all the filesystems to always remove the trailing slash for consistency. |
|
Hmm, I suppose the trailing slash could change listing behavior in S3. |
S3 was already removing the trailing slash so there is no change to the S3 behavior. This only affected the local filesystem. |
|
Ah, I got that mixed up. LGTM then |
…u already have a filesystem
… option to true everywhere it is used as part of PathFromUri
da56f29 to
c89e9ca
Compare
| #ifdef _WIN32 | ||
| if (preserve_root && key.size() == 3 && key[1] == ':' && key[0] != '/') { | ||
| // If the user gives us C:/ then don't return C: | ||
| return key; | ||
| } | ||
| #endif |
|
There are a few errors but they seem to be unrelated. I'm going to merge this. |
|
Benchmark runs are scheduled for baseline = febd0ff and contender = 7ca7724. 7ca7724 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
|
['Python', 'R'] benchmarks have high level of regressions. |
### Rationale for this change We have some URI parsing indirectly exposed through FilesystemFromUri. However, this isn't very useful if the user has multiple URIs or if they already have a filesystem. This method allows that same URI handling to be used even if the user already has a filesystem. ### What changes are included in this PR? Adds a new arrow::fs::PathFromUriOrPath method ### Are these changes tested? Yes, via unit tests ### Are there any user-facing changes? There is a new API method but no changes to any existing APIs * Closes: apache#34386 Authored-by: Weston Pace <weston.pace@gmail.com> Signed-off-by: Weston Pace <weston.pace@gmail.com>
### Rationale for this change We have some URI parsing indirectly exposed through FilesystemFromUri. However, this isn't very useful if the user has multiple URIs or if they already have a filesystem. This method allows that same URI handling to be used even if the user already has a filesystem. ### What changes are included in this PR? Adds a new arrow::fs::PathFromUriOrPath method ### Are these changes tested? Yes, via unit tests ### Are there any user-facing changes? There is a new API method but no changes to any existing APIs * Closes: apache#34386 Authored-by: Weston Pace <weston.pace@gmail.com> Signed-off-by: Weston Pace <weston.pace@gmail.com>
Rationale for this change
We have some URI parsing indirectly exposed through FilesystemFromUri. However, this isn't very useful if the user has multiple URIs or if they already have a filesystem. This method allows that same URI handling to be used even if the user already has a filesystem.
What changes are included in this PR?
Adds a new arrow::fs::PathFromUriOrPath method
Are these changes tested?
Yes, via unit tests
Are there any user-facing changes?
There is a new API method but no changes to any existing APIs