GH-40698: [C++] Create registry for Devices to map DeviceType to MemoryManager in C Device Data import#40699
Conversation
…o MemoryManager in C Device Data import
|
|
|
@pitrou I would appreciate a preliminary review to check if this is going in the right direction (of course still need to add tests, docs, clean-up naming, etc, and testing it now with CUDA) For now I didn't go for a dual public / And I added it to device.h/cc right now, but actually if this will only be used for the C Device interface, could also move it to bridge.h/cc |
|
@github-actions crossbow submit test-cuda-python |
|
Revision: e16e24d Submitted crossbow builds: ursacomputing/crossbow @ actions-3c9b11581b
|
Yes, this is looking ok, though the implementation can be simplified a bit.
Agreed. No need to expose the registry class itself for now.
Since the API is minimal and doesn't require any addition includes, we can keep it in |
You mean to not even use the internal class to store the mapping, but just have the register/get functions and store the unordered_map in a global variable? |
|
No, the class is ok, but the |
|
@github-actions crossbow submit test-cuda-python |
|
Revision: f33872d Submitted crossbow builds: ursacomputing/crossbow @ actions-2f2ec1b2f6
|
cpp/src/arrow/gpu/cuda_memory.cc
Outdated
| Result<std::shared_ptr<MemoryManager>> DefaultGPUMemoryMapper(int64_t device_id) { | ||
| ARROW_ASSIGN_OR_RAISE(auto device, arrow::cuda::CudaDevice::Make(device_id)); | ||
| return device->default_memory_manager(); | ||
| } |
There was a problem hiding this comment.
We should probably include some sort of customizations on the cuda device to ensure it uses the appropriate allocation type (HOST/MANAGED/etc).
There was a problem hiding this comment.
I somewhat blindly copied this from an existing function:
arrow/cpp/src/arrow/gpu/cuda_memory.cc
Lines 488 to 497 in 1781b32
But yes, that just ignores the device type at the moment. Looking at the code, it seems that CudaDevice currently only supports CUDA allocation type?
Then for this PR I can just remove the other two?
There was a problem hiding this comment.
That seems reasonable for now. We can update and handle the other types in a future update
cpp/src/arrow/gpu/cuda_memory.cc
Outdated
|
|
||
| std::once_flag cuda_registered; | ||
|
|
||
| Status RegisterCUDADevice() { |
There was a problem hiding this comment.
Is there a reason for not doing this automatically at initialization rather than have the user call it explicitly?
There was a problem hiding this comment.
To be honest, I am not fully sure on what possible use cases would be. So from my point of view of enabling importing CUDA data in pyarrow, registering the CUDA device automatically is perfectly fine.
I assume it's quite unlikely that someone might want to register a different CUDA device from C++?
There was a problem hiding this comment.
Well, if they have a need to a dedicated CUDA mapper, they can just pass their own DeviceMemoryMapper when importing, AFAICT. What do you think @zeroshade ?
There was a problem hiding this comment.
That's true. I was going to suggest that with this registration mechanism, we don't necessarily need to keep the device mapper keyword, but that's actually a reason to keep it
There was a problem hiding this comment.
I pushed a commit that registers the CUDA device by default and therefore removes the public RegisterCudaDevice function.
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
|
This should be ready for another (final?) review |
pitrou
left a comment
There was a problem hiding this comment.
LGTM, but one minor suggestion still. Feel free to merge when done!
| std::shared_ptr<MemoryManager> default_cpu_memory_manager(); | ||
|
|
||
| using MemoryMapper = | ||
| std::function<Result<std::shared_ptr<MemoryManager>>(int64_t device_id)>; |
There was a problem hiding this comment.
Sorry, but a couple more suggestions to unify naming:
- rename
MemoryMappertoDeviceMemoryMapper? - rename
RegisterDeviceMemoryManagertoRegisterDeviceMemoryMapper - rename
GetDeviceMemoryManagertoGetDeviceMemoryMapper
There was a problem hiding this comment.
Good points, that naming is definitely more consistent.
There is however one problem that we already define a DeviceMemoryMapper for the keyword type in the actual bridge.h Import methods:
arrow/cpp/src/arrow/c/bridge.h
Lines 218 to 219 in 434f872
and we should probably find a distinct name, given that both are slight different (the one takes device_type+device_id and returns a MemoryManager, while the other is a function already tied to a specific device_type and thus only takes a device_id, returning again a MemoryManager)
It's of course a subtle difference that might be difficult to embody in a name. But at least using distinct names seems best.
There was a problem hiding this comment.
Perhaps DeviceIdMapper then? Not terribly pretty I admit...
There was a problem hiding this comment.
I think that sounds good for the function type alias, but then I would personally leave the register/get functions as is? I would find RegisterDeviceIdMapper a bit strange with the focus on the id, because you are also registering a device type, it's just that the value you store for the registered type is the DeviceIdMapper ..
Anyway, in the end it doesn't matter that much, happy to go with whatever we come up with.
There was a problem hiding this comment.
Or DeviceMapper / RegisterDeviceMapper / GetDeviceMapper ? (that's a bit more generic, but keeps the three consistent with each other)
|
@github-actions crossbow submit test-cuda-cpp |
|
Revision: 7a9e30d Submitted crossbow builds: ursacomputing/crossbow @ actions-b7970e2559
|
|
@github-actions crossbow submit test-cuda-cpp |
|
Revision: a5a6f6c Submitted crossbow builds: ursacomputing/crossbow @ actions-f7506cdb39
|
|
Thanks for the reviews! |
|
After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit a407a6b. There were 10 benchmark results indicating a performance regression:
The full Conbench report has more details. It also includes information about 3 possible false positives for unstable benchmarks that are known to sometimes produce them. |
…o MemoryManager in C Device Data import (apache#40699) ### Rationale for this change Follow-up on apache#39980 (comment) Right now, the user of `ImportDeviceArray` or `ImportDeviceRecordBatch` needs to provide a `DeviceMemoryMapper` mapping the device type and id to a MemoryManager. We provide a default implementation of that mapper that just knows about the default CPU memory manager (and there is another implementation in `arrow::cuda`, but you need to explicitly pass that to the import function) To make this easier, this PR adds a registry such that default device mappers can be added separately. ### What changes are included in this PR? This PR adds two new public functions to register device types (`RegisterDeviceMemoryManager`) and retrieve the mapper from the registry (`GetDeviceMemoryManager`). Further, it provides a `RegisterCUDADevice` to optionally register the CUDA devices (by default only CPU device is registered). ### Are these changes tested? ### Are there any user-facing changes? * GitHub Issue: apache#40698 Lead-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com> Co-authored-by: Antoine Pitrou <pitrou@free.fr> Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Remove 4 deprecated C++ APIs: - decimal() function (v18.0.0, PR apache#43957) - FlightSql CancelQuery() server/client (v13.0.0, PR apache#36009) - cuda::DefaultMemoryMapper() (v16.0.0, PR apache#40699) All deprecated before 2025. Tests updated to use replacement APIs.
Remove 4 deprecated C++ APIs: - decimal() function (v18.0.0, PR apache#43957) - FlightSql CancelQuery() server/client (v13.0.0, PR apache#36009) - cuda::DefaultMemoryMapper() (v16.0.0, PR apache#40699) All deprecated before 2025. Tests updated to use replacement APIs.
Remove 2 deprecated C++ APIs: - decimal() function (v18.0.0, PR apache#43957) - cuda::DefaultMemoryMapper() (v16.0.0, PR apache#40699) All deprecated before 2025.
Remove 3 deprecated C++ APIs: - decimal() function (v18.0.0, PR apache#43957) - cuda::DefaultMemoryMapper() (v16.0.0, PR apache#40699) - HasValidityBitmap() global function (v17.0.0, PR apache#41115) All deprecated before 2025.
### Rationale for This Change This PR removes C++ APIs that have been deprecated for more than one year, in line with Arrow's deprecation and cleanup policy. All removed APIs were deprecated before January 1, 2025. --- ### What Changes Are Included in This PR? This PR removes **3 deprecated C++ APIs**: **1. `decimal(precision, scale)` function** - Deprecated in v18.0.0 (September 30, 2024, PR #43957) - Replacement: `smallest_decimal(precision, scale)` - Removed from: `cpp/src/arrow/type_fwd.h`, `cpp/src/arrow/type.cc` **2. `cuda::DefaultMemoryMapper(device_type, device_id)` function** - Deprecated in v16.0.0 (April 16, 2024, PR #40699) - Replacement: `arrow::DefaultDeviceMapper` - Removed from: `cpp/src/arrow/gpu/cuda_memory.h`, `cpp/src/arrow/gpu/cuda_memory.cc` **3. `HasValidityBitmap(Type::type id)` global function** - Deprecated in v17.0.0 (July 11, 2024, PR #41115) - Replacement: `may_have_validity_bitmap(Type::type id)` - Removed from: `cpp/src/arrow/type.h` - Note: Member functions like `ArrayData::HasValidityBitmap()` remain unchanged --- ### Are These Changes Tested? Yes. Verified via full codebase search that all removed symbols have zero remaining usages in the C++ codebase. --- ### Are There Any User-Facing Changes? Yes. Downstream C++ consumers using these deprecated APIs will need to migrate to the replacement APIs listed above. --- * GitHub Issue: #49356 Authored-by: unknown <alimahmoodrana82@gmail.com> Signed-off-by: Sutou Kouhei <kou@clear-code.com>
Rationale for this change
Follow-up on #39980 (comment)
Right now, the user of
ImportDeviceArrayorImportDeviceRecordBatchneeds to provide aDeviceMemoryMappermapping the device type and id to a MemoryManager. We provide a default implementation of that mapper that just knows about the default CPU memory manager (and there is another implementation inarrow::cuda, but you need to explicitly pass that to the import function)To make this easier, this PR adds a registry such that default device mappers can be added separately.
What changes are included in this PR?
This PR adds two new public functions to register device types (
RegisterDeviceMemoryManager) and retrieve the mapper from the registry (GetDeviceMemoryManager).Further, it provides a
RegisterCUDADeviceto optionally register the CUDA devices (by default only CPU device is registered).Are these changes tested?
Are there any user-facing changes?