Skip to content

Comments

GH-40698: [C++] Create registry for Devices to map DeviceType to MemoryManager in C Device Data import#40699

Merged
jorisvandenbossche merged 15 commits intoapache:mainfrom
jorisvandenbossche:device-registry
Mar 27, 2024
Merged

GH-40698: [C++] Create registry for Devices to map DeviceType to MemoryManager in C Device Data import#40699
jorisvandenbossche merged 15 commits intoapache:mainfrom
jorisvandenbossche:device-registry

Conversation

@jorisvandenbossche
Copy link
Member

@jorisvandenbossche jorisvandenbossche commented Mar 21, 2024

Rationale for this change

Follow-up on #39980 (comment)

Right now, the user of ImportDeviceArray or ImportDeviceRecordBatch needs to provide a DeviceMemoryMapper mapping the device type and id to a MemoryManager. We provide a default implementation of that mapper that just knows about the default CPU memory manager (and there is another implementation in arrow::cuda, but you need to explicitly pass that to the import function)

To make this easier, this PR adds a registry such that default device mappers can be added separately.

What changes are included in this PR?

This PR adds two new public functions to register device types (RegisterDeviceMemoryManager) and retrieve the mapper from the registry (GetDeviceMemoryManager).

Further, it provides a RegisterCUDADevice to optionally register the CUDA devices (by default only CPU device is registered).

Are these changes tested?

Are there any user-facing changes?

@github-actions
Copy link

⚠️ GitHub issue #40698 has been automatically assigned in GitHub to PR creator.

@jorisvandenbossche
Copy link
Member Author

jorisvandenbossche commented Mar 21, 2024

@pitrou I would appreciate a preliminary review to check if this is going in the right direction

(of course still need to add tests, docs, clean-up naming, etc, and testing it now with CUDA)

For now I didn't go for a dual public / Impl class structure like we do for other registries, because it seems that the class itself doesn't need to be public in this case. Just the register / get function should be sufficient for users?

And I added it to device.h/cc right now, but actually if this will only be used for the C Device interface, could also move it to bridge.h/cc

@jorisvandenbossche
Copy link
Member Author

@github-actions crossbow submit test-cuda-python

@github-actions
Copy link

Revision: e16e24d

Submitted crossbow builds: ursacomputing/crossbow @ actions-3c9b11581b

Task Status
test-cuda-python GitHub Actions

@pitrou
Copy link
Member

pitrou commented Mar 21, 2024

@pitrou I would appreciate a preliminary review to check if this is going in the right direction

Yes, this is looking ok, though the implementation can be simplified a bit.

For now I didn't go for a dual public / Impl class structure like we do for other registries, because it seems that the class itself doesn't need to be public in this case. Just the register / get function should be sufficient for users?

Agreed. No need to expose the registry class itself for now.

And I added it to device.h/cc right now, but actually if this will only be used for the C Device interface, could also move it to bridge.h/cc

Since the API is minimal and doesn't require any addition includes, we can keep it in device.h IMHO.

@jorisvandenbossche
Copy link
Member Author

though the implementation can be simplified a bit

You mean to not even use the internal class to store the mapping, but just have the register/get functions and store the unordered_map in a global variable?

@pitrou
Copy link
Member

pitrou commented Mar 21, 2024

No, the class is ok, but the call_once is not required if instead using a static local variable.

@jorisvandenbossche jorisvandenbossche marked this pull request as ready for review March 22, 2024 08:51
@jorisvandenbossche
Copy link
Member Author

@github-actions crossbow submit test-cuda-python

@github-actions
Copy link

Revision: f33872d

Submitted crossbow builds: ursacomputing/crossbow @ actions-2f2ec1b2f6

Task Status
test-cuda-python GitHub Actions

Comment on lines 505 to 508
Result<std::shared_ptr<MemoryManager>> DefaultGPUMemoryMapper(int64_t device_id) {
ARROW_ASSIGN_OR_RAISE(auto device, arrow::cuda::CudaDevice::Make(device_id));
return device->default_memory_manager();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably include some sort of customizations on the cuda device to ensure it uses the appropriate allocation type (HOST/MANAGED/etc).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I somewhat blindly copied this from an existing function:

Result<std::shared_ptr<MemoryManager>> DefaultMemoryMapper(ArrowDeviceType device_type,
int64_t device_id) {
switch (device_type) {
case ARROW_DEVICE_CPU:
return default_cpu_memory_manager();
case ARROW_DEVICE_CUDA:
case ARROW_DEVICE_CUDA_HOST:
case ARROW_DEVICE_CUDA_MANAGED: {
ARROW_ASSIGN_OR_RAISE(auto device, arrow::cuda::CudaDevice::Make(device_id));
return device->default_memory_manager();

But yes, that just ignores the device type at the moment. Looking at the code, it seems that CudaDevice currently only supports CUDA allocation type?

Then for this PR I can just remove the other two?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems reasonable for now. We can update and handle the other types in a future update

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Mar 25, 2024

std::once_flag cuda_registered;

Status RegisterCUDADevice() {
Copy link
Member

@pitrou pitrou Mar 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason for not doing this automatically at initialization rather than have the user call it explicitly?

Copy link
Member Author

@jorisvandenbossche jorisvandenbossche Mar 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I am not fully sure on what possible use cases would be. So from my point of view of enabling importing CUDA data in pyarrow, registering the CUDA device automatically is perfectly fine.
I assume it's quite unlikely that someone might want to register a different CUDA device from C++?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, if they have a need to a dedicated CUDA mapper, they can just pass their own DeviceMemoryMapper when importing, AFAICT. What do you think @zeroshade ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true. I was going to suggest that with this registration mechanism, we don't necessarily need to keep the device mapper keyword, but that's actually a reason to keep it

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed a commit that registers the CUDA device by default and therefore removes the public RegisterCudaDevice function.

Co-authored-by: Antoine Pitrou <pitrou@free.fr>
@github-actions github-actions bot added the awaiting change review Awaiting change review label Mar 26, 2024
@jorisvandenbossche
Copy link
Member Author

This should be ready for another (final?) review

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but one minor suggestion still. Feel free to merge when done!

std::shared_ptr<MemoryManager> default_cpu_memory_manager();

using MemoryMapper =
std::function<Result<std::shared_ptr<MemoryManager>>(int64_t device_id)>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, but a couple more suggestions to unify naming:

  1. rename MemoryMapper to DeviceMemoryMapper?
  2. rename RegisterDeviceMemoryManager to RegisterDeviceMemoryMapper
  3. rename GetDeviceMemoryManager to GetDeviceMemoryMapper

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good points, that naming is definitely more consistent.

There is however one problem that we already define a DeviceMemoryMapper for the keyword type in the actual bridge.h Import methods:

using DeviceMemoryMapper =
std::function<Result<std::shared_ptr<MemoryManager>>(ArrowDeviceType, int64_t)>;

and we should probably find a distinct name, given that both are slight different (the one takes device_type+device_id and returns a MemoryManager, while the other is a function already tied to a specific device_type and thus only takes a device_id, returning again a MemoryManager)

It's of course a subtle difference that might be difficult to embody in a name. But at least using distinct names seems best.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps DeviceIdMapper then? Not terribly pretty I admit...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that sounds good for the function type alias, but then I would personally leave the register/get functions as is? I would find RegisterDeviceIdMapper a bit strange with the focus on the id, because you are also registering a device type, it's just that the value you store for the registered type is the DeviceIdMapper ..

Anyway, in the end it doesn't matter that much, happy to go with whatever we come up with.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or DeviceMapper / RegisterDeviceMapper / GetDeviceMapper ? (that's a bit more generic, but keeps the three consistent with each other)

@pitrou
Copy link
Member

pitrou commented Mar 26, 2024

@github-actions crossbow submit test-cuda-cpp

@github-actions
Copy link

Revision: 7a9e30d

Submitted crossbow builds: ursacomputing/crossbow @ actions-b7970e2559

Task Status
test-cuda-cpp GitHub Actions

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Mar 26, 2024
@jorisvandenbossche
Copy link
Member Author

@github-actions crossbow submit test-cuda-cpp

@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Mar 27, 2024
@github-actions
Copy link

Revision: a5a6f6c

Submitted crossbow builds: ursacomputing/crossbow @ actions-f7506cdb39

Task Status
test-cuda-cpp GitHub Actions

@jorisvandenbossche
Copy link
Member Author

Thanks for the reviews!

@conbench-apache-arrow
Copy link

After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit a407a6b.

There were 10 benchmark results indicating a performance regression:

The full Conbench report has more details. It also includes information about 3 possible false positives for unstable benchmarks that are known to sometimes produce them.

vibhatha pushed a commit to vibhatha/arrow that referenced this pull request May 25, 2024
…o MemoryManager in C Device Data import (apache#40699)

### Rationale for this change

Follow-up on apache#39980 (comment)

Right now, the user of `ImportDeviceArray` or `ImportDeviceRecordBatch` needs to provide a `DeviceMemoryMapper` mapping the device type and id to a MemoryManager. We provide a default implementation of that mapper that just knows about the default CPU memory manager (and there is another implementation in `arrow::cuda`, but you need to explicitly pass that to the import function)

To make this easier, this PR adds a registry such that default device mappers can be added separately.

### What changes are included in this PR?

This PR adds two new public functions to register device types (`RegisterDeviceMemoryManager`) and retrieve the mapper from the registry (`GetDeviceMemoryManager`).

Further, it provides a `RegisterCUDADevice` to optionally register the CUDA devices (by default only CPU device is registered).

### Are these changes tested?

### Are there any user-facing changes?

* GitHub Issue: apache#40698

Lead-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
AliRana30 added a commit to AliRana30/arrow that referenced this pull request Feb 18, 2026
Remove 4 deprecated C++ APIs:
- decimal() function (v18.0.0, PR apache#43957)
- FlightSql CancelQuery() server/client (v13.0.0, PR apache#36009)
- cuda::DefaultMemoryMapper() (v16.0.0, PR apache#40699)

All deprecated before 2025. Tests updated to use replacement APIs.
AliRana30 added a commit to AliRana30/arrow that referenced this pull request Feb 18, 2026
Remove 4 deprecated C++ APIs:
- decimal() function (v18.0.0, PR apache#43957)
- FlightSql CancelQuery() server/client (v13.0.0, PR apache#36009)
- cuda::DefaultMemoryMapper() (v16.0.0, PR apache#40699)

All deprecated before 2025. Tests updated to use replacement APIs.
AliRana30 added a commit to AliRana30/arrow that referenced this pull request Feb 20, 2026
Remove 2 deprecated C++ APIs:
- decimal() function (v18.0.0, PR apache#43957)
- cuda::DefaultMemoryMapper() (v16.0.0, PR apache#40699)

All deprecated before 2025.
AliRana30 added a commit to AliRana30/arrow that referenced this pull request Feb 21, 2026
Remove 3 deprecated C++ APIs:
- decimal() function (v18.0.0, PR apache#43957)
- cuda::DefaultMemoryMapper() (v16.0.0, PR apache#40699)
- HasValidityBitmap() global function (v17.0.0, PR apache#41115)

All deprecated before 2025.
kou pushed a commit that referenced this pull request Feb 21, 2026
### Rationale for This Change

This PR removes C++ APIs that have been deprecated for more than one year, in line with Arrow's deprecation and cleanup policy. All removed APIs were deprecated before January 1, 2025.

---

### What Changes Are Included in This PR?

This PR removes **3 deprecated C++ APIs**:

**1. `decimal(precision, scale)` function**
- Deprecated in v18.0.0 (September 30, 2024, PR #43957)
- Replacement: `smallest_decimal(precision, scale)`
- Removed from: `cpp/src/arrow/type_fwd.h`, `cpp/src/arrow/type.cc`

**2. `cuda::DefaultMemoryMapper(device_type, device_id)` function**
- Deprecated in v16.0.0 (April 16, 2024, PR #40699)
- Replacement: `arrow::DefaultDeviceMapper`
- Removed from: `cpp/src/arrow/gpu/cuda_memory.h`, `cpp/src/arrow/gpu/cuda_memory.cc`

**3. `HasValidityBitmap(Type::type id)` global function**
- Deprecated in v17.0.0 (July 11, 2024, PR #41115)
- Replacement: `may_have_validity_bitmap(Type::type id)`
- Removed from: `cpp/src/arrow/type.h`
- Note: Member functions like `ArrayData::HasValidityBitmap()` remain unchanged

---

### Are These Changes Tested?

Yes. Verified via full codebase search that all removed symbols have zero remaining usages in the C++ codebase.

---

### Are There Any User-Facing Changes?

Yes. Downstream C++ consumers using these deprecated APIs will need to migrate to the replacement APIs listed above.

---

* GitHub Issue: #49356

Authored-by: unknown <alimahmoodrana82@gmail.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants