[Core] Adding hostMalloc API #376
Codecov Report
@@            Coverage Diff             @@
##             main     #376      +/-   ##
==========================================
- Coverage   75.29%   75.15%   -0.14%
==========================================
  Files         254      254
  Lines       19438    19416      -22
==========================================
- Hits        14635    14593      -42
- Misses       4803     4823      +20
Thanks for adding this! I'll take a look at it this weekend; it's a busy week, sorry 😞
02fb349 to 4719238
…riggered via a prop in device::malloc
4719238 to 014833a
dmed256
left a comment
LGTM, thank you so much for making these changes! I made a few styling review suggestions so I could batch commit them before merging.
Description
This PR implements a new API, hostMalloc, which lets users allocate occa::memory regions in pinned host memory if the backend device supports it, i.e. in CUDA, HIP, and, to some extent, OpenCL.

Edit: After consideration, we don't need to introduce a separate device::hostMalloc API for functionality that is only present on some backends. We'll instead trigger the hostMalloc behavior by passing a "{'host': true}" prop to device::malloc.

This PR also changes how the device and pinned host pointers are managed in the backends, so that users automatically obtain the correct underlying pointer via the memory::ptr() method rather than having to pass a particular occa::properties setting. Users can also pass these hostMalloc'd buffers directly into kernels, since GPU devices can directly read and write pinned host memory regions*. Finally, the correct copyTo/copyFrom behavior is determined automatically based on the location of the underlying pointers.

Some examples of the resulting API change in user applications can be seen in this draft PR: paranumal/libparanumal#28
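To make the design described above concrete, here is a minimal, self-contained toy model of it. It is not OCCA code: ToyMemory, ToyProps, toyMalloc, and Location are hypothetical names invented for this sketch, and ordinary heap memory stands in for both device and pinned host allocations. It only illustrates the three ideas in the description: one allocation entry point with a host flag, a single ptr() accessor, and a copy path chosen from where the two buffers live.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <stdexcept>
#include <string>

// Toy model only: every name below is hypothetical and none of it is OCCA code.
enum class Location { Device, PinnedHost };

// Stand-in for the props argument; `host` mirrors the {'host': true} prop.
struct ToyProps {
  bool host = false;
};

class ToyMemory {
 public:
  ToyMemory(std::size_t bytes, Location location)
      : bytes_(bytes), location_(location), data_(new unsigned char[bytes]()) {}

  // One accessor for both kinds of buffer: no extra setting is needed to
  // get the pointer that matches where the allocation lives.
  void* ptr() { return data_.get(); }

  std::size_t size() const { return bytes_; }
  Location location() const { return location_; }

  // Copies this buffer into `dest` and reports which transfer path the two
  // locations imply. A real backend would dispatch to a different copy
  // routine per path; in this toy every path is a plain memcpy.
  std::string copyTo(ToyMemory& dest) const {
    if (dest.bytes_ < bytes_) {
      throw std::invalid_argument("destination buffer is too small");
    }
    std::memcpy(dest.data_.get(), data_.get(), bytes_);
    const bool srcHost = (location_ == Location::PinnedHost);
    const bool destHost = (dest.location_ == Location::PinnedHost);
    if (srcHost && destHost) return "host-to-host";
    if (srcHost) return "host-to-device";
    if (destHost) return "device-to-host";
    return "device-to-device";
  }

 private:
  std::size_t bytes_;
  Location location_;
  std::unique_ptr<unsigned char[]> data_;
};

// Single allocation entry point: the flag selects pinned host memory, so no
// separate hostMalloc-style function is required.
ToyMemory toyMalloc(std::size_t bytes, const ToyProps& props = ToyProps()) {
  return ToyMemory(bytes, props.host ? Location::PinnedHost : Location::Device);
}
```

In this sketch, calling toyMalloc(bytes) with default props yields a Device buffer, while setting host = true yields a PinnedHost one; copying the latter into the former reports the host-to-device path without the caller specifying a direction.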
*The OpenCL spec unfortunately does not explicitly guarantee that GPU kernels will directly access pinned host memory. For now, users will still have to explicitly copy back the contents of hostMalloc'd buffers in order to guarantee coherency. Potentially, OCCA could also expose some kind of sync wherein a pinned host buffer would be re-mapped.
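The coherency caveat in this footnote can be pictured with another small toy model. This is an assumption-laden sketch, not OCCA or OpenCL code: ToyPinnedBuffer and its methods are hypothetical, and the separate device-side vector models only the worst case the footnote worries about, in which a kernel works on its own copy rather than on the pinned host memory itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical worst-case model: the device keeps its own copy of a pinned
// buffer, so kernel writes are not visible on the host until copied back.
class ToyPinnedBuffer {
 public:
  explicit ToyPinnedBuffer(std::size_t entries)
      : host_(entries, 0.0f), device_(entries, 0.0f) {}

  // Host-visible view of the buffer.
  float* ptr() { return host_.data(); }
  std::size_t size() const { return host_.size(); }

  // Stand-in for a kernel launch: it touches only the device-side copy.
  void runToyKernel(float value) {
    for (float& x : device_) x = value;
  }

  // Explicit transfers, which are what restore coherency in this model.
  void copyToDevice() { device_ = host_; }
  void copyBackToHost() { host_ = device_; }

 private:
  std::vector<float> host_;
  std::vector<float> device_;
};
```

Under this model, reading through ptr() right after runToyKernel still returns the old host values; only after copyBackToHost does the host view reflect the kernel's writes, which is why the explicit copy remains the safe pattern when direct access is not guaranteed.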
P.S. I have no idea if there is similar functionality in Metal. If someone wants to take a look at the corresponding tweaks in the Metal backend, that would be greatly appreciated.