
Fix DMA-BUF Export/Import with PyTorch Caching Allocator Offsets #348

Merged
mawad-amd merged 6 commits into main from muhaawad/fix-dma-buf on Feb 4, 2026
Conversation

@mawad-amd (Collaborator)

Motivation

DMA-BUF export/import was failing when tensors were suballocated from PyTorch's caching allocator. hipMemGetHandleForAddressRange exports a handle to the entire allocation buffer, not just the specific tensor's memory. Without offset correction, imports would map to the wrong location, causing data corruption.

The bug surfaced after importing iris.ops (which loads tritonBLAS): the extra allocations cause subsequent tensors to be suballocated at non-zero offsets within the cached buffer.

Technical Details

Changes to iris/hip.py:

  • export_dmabuf_handle() - Now returns (fd, base_ptr, base_size) tuple. Uses hipMemGetAddressRange() to query allocation base and size.
  • import_dmabuf_handle() - Now accepts original_ptr and base_ptr parameters. Calculates offset and returns mapped_base + offset.
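The offset bookkeeping those two functions perform can be sketched in plain Python. This is a toy model, not the real implementation: the dicts below stand in for hipMemGetAddressRange, hipMemGetHandleForAddressRange, and the importer's mmap table, and all pointers are made-up integers.

```python
def export_dmabuf_handle(tensor_ptr, allocations):
    """Return (fd, base_ptr, base_size) for the allocation containing tensor_ptr.

    `allocations` is a stand-in for hipMemGetAddressRange plus
    hipMemGetHandleForAddressRange: it maps each allocation's base pointer
    to the (fd, size) of the whole backing buffer.
    """
    for base_ptr, (fd, size) in allocations.items():
        if base_ptr <= tensor_ptr < base_ptr + size:
            return fd, base_ptr, size
    raise ValueError("pointer not inside any known allocation")


def import_dmabuf_handle(fd, original_ptr, base_ptr, mapped_bases):
    """Map the whole allocation, then re-apply the suballocation offset.

    `mapped_bases` stands in for the importer's fd -> mapped-address table.
    Without the `original_ptr - base_ptr` correction, the importer would
    land at the start of the allocation rather than at the tensor.
    """
    mapped_base = mapped_bases[fd]
    offset = original_ptr - base_ptr  # non-zero for caching-allocator suballocations
    return mapped_base + offset


# One 1 KiB allocation at 0x1000 exported as fd 7; tensor suballocated at 0x1100.
fd, base, size = export_dmabuf_handle(0x1100, {0x1000: (7, 0x400)})
addr = import_dmabuf_handle(fd, 0x1100, base, {7: 0x9000})
assert addr == 0x9100  # mapped base 0x9000 + offset 0x100
```

The key point is that the offset is computed on the importer side from `original_ptr - base_ptr`, which is why both pointers must travel with the FD.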

Changes to iris/allocators/torch_allocator.py:

  • Updated peer memory exchange to transmit metadata (base_ptr, base_size, heap_ptr) along with FD using struct.pack('QQQ', ...).
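The metadata exchange can be illustrated with the stated `struct.pack('QQQ', ...)` layout: three unsigned 64-bit fields in a fixed 24-byte payload that rides alongside the FD. The pointer values below are made up for illustration.

```python
import struct

# Exporting rank packs the allocation metadata next to the DMA-BUF FD.
base_ptr, base_size, heap_ptr = 0x7F00_0000_0000, 1 << 20, 0x7F00_0010_0000

payload = struct.pack('QQQ', base_ptr, base_size, heap_ptr)
assert len(payload) == 24  # 3 * 8 bytes, native byte order

# Importing rank unpacks the same fixed layout.
rx_base_ptr, rx_base_size, rx_heap_ptr = struct.unpack('QQQ', payload)
assert (rx_base_ptr, rx_base_size, rx_heap_ptr) == (base_ptr, base_size, heap_ptr)
```

A fixed-size format like this keeps the receive side simple: the importer always reads exactly `struct.calcsize('QQQ')` bytes of metadata per peer.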

Changes to tests/unittests/test_dmabuf_apis.py:

  • Updated all tests to use new API.
  • Added test_dmabuf_with_offset() to explicitly validate offset handling.

Breaking Change: export_dmabuf_handle() now returns the tuple (fd, base_ptr, base_size) instead of a single fd.

Test Plan

  • All 6 DMA-BUF unit tests pass (including new offset test)
  • Multi-rank tests verified with 1 and 2 ranks
  • test_dmabuf_with_offset() explicitly tests the fix by allocating two tensors (forcing offset > 0) and verifying correct data import
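The idea behind the offset test can be modeled with a toy bump allocator standing in for PyTorch's caching allocator (hypothetical names; the real test allocates two tensors on the Iris heap and round-trips the second one through export/import):

```python
class BumpAllocator:
    """Toy stand-in for a caching allocator: one backing block, with
    tensors suballocated at monotonically increasing offsets."""

    def __init__(self, base, size):
        self.base, self.size, self.used = base, size, 0

    def alloc(self, nbytes):
        assert self.used + nbytes <= self.size, "block exhausted"
        ptr = self.base + self.used
        self.used += nbytes
        return ptr


heap = BumpAllocator(base=0x1000, size=1 << 16)
a = heap.alloc(4096)  # first tensor: offset 0 (the case that always worked)
b = heap.alloc(4096)  # second tensor: offset 4096, forcing offset > 0

assert a - heap.base == 0
assert b - heap.base == 4096  # an import that ignores this offset would
                              # read/write `a`'s memory instead of `b`'s
```

Allocating the second tensor is what guarantees a non-zero offset, so the test fails loudly on the pre-fix behavior instead of passing by coincidence.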

Test Result

✅ All 6 DMA-BUF tests pass:

test_dmabuf_export                    PASSED [ 16%]
test_dmabuf_import                    PASSED [ 33%]
test_dmabuf_export_import_roundtrip   PASSED [ 50%]
test_iris_symmetric_heap_creation     PASSED [ 66%]
test_dmabuf_with_offset               PASSED [ 83%]
test_dmabuf_multirank_exchange        PASSED [100%]

Submission Checklist

@github-actions bot added the in-progress (We are working on it) and iris (Iris project issue) labels on Feb 3, 2026
@mawad-amd mawad-amd marked this pull request as ready for review February 3, 2026 22:50
Copilot AI review requested due to automatic review settings February 3, 2026 22:50
Copilot AI left a comment (Contributor)

Pull request overview

This pull request fixes a critical bug in DMA-BUF export/import functionality when working with PyTorch's caching allocator. The issue arose because hipMemGetHandleForAddressRange exports handles to entire allocation buffers rather than specific sub-allocations, causing data corruption when tensors had non-zero offsets within those buffers.

Changes:

  • Modified export_dmabuf_handle() to return a tuple (fd, base_ptr, base_size) instead of just fd, using hipMemGetAddressRange to query the base allocation information
  • Updated import_dmabuf_handle() to accept offset parameters and correctly map to the intended memory location
  • Updated TorchAllocator to pack/unpack the additional metadata when exchanging handles between ranks
  • Added comprehensive tests including a dedicated offset test to validate the fix

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

  • iris/hip.py: Added hipMemGetAddressRange call to query the base allocation; changed the return type to a tuple; added the offset calculation in the import function.
  • iris/allocators/torch_allocator.py: Updated the handle exchange to pack/unpack metadata (base_ptr, base_size, heap_ptr); updated the import call to use the offset parameters.
  • tests/unittests/test_dmabuf_apis.py: Updated all existing tests for the new API; added test_dmabuf_with_offset() to explicitly test offset handling.

mawad-amd and others added 3 commits February 3, 2026 23:23
Increase bfloat16 tolerance from 1.5 to 2.5 to handle worst-case
numerical precision with large matrices (1024x256x512) and 8-rank
all_reduce operations. CI was seeing max difference of 2.0.

Co-authored-by: Cursor <cursoragent@cursor.com>
- run_core_tests.sh: Quick validation (examples + unittests, 1-8 ranks)
- run_all_tests.sh: Full CI-style testing (all 5 dirs, 1-8 ranks)

Both scripts:
- Create timestamped log subdirectories (logs/TIMESTAMP/)
- Generate _all.log mega log capturing everything
- Create individual test logs for debugging specific failures

Co-authored-by: Cursor <cursoragent@cursor.com>
@mawad-amd mawad-amd merged commit 137770a into main Feb 4, 2026
72 checks passed
@mawad-amd mawad-amd deleted the muhaawad/fix-dma-buf branch February 4, 2026 04:49
Copilot AI added a commit that referenced this pull request Feb 4, 2026
Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
