Add API reference for iris.ccl, iris.ops, and iris.x modules #345
Conversation
Co-authored-by: neoblizz <9790745+neoblizz@users.noreply.github.com>
Pull request overview
This pull request adds comprehensive API reference documentation for three core Iris modules (iris.ccl, iris.ops, and iris.x) that were previously undocumented. The documentation follows the established pattern from existing Triton and Gluon reference sections.
Changes:
- Added structured documentation for collective communication operations (CCL), fused GEMM+CCL operations (ops), and device-side tile-level primitives (x)
- Created overview pages with usage patterns and examples for each module
- Added autodoc-based API reference pages for operations, configuration classes, and core abstractions
- Updated the main API reference navigation to include links to the new sections
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| docs/reference/api-reference.md | Added navigation entries for the three new modules in the main API reference index |
| docs/reference/ccl/overview.md | Overview of collective communication library with usage patterns and examples |
| docs/reference/ccl/operations.md | Autodoc references for CCL collective operations (all_reduce, all_gather, all_to_all, reduce_scatter) |
| docs/reference/ccl/config.md | Autodoc references for CCL configuration classes (Config, ReduceOp) |
| docs/reference/ops/overview.md | Overview of fused GEMM+CCL operations with usage patterns and workspace management examples |
| docs/reference/ops/operations.md | Autodoc references for fused operations (matmul_all_reduce, all_gather_matmul, matmul_all_gather, matmul_reduce_scatter) and OpsNamespace |
| docs/reference/ops/config.md | Autodoc references for ops configuration classes (FusedConfig, FusedWorkspace) |
| docs/reference/x/overview.md | Overview of device-side tile-level primitives with algorithm selection and usage patterns |
| docs/reference/x/core.md | Autodoc references for core abstractions (TileView, TensorView, DeviceContext, AllReduceConfig) and helper functions |
| docs/reference/x/operations.md | Autodoc references for device-side collective operations (all-reduce variants, all_gather, all_to_all, reduce_scatter, gather) |
@copilot apply changes based on the comments in this thread
- Fix iris.x examples to use correct API (iris.DeviceContext, make_tensor_view)
- Update AllReduceConfig examples to use integer codes instead of strings
- Add tl.constexpr annotations to kernel parameters
- Fix heap_size undefined variable in ops examples
- Update _toc.yml to include ccl, ops, and x documentation sections
- Remove incorrect DeviceContext reference from iris.x core.md
- Clarify that operations are standalone functions, not methods

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
The _toc.yml changes from commit 88db72f were lost during the merge with main. This commit restores the TOC entries for iris.ccl, iris.ops, and iris.x documentation.

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
Renamed the variable 'shmem' to 'ctx' in the iris.ccl overview documentation for consistency with the rest of the codebase.

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
```markdown
## Tile (deprecated)

Legacy tile representation. Use TileView instead.
```
The section header labels iris.x.Tile as “deprecated”, but the implementation doesn’t mark it deprecated and several iris.x collectives (e.g., all_reduce_atomic, all_gather, reduce_scatter) require a Tile because they read tile.data. Please remove or reword the “deprecated” claim, and clarify when to use TileView vs Tile (TileView for coordinates/shape-only, Tile when carrying computed tile data).
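For illustration, a minimal device-side sketch of that distinction, using only the constructor forms already quoted in this review; the kernel wrapper, parameter names, and load pattern are illustrative assumptions, not the documented example:

```python
import triton
import triton.language as tl
import iris

@triton.jit
def tile_vs_tileview_example(input_ptr, M, N, stride_m, stride_n,
                             BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr):
    pid_m = tl.program_id(0)
    pid_n = tl.program_id(1)

    # TileView: coordinates/shape only -- enough for ops that locate a tile
    # inside a TensorView and read/write through the view.
    tile_view = iris.x.TileView(pid_m, pid_n, BLOCK_M, BLOCK_N)

    # Tile: additionally carries the tile's computed values, which collectives
    # such as all_reduce_atomic consume via tile.data.
    offs_m = pid_m * BLOCK_M + tl.arange(0, BLOCK_M)[:, None]
    offs_n = pid_n * BLOCK_N + tl.arange(0, BLOCK_N)[None, :]
    data = tl.load(input_ptr + offs_m * stride_m + offs_n * stride_n)
    tile = iris.x.Tile(pid_m, pid_n, BLOCK_M, BLOCK_N, data)
```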
```python
tile = iris.x.TileView(pid_m, pid_n, BLOCK_M, BLOCK_N)
src_view = iris.x.make_tensor_view(input_ptr, M, N, stride_m, stride_n)
dst_view = iris.x.make_tensor_view(output_ptr, M, N, stride_m, stride_n)
ctx = iris.DeviceContext.initialize(context_tensor, rank, world_size)
```
In this example, tile is a TileView, but iris.x.all_reduce_atomic expects a Tile with a populated tile.data (the local tile’s computed values). As written, the example can’t work because TileView has no data field. Update the snippet to compute a tile result (e.g., load/compute) and construct iris.x.Tile(pid_m, pid_n, BLOCK_M, BLOCK_N, data) before calling all_reduce_atomic (or use an op that accepts TileView).
Suggested change:

```diff
-tile = iris.x.TileView(pid_m, pid_n, BLOCK_M, BLOCK_N)
 src_view = iris.x.make_tensor_view(input_ptr, M, N, stride_m, stride_n)
 dst_view = iris.x.make_tensor_view(output_ptr, M, N, stride_m, stride_n)
 ctx = iris.DeviceContext.initialize(context_tensor, rank, world_size)
+# Compute the local tile's data (e.g., load from input_ptr)
+offs_m = pid_m * BLOCK_M + tl.arange(0, BLOCK_M)[:, None]
+offs_n = pid_n * BLOCK_N + tl.arange(0, BLOCK_N)[None, :]
+ptrs = input_ptr + offs_m * stride_m + offs_n * stride_n
+data = tl.load(ptrs)
+# Construct a Tile with populated data for the collective
+tile = iris.x.Tile(pid_m, pid_n, BLOCK_M, BLOCK_N, data)
```
```python
              BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr):
    pid_m = tl.program_id(0)
    pid_n = tl.program_id(1)

    tile = iris.x.TileView(pid_m, pid_n, BLOCK_M, BLOCK_N)
    dst_view = iris.x.make_tensor_view(output_ptr, M, N, stride_m, stride_n)
    ctx = iris.DeviceContext.initialize(context_tensor, rank, world_size)

    # Call collectives directly
    iris.x.all_reduce_atomic(tile, dst_view, ctx)
    iris.x.all_gather(tile, dst_view, dim=0, ctx=ctx)
    iris.x.all_to_all(tile, dst_view, dst_view, N_per_rank, ctx)
```
This usage-pattern snippet calls iris.x.all_to_all(tile, dst_view, dst_view, N_per_rank, ctx) but N_per_rank is not defined/passed in the kernel signature, and all_to_all requires a src_view argument (signature: all_to_all(tile, src_view, dst_view, N_per_rank, ctx)). The example should define N_per_rank: tl.constexpr in the kernel params and create src_view (likely from an input pointer) rather than passing dst_view twice.
Suggested change:

```diff
+              N_per_rank: tl.constexpr,
               BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr):
     pid_m = tl.program_id(0)
     pid_n = tl.program_id(1)
     tile = iris.x.TileView(pid_m, pid_n, BLOCK_M, BLOCK_N)
+    src_view = iris.x.make_tensor_view(input_ptr, M, N, stride_m, stride_n)
     dst_view = iris.x.make_tensor_view(output_ptr, M, N, stride_m, stride_n)
     ctx = iris.DeviceContext.initialize(context_tensor, rank, world_size)
     # Call collectives directly
     iris.x.all_reduce_atomic(tile, dst_view, ctx)
     iris.x.all_gather(tile, dst_view, dim=0, ctx=ctx)
-    iris.x.all_to_all(tile, dst_view, dst_view, N_per_rank, ctx)
+    iris.x.all_to_all(tile, src_view, dst_view, N_per_rank, ctx)
```
```python
dummy_locks = tl.zeros((1,), dtype=tl.int32)
config = iris.x.AllReduceConfig(1, dummy_locks)
iris.x.all_reduce_ring(tile, dst_view, ctx)
```
In this algorithm-selection example, config = iris.x.AllReduceConfig(1, dummy_locks) is created but never used, and the subsequent call iris.x.all_reduce_ring(tile, dst_view, ctx) doesn’t match the actual all_reduce_ring signature (it requires src_view and dst_view). Either remove AllReduceConfig from this example and show the correct call form for all_reduce_ring(tile, src_view, dst_view, ctx), or add a real API that consumes AllReduceConfig and document that instead.
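For reference, a hedged sketch of the corrected call form this comment describes, all_reduce_ring(tile, src_view, dst_view, ctx); the kernel wrapper and setup lines mirror the snippets above and are assumptions rather than the documented example:

```python
import triton
import triton.language as tl
import iris

@triton.jit
def ring_all_reduce_example(input_ptr, output_ptr, context_tensor, rank, world_size,
                            M, N, stride_m, stride_n,
                            BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr):
    pid_m = tl.program_id(0)
    pid_n = tl.program_id(1)

    tile = iris.x.TileView(pid_m, pid_n, BLOCK_M, BLOCK_N)
    src_view = iris.x.make_tensor_view(input_ptr, M, N, stride_m, stride_n)
    dst_view = iris.x.make_tensor_view(output_ptr, M, N, stride_m, stride_n)
    ctx = iris.DeviceContext.initialize(context_tensor, rank, world_size)

    # The ring variant reads from src_view and writes to dst_view; no
    # AllReduceConfig object is passed (per the comment above).
    iris.x.all_reduce_ring(tile, src_view, dst_view, ctx)
```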
```python
config = iris.x.AllReduceConfig(4, locks_ptr)
tile_id = pid_m * num_tiles_n + pid_n
iris.x.all_reduce_spinlock(tile, dst_view, locks_ptr, ctx)
```
This snippet has multiple undefined/mismatched pieces: tile_id = pid_m * num_tiles_n + pid_n references num_tiles_n which isn’t defined in the example, and iris.x.all_reduce_spinlock takes the locks tensor directly (no AllReduceConfig parameter), so the config = iris.x.AllReduceConfig(4, locks_ptr) line is misleading unless there is an API that consumes it. Consider removing AllReduceConfig here, define num_tiles_n (e.g., via tl.cdiv(N, BLOCK_N) or dst_view helpers), and show the correct all_reduce_spinlock(tile, dst_view, locks_ptr, ctx) call with the required locks layout.
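And a corresponding hedged sketch for the spinlock variant, following the call form and num_tiles_n suggestion in this comment; the Tile-with-data construction and the one-lock-per-output-tile layout are assumptions to verify against the iris.x reference:

```python
import triton
import triton.language as tl
import iris

@triton.jit
def spinlock_all_reduce_example(input_ptr, output_ptr, locks_ptr, context_tensor,
                                rank, world_size, M, N, stride_m, stride_n,
                                BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr):
    pid_m = tl.program_id(0)
    pid_n = tl.program_id(1)

    # Local tile result, wrapped with its data so the collective can reduce tile.data
    # (assumed here, following the reviewer's note about which ops read tile.data).
    offs_m = pid_m * BLOCK_M + tl.arange(0, BLOCK_M)[:, None]
    offs_n = pid_n * BLOCK_N + tl.arange(0, BLOCK_N)[None, :]
    data = tl.load(input_ptr + offs_m * stride_m + offs_n * stride_n)
    tile = iris.x.Tile(pid_m, pid_n, BLOCK_M, BLOCK_N, data)

    dst_view = iris.x.make_tensor_view(output_ptr, M, N, stride_m, stride_n)
    ctx = iris.DeviceContext.initialize(context_tensor, rank, world_size)

    # locks_ptr is assumed to point at a zero-initialized int32 tensor with one lock
    # per output tile; num_tiles_n can be derived on-device as tl.cdiv(N, BLOCK_N)
    # if a per-tile index is needed. The locks tensor is passed directly; no
    # AllReduceConfig is involved.
    iris.x.all_reduce_spinlock(tile, dst_view, locks_ptr, ctx)
```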
Add API reference for iris.ccl, iris.ops and iris.x
Original prompt
Add API reference for iris.ccl, iris.ops and iris.x (#344)