Add API reference for iris.ccl, iris.ops, and iris.x modules#345

Open
Copilot wants to merge 10 commits into main from copilot/add-api-reference-docs
Conversation

Contributor

Copilot AI commented Feb 3, 2026

Add API reference for iris.ccl, iris.ops and iris.x

  • Explore the repository structure and existing documentation pattern
  • Create API reference documentation for iris.ccl
    • Create overview.md for iris.ccl
    • Create operations.md for iris.ccl operations (all_reduce, all_gather, all_to_all, reduce_scatter)
    • Create config.md for iris.ccl configuration
  • Create API reference documentation for iris.ops
    • Create overview.md for iris.ops
    • Create operations.md for iris.ops operations (matmul_all_reduce, all_gather_matmul, matmul_all_gather, matmul_reduce_scatter)
    • Create config.md for iris.ops configuration
  • Create API reference documentation for iris.x
    • Create overview.md for iris.x
    • Create core.md for iris.x core abstractions (TileView, TensorView, DeviceContext, etc.)
    • Create operations.md for iris.x device-side operations
  • Update docs/reference/api-reference.md to include links to new documentation
  • Update docs/sphinx/_toc.yml to include new documentation in table of contents
  • Test documentation builds successfully
  • Address review comments
    • Fix iris.x examples to use correct API
    • Update AllReduceConfig to use integer codes
    • Add missing tl.constexpr annotations
    • Fix undefined heap_size variable
    • Update _toc.yml (restored after merge)
    • Use 'ctx' instead of 'shmem' in ccl documentation
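For orientation, the four collectives named in the checklist above have standard semantics, which can be modeled in plain Python. This is an illustrative reference model only, not the iris.ccl implementation (which runs on-device):

```python
# Reference semantics of the four documented collectives, modeled over
# plain Python lists (one list per rank). Illustrative sketch only;
# the real iris.ccl kernels operate on device tensors.

def all_reduce(per_rank):
    """Every rank receives the elementwise sum of all ranks' inputs."""
    summed = [sum(vals) for vals in zip(*per_rank)]
    return [summed[:] for _ in per_rank]

def all_gather(per_rank):
    """Every rank receives the concatenation of all ranks' inputs."""
    gathered = [x for chunk in per_rank for x in chunk]
    return [gathered[:] for _ in per_rank]

def reduce_scatter(per_rank):
    """Rank i receives its contiguous shard of the elementwise sum."""
    n = len(per_rank)
    summed = [sum(vals) for vals in zip(*per_rank)]
    shard = len(summed) // n
    return [summed[i * shard:(i + 1) * shard] for i in range(n)]

def all_to_all(per_rank):
    """Rank i receives chunk i from every rank (a transpose of chunks)."""
    n = len(per_rank)
    shard = len(per_rank[0]) // n
    chunks = [[buf[j * shard:(j + 1) * shard] for j in range(n)] for buf in per_rank]
    return [[x for src in range(n) for x in chunks[src][dst]] for dst in range(n)]
```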
Original prompt

This section details the original issue you should resolve

<issue_title>[Documentation]: Add API reference for iris.ccl, iris.ops and iris.x</issue_title>
<issue_description>### Description of errors

Attach any links, screenshots, or additional evidence you think will be helpful.

No response</issue_description>

Comments on the Issue (you are @copilot in this section)



Copilot AI and others added 3 commits February 3, 2026 02:39
Co-authored-by: neoblizz <9790745+neoblizz@users.noreply.github.com>
Co-authored-by: neoblizz <9790745+neoblizz@users.noreply.github.com>
Co-authored-by: neoblizz <9790745+neoblizz@users.noreply.github.com>
Copilot AI changed the title [WIP] Add API reference for iris.ccl, iris.ops, and iris.x Add API reference for iris.ccl, iris.ops, and iris.x modules Feb 3, 2026
Copilot AI requested a review from neoblizz February 3, 2026 02:44
@neoblizz neoblizz marked this pull request as ready for review February 3, 2026 02:54
@neoblizz neoblizz requested a review from mawad-amd as a code owner February 3, 2026 02:54
Copilot AI review requested due to automatic review settings February 3, 2026 02:54
@neoblizz neoblizz requested a review from BKP as a code owner February 3, 2026 02:54
Contributor

Copilot AI left a comment

Pull request overview

This pull request adds comprehensive API reference documentation for three core Iris modules (iris.ccl, iris.ops, and iris.x) that were previously undocumented. The documentation follows the established pattern from existing Triton and Gluon reference sections.

Changes:

  • Added structured documentation for collective communication operations (CCL), fused GEMM+CCL operations (ops), and device-side tile-level primitives (x)
  • Created overview pages with usage patterns and examples for each module
  • Added autodoc-based API reference pages for operations, configuration classes, and core abstractions
  • Updated the main API reference navigation to include links to the new sections

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| docs/reference/api-reference.md | Added navigation entries for the three new modules in the main API reference index |
| docs/reference/ccl/overview.md | Overview of the collective communication library with usage patterns and examples |
| docs/reference/ccl/operations.md | Autodoc references for CCL collective operations (all_reduce, all_gather, all_to_all, reduce_scatter) |
| docs/reference/ccl/config.md | Autodoc references for CCL configuration classes (Config, ReduceOp) |
| docs/reference/ops/overview.md | Overview of fused GEMM+CCL operations with usage patterns and workspace management examples |
| docs/reference/ops/operations.md | Autodoc references for fused operations (matmul_all_reduce, all_gather_matmul, matmul_all_gather, matmul_reduce_scatter) and OpsNamespace |
| docs/reference/ops/config.md | Autodoc references for ops configuration classes (FusedConfig, FusedWorkspace) |
| docs/reference/x/overview.md | Overview of device-side tile-level primitives with algorithm selection and usage patterns |
| docs/reference/x/core.md | Autodoc references for core abstractions (TileView, TensorView, DeviceContext, AllReduceConfig) and helper functions |
| docs/reference/x/operations.md | Autodoc references for device-side collective operations (all-reduce variants, all_gather, all_to_all, reduce_scatter, gather) |

@mawad-amd
Collaborator

@copilot apply changes based on the comments in this thread

Copilot AI and others added 3 commits February 13, 2026 05:07
- Fix iris.x examples to use correct API (iris.DeviceContext, make_tensor_view)
- Update AllReduceConfig examples to use integer codes instead of strings
- Add tl.constexpr annotations to kernel parameters
- Fix heap_size undefined variable in ops examples
- Update _toc.yml to include ccl, ops, and x documentation sections
- Remove incorrect DeviceContext reference from iris.x core.md
- Clarify that operations are standalone functions, not methods

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
The _toc.yml changes from commit 88db72f were lost during the merge with main.
This commit restores the TOC entries for iris.ccl, iris.ops, and iris.x documentation.

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
Changed variable naming from 'shmem' to 'ctx' in iris.ccl overview documentation
to be more consistent with the rest of the codebase.

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.

Comment on lines +17 to +20
## Tile (deprecated)

Legacy tile representation. Use TileView instead.


Copilot AI Feb 13, 2026


The section header labels iris.x.Tile as “deprecated”, but the implementation doesn’t mark it deprecated and several iris.x collectives (e.g., all_reduce_atomic, all_gather, reduce_scatter) require a Tile because they read tile.data. Please remove or reword the “deprecated” claim, and clarify when to use TileView vs Tile (TileView for coordinates/shape-only, Tile when carrying computed tile data).
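The TileView-versus-Tile distinction this comment asks for can be illustrated with a host-side sketch. These are plain dataclasses standing in for the device-side types; the real iris.x types live in Triton device code and are not defined this way:

```python
from dataclasses import dataclass, field

# Host-side stand-ins illustrating the TileView/Tile distinction the
# review comment describes. Hypothetical sketch, not the iris.x source.

@dataclass
class TileView:
    # Coordinates and shape only -- enough to address a tile.
    pid_m: int
    pid_n: int
    block_m: int
    block_n: int

@dataclass
class Tile(TileView):
    # Additionally carries the tile's computed values, which
    # data-consuming collectives (e.g. all_reduce_atomic) read.
    data: list = field(default_factory=list)
```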

Comment on lines +26 to +29
tile = iris.x.TileView(pid_m, pid_n, BLOCK_M, BLOCK_N)
src_view = iris.x.make_tensor_view(input_ptr, M, N, stride_m, stride_n)
dst_view = iris.x.make_tensor_view(output_ptr, M, N, stride_m, stride_n)
ctx = iris.DeviceContext.initialize(context_tensor, rank, world_size)

Copilot AI Feb 13, 2026


In this example, tile is a TileView, but iris.x.all_reduce_atomic expects a Tile with a populated tile.data (the local tile’s computed values). As written, the example can’t work because TileView has no data field. Update the snippet to compute a tile result (e.g., load/compute) and construct iris.x.Tile(pid_m, pid_n, BLOCK_M, BLOCK_N, data) before calling all_reduce_atomic (or use an op that accepts TileView).

Suggested change

```diff
-tile = iris.x.TileView(pid_m, pid_n, BLOCK_M, BLOCK_N)
 src_view = iris.x.make_tensor_view(input_ptr, M, N, stride_m, stride_n)
 dst_view = iris.x.make_tensor_view(output_ptr, M, N, stride_m, stride_n)
 ctx = iris.DeviceContext.initialize(context_tensor, rank, world_size)
+# Compute the local tile's data (e.g., load from input_ptr)
+offs_m = pid_m * BLOCK_M + tl.arange(0, BLOCK_M)[:, None]
+offs_n = pid_n * BLOCK_N + tl.arange(0, BLOCK_N)[None, :]
+ptrs = input_ptr + offs_m * stride_m + offs_n * stride_n
+data = tl.load(ptrs)
+# Construct a Tile with populated data for the collective
+tile = iris.x.Tile(pid_m, pid_n, BLOCK_M, BLOCK_N, data)
```

Comment on lines +56 to +67
BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr):
pid_m = tl.program_id(0)
pid_n = tl.program_id(1)

tile = iris.x.TileView(pid_m, pid_n, BLOCK_M, BLOCK_N)
dst_view = iris.x.make_tensor_view(output_ptr, M, N, stride_m, stride_n)
ctx = iris.DeviceContext.initialize(context_tensor, rank, world_size)

# Call collectives directly
iris.x.all_reduce_atomic(tile, dst_view, ctx)
iris.x.all_gather(tile, dst_view, dim=0, ctx=ctx)
iris.x.all_to_all(tile, dst_view, dst_view, N_per_rank, ctx)

Copilot AI Feb 13, 2026


This usage-pattern snippet calls iris.x.all_to_all(tile, dst_view, dst_view, N_per_rank, ctx) but N_per_rank is not defined/passed in the kernel signature, and all_to_all requires a src_view argument (signature: all_to_all(tile, src_view, dst_view, N_per_rank, ctx)). The example should define N_per_rank: tl.constexpr in the kernel params and create src_view (likely from an input pointer) rather than passing dst_view twice.

Suggested change

```diff
+            N_per_rank: tl.constexpr,
             BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr):
     pid_m = tl.program_id(0)
     pid_n = tl.program_id(1)

     tile = iris.x.TileView(pid_m, pid_n, BLOCK_M, BLOCK_N)
+    src_view = iris.x.make_tensor_view(input_ptr, M, N, stride_m, stride_n)
     dst_view = iris.x.make_tensor_view(output_ptr, M, N, stride_m, stride_n)
     ctx = iris.DeviceContext.initialize(context_tensor, rank, world_size)

     # Call collectives directly
     iris.x.all_reduce_atomic(tile, dst_view, ctx)
     iris.x.all_gather(tile, dst_view, dim=0, ctx=ctx)
-    iris.x.all_to_all(tile, dst_view, dst_view, N_per_rank, ctx)
+    iris.x.all_to_all(tile, src_view, dst_view, N_per_rank, ctx)
```
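The N_per_rank bookkeeping behind the corrected all_to_all call can be modeled on the host. This is an illustrative sketch of contiguous column-chunk ownership, assuming N divides evenly by world_size; the helper names are hypothetical, not iris API:

```python
# Host-side sketch of the chunk indexing behind an all_to_all over N
# columns split contiguously across ranks. Hypothetical helpers; the
# real routing happens inside the iris.x device kernel.

def chunk_owner(col, N_per_rank):
    """Which rank owns output column `col` when chunks are contiguous."""
    return col // N_per_rank

def local_col(col, N_per_rank):
    """Column index within the owning rank's chunk."""
    return col % N_per_rank

N, world_size = 16, 4
N_per_rank = N // world_size  # 4 columns per rank
```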

Comment on lines +90 to +93
dummy_locks = tl.zeros((1,), dtype=tl.int32)
config = iris.x.AllReduceConfig(1, dummy_locks)
iris.x.all_reduce_ring(tile, dst_view, ctx)


Copilot AI Feb 13, 2026


In this algorithm-selection example, config = iris.x.AllReduceConfig(1, dummy_locks) is created but never used, and the subsequent call iris.x.all_reduce_ring(tile, dst_view, ctx) doesn’t match the actual all_reduce_ring signature (it requires src_view and dst_view). Either remove AllReduceConfig from this example and show the correct call form for all_reduce_ring(tile, src_view, dst_view, ctx), or add a real API that consumes AllReduceConfig and document that instead.
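As background for the all_reduce_ring call form, the ring algorithm itself can be sketched on the host. This is a textbook reference model (reduce-scatter phase followed by all-gather phase), not the iris.x device implementation:

```python
# Plain-Python model of ring all-reduce: each of the P ranks sends one
# shard to its ring neighbor per step; after P-1 reduce steps and P-1
# gather steps, every rank holds the full elementwise sum.
# Reference sketch only -- not the iris.x device implementation.

def ring_all_reduce(per_rank):
    p = len(per_rank)
    shard = len(per_rank[0]) // p
    bufs = [list(r) for r in per_rank]

    def seg(i):
        return slice(i * shard, (i + 1) * shard)

    # Reduce-scatter phase: shard (r - step) mod p leaves rank r and is
    # accumulated into the next rank's copy.
    for step in range(p - 1):
        sends = [(r, (r - step) % p, bufs[r][seg((r - step) % p)]) for r in range(p)]
        for src, s, payload in sends:
            dst = (src + 1) % p
            cur = bufs[dst][seg(s)]
            bufs[dst][seg(s)] = [a + b for a, b in zip(cur, payload)]

    # All-gather phase: circulate the fully reduced shards around the ring.
    for step in range(p - 1):
        sends = [(r, (r + 1 - step) % p, bufs[r][seg((r + 1 - step) % p)]) for r in range(p)]
        for src, s, payload in sends:
            bufs[(src + 1) % p][seg(s)] = payload
    return bufs
```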

Comment on lines +95 to +98
config = iris.x.AllReduceConfig(4, locks_ptr)
tile_id = pid_m * num_tiles_n + pid_n
iris.x.all_reduce_spinlock(tile, dst_view, locks_ptr, ctx)
```

Copilot AI Feb 13, 2026


This snippet has multiple undefined/mismatched pieces: tile_id = pid_m * num_tiles_n + pid_n references num_tiles_n which isn’t defined in the example, and iris.x.all_reduce_spinlock takes the locks tensor directly (no AllReduceConfig parameter), so the config = iris.x.AllReduceConfig(4, locks_ptr) line is misleading unless there is an API that consumes it. Consider removing AllReduceConfig here, define num_tiles_n (e.g., via tl.cdiv(N, BLOCK_N) or dst_view helpers), and show the correct all_reduce_spinlock(tile, dst_view, locks_ptr, ctx) call with the required locks layout.
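The tile indexing this comment suggests can be sketched in plain Python. The ceiling division matches what tl.cdiv computes; the row-major tile_id layout is an assumption matching the comment's wording:

```python
# Host-side sketch of the tile-grid indexing suggested in the comment:
# derive num_tiles_n from the problem size, then flatten (pid_m, pid_n)
# into a per-tile lock index. Illustrative only.

def cdiv(a, b):
    """Ceiling division, as tl.cdiv computes it."""
    return (a + b - 1) // b

def tile_id(pid_m, pid_n, N, BLOCK_N):
    num_tiles_n = cdiv(N, BLOCK_N)   # tiles per row of the grid
    return pid_m * num_tiles_n + pid_n
```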


Development

Successfully merging this pull request may close these issues.

[Documentation]: Add API reference for iris.ccl, iris.ops and iris.x

3 participants