Skip to content

Conversation

@Andy-Jost
Copy link
Contributor

@Andy-Jost Andy-Jost commented Dec 4, 2025

Summary

This PR adds a developer guide for Python and Cython code in cuda/core. The guide establishes conventions and best practices to maintain code quality and consistency across the codebase.

What's Included

The guide covers 10 major areas:

  1. File Structure - Ordering of elements (SPDX headers, imports, __all__, classes, functions)
  2. Package Layout - Organization of .pyx, .pxd, and .py files
  3. Import Statements - Five-group organization with alphabetical sorting
  4. Class and Function Definitions - Ordering within classes (dunder methods, methods, properties)
  5. Naming Conventions - Project-specific patterns (Cython c_ prefix, type suffixes)
  6. Type Annotations and Declarations - Modern PEP 604 union syntax, Cython declarations
  7. Docstrings - NumPy docstring style with Sphinx cross-references
  8. Errors and Warnings - Custom CUDA exceptions, error handling patterns
  9. CUDA-Specific Patterns - GIL management for CUDA driver API calls
  10. Development Lifecycle - Two-phase development (Python first, then Cython optimization)

Key Principles

  • PEP 8 and PEP 257 Base: References these as normative standards, documenting only project-specific extensions
  • Guidelines, Not Rules: Emphasizes readability and maintainability over rigid enforcement
  • Codebase-Aligned: Examples reflect actual patterns from the codebase

File Added

  • cuda_core/docs/source/developer-guide.rst

@copy-pr-bot
Copy link
Contributor

copy-pr-bot bot commented Dec 4, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@Andy-Jost Andy-Jost added the documentation Improvements or additions to documentation label Dec 4, 2025
@Andy-Jost Andy-Jost self-assigned this Dec 4, 2025
@Andy-Jost Andy-Jost added the cuda.core Everything related to the cuda.core module label Dec 4, 2025
@Andy-Jost Andy-Jost added this to the cuda-python 13-next, 12-next milestone Dec 4, 2025
@Andy-Jost Andy-Jost requested review from keenan-simpson and removed request for keenan-simpson December 4, 2025 20:50
@Andy-Jost
Copy link
Contributor Author

Link to rendered document: style-guide.md

@Andy-Jost
Copy link
Contributor Author

I'm interested in feedback on this idea. With a comprehensive style guide, aside from the other benefits listed in the description, we can use LLMs to help refactor, clean, and document the code. If the team is comfortable with this approach, I can begin to apply rules conservatively to the code base.

@Andy-Jost Andy-Jost removed the request for review from keenan-simpson December 4, 2025 20:56
@Andy-Jost Andy-Jost removed this from the cuda-python 13-next, 12-next milestone Dec 4, 2025
This commit adds a complete style guide covering conventions for Python
and Cython code in cuda/core/experimental. The guide includes:

- File structure and organization (SPDX headers, imports, __all__, ordering)
- Package layout (.pyx/.pxd/.py files, subpackage patterns)
- Import statement organization (5 groups with alphabetical sorting)
- Class and function definition ordering (dunder methods, methods, properties)
- Naming conventions (PascalCase, snake_case, UPPER_SNAKE_CASE)
- Type annotations (PEP 604 union syntax, forward references)
- Docstrings (NumPy style with comprehensive examples)
- Error handling and warnings (custom exceptions, stacklevel, one-time warnings)
- Memory management (resource lifecycle, cleanup patterns)
- Thread safety and concurrency (locks, thread-local storage)
- Cython-specific features (cdef/cpdef/def, nogil, inline functions)
- Constants and magic numbers (naming, CUDA constants)
- Comments and inline documentation (TODO, NOTE patterns)
- Code organization within files
- Performance considerations (GIL management, C types)
- API design principles (public vs private, backward compatibility)
- CUDA-specific patterns (GIL management for driver API calls)
- Copyright and licensing (SPDX format)

The guide follows PEP 8 as the base and promotes modern Python practices
(PEP 604, PEP 563) while documenting current codebase patterns.
@rwgk
Copy link
Collaborator

rwgk commented Dec 5, 2025

Will look more closely asap.

Drive-by comment: I was surprised to see the new file under cuda_core/cuda/core/. I'd expect it to be under cuda_core/, like other meta-kind-of files, e.g. LICENSE, README.md, etc.

@Andy-Jost
Copy link
Contributor Author

Will look more closely asap.

Drive-by comment: I was surprised to see the new file under cuda_core/cuda/core/. I'd expect it to be under cuda_core/, like other meta-kind-of files, e.g. LICENSE, README.md, etc.

I thought about that, too. I agree it belongs at the root.

Document the two-phase development approach:
- Phase 1: Start with Python driver implementation and tests
- Phase 2: Optimize by switching to cydriver with nogil blocks

Includes step-by-step conversion guide and before/after examples.
@rparolin
Copy link
Collaborator

rparolin commented Dec 5, 2025

Will look more closely asap.
Drive-by comment: I was surprised to see the new file under cuda_core/cuda/core/. I'd expect it to be under cuda_core/, like other meta-kind-of files, e.g. LICENSE, README.md, etc.

I thought about that, too. I agree it belongs at the root.

Do we want this in the root? I'd expect it within a docs folder. That shouldn't impact an LLMs ability to discover and utilize it.

Copy link
Contributor

@cpcloud cpcloud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These aren't blocking, but I have a few concerns:

  1. I am a bit concerned about the team all using various agents with this an getting a variety of different results, with no way to actually enforce any of these guidelines without the human effort of checking the guide, and comparing a give changeset to it.
  2. Given the extensive guidelines, I think this will actually add human time for: 1) waiting for the LLM to take action, 2) babysitting it when it, as expected, doesn't follow the guidelines in an acceptable-to-the-author way.


Files in `cuda/core/experimental` must follow a consistent structure. The ordering of elements within a file is as follows:

### 1. SPDX Copyright Header
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm -1 on including things in this guide that can be automated, and I realize that results in a lack of completeness.

In fact, I'd rather our lint rule for this particular thing just add it to the top of the file when it fails.

I think this can be done for all file types, including pyx files.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the actual contents and made a note that pre-commit adds this where needed.


### 3. `__all__` Declaration

If the module exports public API elements, include an `__all__` list after the imports and before any other code. This explicitly defines the public API of the module.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is slightly misleading, since __all__ is about what happens when you star-import a module.

+1 on defining this, but stating it defines the public API I would remove from the description.

LEGACY_DEFAULT_STREAM = C_LEGACY_DEFAULT_STREAM
```

### 5. Principal Class or Function
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit mixed on enforcing ordering of elements here. I do think it can help cut review time, but in practice I find that this kind of strict ordering only starts mattering when files get huge, to which I would probably say "Split the file up" 😆

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been finding that some of our larger class defs (>400 LOC) are difficult to navigate, particularly when methods are very long, properties are mixed in between methods, and nothing is alphabetized. It's not a huge deal, but I'd love to see a move towards a bit more structure.

- Not every file will have all sections. For example, a utility module may not have a principal class.
- The distinction between "principal" and "other" classes is based on the file's primary purpose. If a file exists primarily to define one class, that class is the principal class.
- Private implementation functions should be placed at the end of the file to keep the public API visible at the top.
- **Within each section**, classes and functions should be sorted alphabetically by name. The principal class or function is an exception to this rule, as it appears first in its respective section.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this should be enforced without a linter. It feels a bit draconian to me to require someone to reorder their methods.

For example, I like to order methods by "relevance radius", which is a squishy concept that is something like "methods that are related to each other are close to each other". I find this makes debugging way easier, if only because I can see more related code in a single vim window.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great point. My reason for spelling out the order this way is not to create unnecessary rigidity for human developers, but to make a clear instruction for LLMs to follow.

I see it this way:

  • This rule is made to be broken. It should be disregarded anytime something else is better in a specific case.
  • When there is no particular reason to care, spelling out a default order for the LLMs to follow allows us to automatically clean up.

The intent is to reduce the load on human developers. My hope is that we can worry less about ordering things manually and rely more on LLMs to clean up after the fact. The language here probably could be relaxed.


2. **Keep `.pxd` files minimal**: Only include declarations needed for Cython compilation. Omit implementation details, docstrings, and Python-only code.

3. **Use `__all__` in submodules**: Each submodule should define `__all__` to explicitly declare its public API.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The public API bit seems to be repeated in a bunch of places, and it's misleading. That said, I'm unsure of a better way to phrase it, but perhaps there's a slightly less "sure" way of stating it?

How about just "Each submodule should define __all__"?


### Implementation Comments

Add comments to explain complex logic or non-obvious behavior:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really question whether this is useful for humans. I suppose it may be useful for an LLM 🤷🏻

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed generic things like this. There was a fair bit of AI "slop" in the previous version. This one should be much more concise and focused.


### Prefer `cdef` for Internal Functions

Use `cdef` functions for internal operations that don't need to be callable from Python:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this already specified above?

Comment on lines 1563 to 1568
result = []
for i in range(n):
result.append(i)

# Preferred: Use C array or pre-allocate
cdef int* c_result = <int*>malloc(n * sizeof(int))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeing as how these are not equivalent, it seems odd to say "prefer X over Y" when X and Y are very different. Sometimes you want a list, sometimes you don't need it and can use a C array.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

Use naming conventions to distinguish public and private APIs:

- **Public API**: No leading underscore, documented in docstrings, included in `__all__`
- **Private API**: Leading underscore (`_`), may have minimal documentation, not in `__all__`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since __all__ only affects star import behavior, this is redundant, since leading underscore names are not included in star imports.


### Guidelines

1. **Placement**: The copyright header must be the first lines of the file, before any imports or other code.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this covered by the "all files must include a copyright header"?

@leofang leofang linked an issue Dec 6, 2025 that may be closed by this pull request
Copy link
Contributor

@mdboom mdboom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a defacto standard for where to put this kind of stuff and tell the IDE where and when to apply it.

- **Abstract base classes**: ABCs that define interfaces (e.g., `MemoryResource` in `_buffer.pyx`)
- **Other public classes**: Additional classes exported by the module

**All classes and functions in this section should be sorted alphabetically by name**, regardless of their relationship to the principal class. The principal class appears first as an exception to this rule.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-1 on alphabetical ordering. I think we should reflect the logical relationships between classes instead (which I understand is a bit mushy as a concept). It's easy enough in an IDE to jump to the one you are looking for using search.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rephrased throughout to make it clear that the suggested order is not a requirement, that logical ordering specifically is acceptable, and that alphabetical is a possible fallback if nothing better applies.


- Functions with names starting with `_` (private)
- `cdef inline` functions used for internal implementation
- Helper functions not part of the public API
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose this is fine, but seems backward to me, given that languages like C have always required forward declarations to do things like this, code tends to be ordered from low-level to high-level, with the "principal public thing" at the bottom, not the top. I realize Python has never had this problem, but it does seems reversed from almost anything else.

Copy link
Contributor Author

@Andy-Jost Andy-Jost Jan 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a general statement about this in the "File Structure" section. Many of us have grown accustomed to C/C++-style declare-before-use rules, but I think putting the most important thing first is a better rule when we can follow it because it is better for readability.

The `cuda/core/experimental` package uses three types of files:

1. **`.pyx` files**: Cython implementation files containing the actual code
2. **`.pxd` files**: Cython declaration files containing type definitions and function signatures for C-level access
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(For the future): We really should be requiring that all .pyx files have a corresponding .pyi file. Without it, modern IDEs (which don't actually import modules) can not perform auto completion and cross referencing. But we can add that later as an effort to improve the IDE experience.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 2025, is there a way we can auto-generate .pyi from .py/.pxd/.pyx? If not, let's not make it a must.

One thing I hate the most is Python typing. Does not make any sense to me to maintain 2-3 files.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of using .pyi files. I can look into auto-generating those. After we have an established practice, we can update this document.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For each `.pyx` file that defines classes or functions used by other Cython modules, create a corresponding `.pxd` file:

- **`.pxd` file**: Contains `cdef` class declarations, `cdef`/`cpdef` function signatures, and `cdef` attribute declarations
- **`.pyx` file**: Contains the full implementation including Python methods, docstrings, and implementation details
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(As above): The better way is to include docstrings and type annotations in a .pyi file and just implementation in the .pyx file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think Cython would honor .pyi files? To the very least, I don't think it is friendly to write .pyi files when the types are needed to complete a cdef function?


6. **Separate concerns**: Use `.py` files for pure Python utilities, `.pyx` files for Cython implementations that need C-level performance.

## Import Statements
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I were writing this for a human, I would take this section (and some other sections below that are basically "PEP8") and just link to them so we aren't maintaining duplication of community standards. If the LLM can handle something like that, I think we should do that here. Given "agents", shouldn't they just be able to run ruff?


#### Cython `cdef` Variables

Consider prefixing `cdef` variables with `c_` to distinguish them from Python variables. This improves code readability by making it clear which variables are C-level types.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍


```python
LEGACY_DEFAULT_STREAM = C_LEGACY_DEFAULT_STREAM
PER_THREAD_DEFAULT_STREAM = C_PER_THREAD_DEFAULT_STREAM
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, related constants should be grouped into enums.

def __dealloc__(self):
"""Automatic cleanup when object is garbage collected."""
DMR_close(self)
```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we recommend that all such objects that require cleanup define a context manager, i.e. __enter__ and __exit__ methods?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for now. I can write a section explaining the current rationale in a separate PR if it is OK.


5. **Use stream-ordered deallocation**: When deallocating buffers, use the appropriate stream for asynchronous cleanup to avoid blocking operations.

6. **Track resource ownership**: Clearly document which objects own CUDA handles and are responsible for cleanup.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we do better than just documenting, by specifying all objects contain a strong reference to their owner to prevent use-after-free errors etc.?

EDIT: Though maybe for a style guide this is too complex of an issue. For style, documenting that such a situation exists is the first requirement for actually patching these sort of holes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, some of the things I'd like to document would more properly belong in a "Developer's Guide." I am therefore considering whether to expand the scope of this document. I'd restructure it into sections where "Style" is one section.


### Inline Functions

Use `inline` for small, frequently-called functions:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for Cython, which is not a modern C compiler ;) IME, sometimes the C compiler can see through it and make the right choice and other times not. The salient point for me is that we would ideally backup the choice of inline with a benchmark.

@leofang
Copy link
Member

leofang commented Dec 9, 2025

Merging my offline comment to here. My 2c:

  • If possible I’d like to make part (or all) of this PR rendered as Sphinx docs, for both internal and external contributors to use as developer guide
  • I’d also like all LLM to be able to read it when working on the codebase, including but not limited to: Cursor, CodeRabbit, Copilot, …

@Andy-Jost
Copy link
Contributor Author

Moved to cuda_core/docs

- Change ordering rules from strict requirements to suggestions
- Prefer logical ordering, with alphabetical as fallback
- Make __init__/__cinit__ first by convention in dunder methods
- Keep import ordering strict (enforced by ruff linter)
- Simplify SPDX header guidance to reference existing patterns
- Use CamelCase terminology for class names
- Rename title from "Style Guide" to "Developer Guide"
- Remove naming convention rules that duplicate PEP 8
- Keep Cython-specific c_ prefix guidance
- Remove generic comment advice covered by PEP 8
- Simplify constants and API design guidelines
- Replace cuda/core/experimental with cuda/core throughout
- Update all cuda.core.experimental imports to cuda.core
- Clarify __all__ is for star-import behavior, not "public API"
- Remove __all__ from public/private API distinction (redundant)
- Add rationale for file structure ordering (important-to-detailed)
- Add hedge: helper functions near call sites is fine
- Extend private functions section with broader category
- Rewrite File Structure intro for clarity (readability/maintainability focus)
- Remove sections: Thread Safety, Cython-Specific Features, Constants,
  Code Organization, API Design, Copyright, Comments, Memory Management
- These were either too prescriptive, duplicative, or better left to
  existing documentation and developer judgment
- Add introductory clause before numbered subsections
- Mark __all__ and Type Aliases as optional
- Fix contradiction in Principal Class section
- Make language consistent with "guidelines, not rules" framing
- Streamline all subsections for parallel structure
- Add PEP 257 reference and module docstring placement guidance
- Add Sphinx cross-reference roles documentation with link
- Update examples with proper :class: cross-references
- Streamline Errors and Warnings section (CUDA-specific only)
- Remove .pyx cdef variable duplication in example
- Simplify Helper Functions section
- Clean up Import Statements example
- Remove Performance Considerations section (generic Cython advice)
- Rewrite GIL management section with softer language
- Frame GIL release as optimization, not requirement
- Reference Development Lifecycle for pure Python phase
- Remove Function-Level nogil subsection (note alternative inline)
- Simplify exception raising guidance
- Rewrite as "common pattern" rather than mandatory approach
- Remove rigid Guidelines subsection ("Always", "Don't skip")
- Simplify examples, keeping one before/after pair
- Add concise bullet list of key conversion changes
- Frame as practical advice, not mandates
- Use device attribute queries instead of buffer/stream operations
- Remove resource handle complexity (cu(), _h_ptr, _h_stream)
- Keep examples focused on Cythonization process
- Align GIL management and Development Lifecycle examples
Copy link
Contributor Author

@Andy-Jost Andy-Jost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done a pretty major rewrite. I think I addressed everything raised in comments, please let me know if I've missed anything.

The `cuda/core/experimental` package uses three types of files:

1. **`.pyx` files**: Cython implementation files containing the actual code
2. **`.pxd` files**: Cython declaration files containing type definitions and function signatures for C-level access
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


6. **Separate concerns**: Use `.py` files for pure Python utilities, `.pyx` files for Cython implementations that need C-level performance.

## Import Statements
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed several sections below to eliminate generic content. The new version should be much tighter. I also removed certain things that might conflict with or become out of date with the linters (e.g., the verbatim contents of SPDX headers).

As for the import statements, there is partial overlap with ruff: it enforces alphabetization but not the "5 sections" format we follow. I think it makes sense to describe those here.


## Class and Function Definitions

### Class Definition Order
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I softened the language here quite a bit. There is a suggested order, but the guide makes it clear that developers are free to deviate according to their best judgement. The goal is not to constrain humans but to give guidance to machines.

@Andy-Jost Andy-Jost changed the title Add comprehensive style guide for cuda/core/experimental Add developer guide for cuda.core Jan 8, 2026
- Convert developer-guide.md to developer-guide.rst using pandoc
- Move to docs/source/ for Sphinx integration
- Add to toctree in index.rst
- Add SPDX header
- Delete original markdown file
@Andy-Jost
Copy link
Contributor Author

/ok to test 98a0a64

@Andy-Jost Andy-Jost marked this pull request as ready for review January 8, 2026 22:45
@github-actions
Copy link

github-actions bot commented Jan 8, 2026

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cuda.core Everything related to the cuda.core module documentation Improvements or additions to documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Update cuda.core developer guide

6 participants