DDLP: Distributed Deep Learning Primitives

DDLP is a lightweight, pure-Python library providing fused communication/computation primitives for distributed deep learning on NVIDIA GPUs.

Primitive

LinearColumnwise: Tensor-parallel column-wise linear layer (local GEMM + AllGather). Supports three backends:
- pytorch -- pure PyTorch (F.linear + torch.distributed.all_gather)
- fuser -- nvFuser-accelerated (requires nvfuser_direct)
- transformer_engine -- Transformer Engine integration (requires transformer_engine)
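
The "local GEMM + AllGather" pattern can be illustrated in a single process. The sketch below is not DDLP's implementation: it assumes the weight is sharded along the output-feature dimension (dim 0 of a PyTorch `(out_features, in_features)` weight), omits the bias, and replaces the AllGather with a plain concatenation. The function name `columnwise_linear_simulated` is hypothetical.

```python
import torch
import torch.nn.functional as F


def columnwise_linear_simulated(x, weight, world_size):
    """Simulate a column-wise tensor-parallel linear layer in one process.

    weight has shape (out_features, in_features), as in torch.nn.Linear.
    Assumption: each simulated rank owns a contiguous slice of output features.
    """
    shards = torch.chunk(weight, world_size, dim=0)
    # Local GEMM per rank: (..., in_features) -> (..., out_features / world_size)
    local_outputs = [F.linear(x, shard) for shard in shards]
    # Stand-in for AllGather: concatenate per-rank outputs along the feature dim.
    return torch.cat(local_outputs, dim=-1)


torch.manual_seed(0)
x = torch.randn(2, 8, 16)
weight = torch.randn(32, 16)

full = F.linear(x, weight)  # unsharded reference
parallel = columnwise_linear_simulated(x, weight, world_size=4)
```

In a real multi-process run, each rank would compute only its own shard and the concatenation would be preceded by `torch.distributed.all_gather`, which fills a list of per-rank tensors.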

Installation

pip install -e ./python

Optional backends:

pip install -e "./python[fuser]"        # nvFuser backend
pip install -e "./python[te]"           # Transformer Engine backend

Usage

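import os

import torch
import torch.distributed as dist
from ddlp.primitives import LinearColumnwise

# Assumes a torchrun launch, which sets LOCAL_RANK; bind each process to its own GPU.
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))
dist.init_process_group(backend="nccl")

model = LinearColumnwise(in_features=1024, out_features=4096, backend="pytorch", device="cuda")
output = model(torch.randn(2, 128, 1024, device="cuda"))

dist.destroy_process_group()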

Testing

torchrun --nproc_per_node=<N> tests/test_linear_columnwise.py

Dependencies

Required: PyTorch (with CUDA and torch.distributed)
Optional: nvfuser_direct (fuser backend), transformer_engine (TE backend)
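
To see which backends are usable in a given environment, one option is to check whether the required modules can be located before constructing a layer. This is a minimal sketch using only the standard library; the helper `available_backends` and the `BACKEND_MODULES` mapping are hypothetical and not part of DDLP, and the module names are taken from the list above.

```python
from importlib.util import find_spec

# Backend name -> top-level module it requires, as listed in this README.
BACKEND_MODULES = {
    "pytorch": "torch",
    "fuser": "nvfuser_direct",
    "transformer_engine": "transformer_engine",
}


def available_backends(modules=BACKEND_MODULES):
    """Return backend names whose required module can be located (without importing it)."""
    return [name for name, module in modules.items() if find_spec(module) is not None]


if __name__ == "__main__":
    print(available_backends())
```

Locating a module does not guarantee that it imports cleanly (for example, a CUDA mismatch can still fail at import time), so this is only a first-pass check.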
