kernel-engineering

A repo for learning kernel engineering and GPU programming.

Setup

make setup

Notebook	Description
Control Divergence	Explores warp divergence in GPU kernels: what happens when threads within a warp take different branches, how the hardware serializes the divergent paths, and how much performance that costs in benchmarks.
TF32 Precision & Performance	Demonstrates TensorFloat-32 (TF32) on Ampere and newer GPUs: compares matmul precision across TF32, FP32, FP16, and FP64, shows that TF32 has FP16's precision (10 mantissa bits) but FP32's range (8 exponent bits), and benchmarks the throughput speedup.
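
As a rough illustration of the serialization idea in the Control Divergence notebook, here is a toy CPU-side model. It is not taken from the notebook. It assumes a warp of 32 threads and that each distinct branch path taken inside a warp costs one serialized pass with the other lanes masked off. Real hardware is more nuanced (predication, independent thread scheduling), and `warp_passes` is a hypothetical helper name.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def warp_passes(branch_ids):
    """Return the number of serialized passes per warp in a toy SIMT model.

    branch_ids[i] is the branch path taken by thread i. Assumption: every
    distinct path inside a warp is executed once, one after another.
    """
    passes = []
    for start in range(0, len(branch_ids), WARP_SIZE):
        warp = branch_ids[start:start + WARP_SIZE]
        passes.append(len(set(warp)))
    return passes


n_threads = 128  # four warps

# Divergent: even and odd threads in the same warp take different branches.
divergent = [tid % 2 for tid in range(n_threads)]

# Warp-uniform: the branch depends only on the warp index.
uniform = [(tid // WARP_SIZE) % 2 for tid in range(n_threads)]

print(warp_passes(divergent))  # [2, 2, 2, 2]
print(warp_passes(uniform))    # [1, 1, 1, 1]
```

Under this model the divergent pattern needs two passes per warp while the warp-uniform pattern needs one, even though both split the threads evenly between the two branches.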

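The "FP16's precision but FP32's range" point from the TF32 notebook can be sketched on the CPU with only the standard library. This is not the notebook's code. It emulates TF32 by clearing the low 13 of FP32's 23 mantissa bits, which leaves 10 mantissa bits and the full 8-bit exponent. Truncation is an assumption made for simplicity, since hardware rounding may differ, and `to_fp32`, `to_fp16`, and `to_tf32` are hypothetical helper names.

```python
import struct


def to_fp32(x):
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_fp16(x):
    """Round to FP16; values beyond FP16's range are reported as inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return float("inf")


def to_tf32(x):
    """Emulate TF32: FP32 bit pattern with the low 13 mantissa bits cleared.

    Assumption: plain truncation, not the rounding real hardware applies.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= 0xFFFFE000  # keep sign, 8 exponent bits, top 10 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


small = 1.0 + 2.0 ** -11  # needs 11 mantissa bits
big = 1.0e10              # far above FP16's maximum of 65504

for name, fn in [("FP32", to_fp32), ("TF32", to_tf32), ("FP16", to_fp16)]:
    print(name, fn(small), fn(big))
```

FP32 keeps `small` exactly, while TF32 and FP16 both collapse it to 1.0 (precision). TF32 still holds a finite value near `big`, while FP16 overflows (range).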