A minimalistic, high-performance Reinforcement Learning framework implemented in Rust.
The current version is a Proof of Concept, stay tuned for future releases!
```shell
pip install .
python -m twisterl.train --config examples/ppo_puzzle8_v1.json
```

This example trains a model to play the popular "8 puzzle":
```
|8|7|5|
|3|2| |
|4|6|1|
```
where numbers have to be shifted around through the empty slot until they are in order.
This model can be trained on a single CPU in under 1 minute (no GPU required!).
A larger version (4x4) is available: `examples/ppo_puzzle15_v1.json`.
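To make the puzzle mechanics concrete, here is a minimal, framework-independent sketch of a single move in Python (the board encoding and helper names are illustrative, not part of the TwisteRL API):

```python
# Minimal 8-puzzle move logic: the board is a flat list of 9 cells,
# with 0 standing for the empty slot. A move slides a neighboring
# tile into the empty slot.

def apply_move(board, direction):
    """Return a new board after sliding a tile into the empty slot.

    direction is one of "up", "down", "left", "right", naming the
    direction the neighboring tile moves to fill the empty slot.
    """
    empty = board.index(0)
    row, col = divmod(empty, 3)
    # Offset from the empty slot to the tile that will slide into it.
    offsets = {"up": (1, 0), "down": (-1, 0), "left": (0, 1), "right": (0, -1)}
    dr, dc = offsets[direction]
    r, c = row + dr, col + dc
    if not (0 <= r < 3 and 0 <= c < 3):
        return board  # Illegal move: leave the board unchanged.
    src = r * 3 + c
    new_board = list(board)
    new_board[empty], new_board[src] = new_board[src], new_board[empty]
    return new_board

def is_solved(board):
    # Tiles 1..8 in order, empty slot last.
    return board == [1, 2, 3, 4, 5, 6, 7, 8, 0]
```

The trained policy's job is to pick a sequence of such moves that reaches the solved configuration.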
Check the notebook example here!
The `examples/grid_world` custom environment example shows how to implement an environment in Rust and expose it to Python with PyO3. You can use it as a template:
- Create a new crate:

  ```shell
  cargo new --lib examples/my_env
  ```
- Add dependencies in `examples/my_env/Cargo.toml`:

  ```toml
  [package]
  name = "my_env"
  version = "0.1.0"
  edition = "2021"

  [lib]
  name = "my_env"
  crate-type = ["cdylib"]

  [dependencies]
  pyo3 = { version = "0.20", features = ["extension-module"] }
  twisterl = { path = "path/to/twisterl/rust", features = ["python_bindings"] }
  # Or using the official crate:
  # twisterl = { version = "a.b.c", features = ["python_bindings"] }
  ```
- Implement the environment by defining a struct and implementing `twisterl::rl::env::Env` for it. Provide logic for `reset`, `step`, `observe`, `reward`, etc.

  During inference, TwisteRL algorithms track the actions applied to the environment externally. If you need the environment itself to track them, implement the `track_solution` and `solution` methods of the `Env` trait.
- Expose it to Python using `PyBaseEnv`:

  ```rust
  use pyo3::prelude::*;
  use twisterl::python_interface::env::PyBaseEnv;

  #[pyclass(name = "MyEnv", extends = PyBaseEnv)]
  struct PyMyEnv;

  #[pymethods]
  impl PyMyEnv {
      #[new]
      fn new(...) -> (Self, PyBaseEnv) {
          let env = MyEnv::new(...);
          (PyMyEnv, PyBaseEnv { env: Box::new(env) })
      }
  }
  ```
- Add a `pyproject.toml` describing the Python package so maturin can build a wheel.
- Build and install the module:

  ```shell
  pip install .
  ```
- Use it from Python:

  ```python
  import my_env

  env = my_env.MyEnv(...)
  obs = env.reset()
  ```
Refer to `grid_world` for a complete working example.
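TwisteRL also supports Python environments through a wrapper. As a rough illustration of the shape such an environment takes, here is a standalone toy example; the method names mirror the Rust `Env` trait (`reset`, `step`, `observe`, `reward`) but are illustrative, not the exact wrapper API:

```python
# Illustrative 1-D grid world with the same reset/step/observe/reward
# shape as the Rust Env trait. Not the exact TwisteRL wrapper API.

class LineWorld:
    """Agent starts at cell 0 and must reach the last cell."""

    def __init__(self, size=5):
        self.size = size
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.observe()

    def num_actions(self):
        return 2  # 0: move left, 1: move right

    def step(self, action):
        if action == 1:
            self.pos = min(self.pos + 1, self.size - 1)
        else:
            self.pos = max(self.pos - 1, 0)

    def observe(self):
        # One-hot encoding of the agent position (discrete observation).
        return [1 if i == self.pos else 0 for i in range(self.size)]

    def reward(self):
        return 1.0 if self.pos == self.size - 1 else 0.0

    def is_final(self):
        return self.pos == self.size - 1
```

A native Rust environment implements the same logic behind the `Env` trait, which is what makes the Rust episode loop fast.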
TwisteRL uses safetensors as the default checkpoint format for model weights. Safetensors provides:
- Security: No arbitrary code execution (unlike pickle-based `.pt` files)
- Speed: Zero-copy loading for faster model initialization
- HuggingFace compatibility: Standard format for Hub models
Legacy .pt checkpoints are still supported for backward compatibility but will log a warning. To convert existing checkpoints:
```python
from twisterl.utils import convert_pt_to_safetensors

convert_pt_to_safetensors("model.pt")  # Creates model.safetensors
```

- High-Performance Core: RL episode loop implemented in Rust for faster training and inference
- Inference-Ready: Easy compilation and bundling of models with environments into portable binaries for inference
- Modular Design: Support for multiple algorithms (PPO, AlphaZero) with interchangeable training and inference
- Language Interoperability: Core in Rust with Python interface
- Symmetry-Aware Training via Twists: Environments can expose observation/action permutations (“twists”) so policies automatically exploit device or puzzle symmetries for faster learning.
- Hybrid Rust-Python implementation:
- Data collection and inference in Rust
- Training in Python (PyTorch)
- Supported algorithms:
- PPO (Proximal Policy Optimization)
- AlphaZero
- Focus on discrete observation and action spaces
- Support for native Rust environments and for Python environments through a wrapper
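The "twists" idea above can be sketched in plain Python: a twist is a pair of permutations, one over observation slots and one over actions, so that a policy's decision in a permuted state maps back to an equivalent decision in the original state. The function names here are illustrative, not TwisteRL's API:

```python
# A "twist" pairs an observation permutation with an action permutation.
# Names and encodings here are illustrative only.

def permute(values, perm):
    """Reorder values so that slot i receives values[perm[i]]."""
    return [values[p] for p in perm]

def invert(perm):
    """Inverse permutation: invert(perm)[perm[i]] == i."""
    inv = [0] * len(perm)
    for i, p in enumerate(perm):
        inv[p] = i
    return inv

# Example: mirror symmetry of a 3-slot observation with 2 actions
# (0 = left, 1 = right). Mirroring the observation swaps the actions.
obs_perm = [2, 1, 0]
act_perm = [1, 0]

obs = [1, 0, 0]                    # agent at the leftmost slot
mirrored = permute(obs, obs_perm)  # agent now at the rightmost slot
action_in_mirror = 0               # "left" chosen in the mirrored world...
action = invert(act_perm)[action_in_mirror]  # ...is "right" originally
```

Exposing such symmetries lets a policy reuse experience across equivalent states instead of learning each of them separately.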
Upcoming Features (Alpha Version)
- Full training in Rust
- Extended support for:
- Continuous observation spaces
- Continuous action spaces
- Custom policy architectures
- Native WebAssembly environment support
- Streamlined policy+environment bundle export to WebAssembly
- Comprehensive Python interface
- Enhanced documentation and test coverage
- WebAssembly environment repository
- Browser-based environment and agent visualization
- Interactive web demonstrations
- Serverless distributed training
Currently used in:
- Qiskit quantum circuit transpiling AI models (Clifford synthesis, routing): Qiskit/qiskit-ibm-transpiler
Perfect for:
- Puzzle-like optimization problems
- Any scenario requiring fast, production-grade RL inference
- Limited to discrete observation and action spaces
- Python environments may create performance bottlenecks
- Documentation and test coverage are currently minimal
- WebAssembly support is in development
We're in early development stages and welcome contributions! Stay tuned for more detailed contribution guidelines.
This project is currently in PoC stage. While functional, it's under active development and the API may change significantly.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
