TwisteRL

A minimalistic, high-performance Reinforcement Learning framework implemented in Rust. The focus of the framework is efficiency and portability: it lets you train RL models to solve problems and export them so that they are easy to use and very efficient at inference time.

The current version is a Proof of Concept; stay tuned for future releases!

Install

pip install .

Use

Training

python -m twisterl.train --config examples/ppo_puzzle8_v1.json

This example trains a model to play the popular "8 puzzle":

|8|7|5|
|3|2| |
|4|6|1|

where the numbered tiles are slid through the empty slot until they are in order.

This model can be trained on a single CPU in under 1 minute (no GPU required!). A larger version (4x4) is available: examples/ppo_puzzle15_v1.json.

Inference

See the inference notebook example included in the repository.

Creating your own environment

The examples/grid_world custom environment shows how to implement an environment in Rust and expose it to Python with PyO3. You can use it as a template:

  1. Create a new crate

    cargo new --lib examples/my_env
  2. Add dependencies in examples/my_env/Cargo.toml:

    [package]
    name = "my_env"
    version = "0.1.0"
    edition = "2021"
    
    [lib]
    name = "my_env"
    crate-type = ["cdylib"]
    
    [dependencies]
    pyo3 = { version = "0.20", features = ["extension-module"] }
    twisterl = { path = "path/to/twisterl/rust", features = ["python_bindings"] }
    # Or using the official crate:
    # twisterl = { version = "a.b.c", features = ["python_bindings"] }
  3. Implement the environment by defining a struct and implementing twisterl::rl::env::Env for it, providing logic for reset, step, observe, reward, etc. A minimal sketch is shown after the note below.

In inference, twisterRL algorithms track the actions applied to the environment externally. If you need the environment itself to track them, implement the track_solution and solution methods in the Env trait.

  4. Expose it to Python using PyBaseEnv:

    use pyo3::prelude::*;
    use twisterl::python_interface::env::PyBaseEnv;

    // Python-visible subclass of PyBaseEnv that wraps the Rust environment.
    #[pyclass(name = "MyEnv", extends = PyBaseEnv)]
    struct PyMyEnv;

    #[pymethods]
    impl PyMyEnv {
        #[new]
        fn new(...) -> (Self, PyBaseEnv) {
            let env = MyEnv::new(...);
            (PyMyEnv, PyBaseEnv { env: Box::new(env) })
        }
    }

    // Standard PyO3 module registration so `import my_env` finds the class;
    // the function name must match the [lib] name in Cargo.toml.
    #[pymodule]
    fn my_env(_py: Python, m: &PyModule) -> PyResult<()> {
        m.add_class::<PyMyEnv>()?;
        Ok(())
    }
  5. Add a pyproject.toml describing the Python package so maturin can build a wheel; a minimal example is sketched below.
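     A minimal maturin configuration might look like this (the package name and version below are placeholders, not requirements of TwisteRL):

    [build-system]
    requires = ["maturin>=1.0,<2.0"]
    build-backend = "maturin"

    [project]
    name = "my_env"
    version = "0.1.0"
    requires-python = ">=3.8"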

  6. Build and install the module:

    pip install .
  7. Use it from Python:

    import my_env
    env = my_env.MyEnv(...)
    obs = env.reset()

Refer to grid_world for a complete working example.

Checkpoint Format

TwisteRL uses safetensors as the default checkpoint format for model weights. Safetensors provides:

  • Security: No arbitrary code execution (unlike pickle-based .pt files)
  • Speed: Zero-copy loading for faster model initialization
  • HuggingFace compatibility: Standard format for Hub models

Legacy .pt checkpoints are still supported for backward compatibility but will log a warning. To convert existing checkpoints:

from twisterl.utils import convert_pt_to_safetensors

convert_pt_to_safetensors("model.pt")  # Creates model.safetensors

Documentation

🚀 Key Features

  • High-Performance Core: RL episode loop implemented in Rust for faster training and inference
  • Inference-Ready: Easy compilation and bundling of models with environments into portable binaries for inference
  • Modular Design: Support for multiple algorithms (PPO, AlphaZero) with interchangeable training and inference
  • Language Interoperability: Core in Rust with Python interface
  • Symmetry-Aware Training via Twists: Environments can expose observation/action permutations (“twists”) so policies automatically exploit device or puzzle symmetries for faster learning (see the sketch below).
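
To make the idea concrete, the snippet below illustrates what a twist is conceptually: a permutation of observation slots paired with the matching permutation of actions. This is illustrative only and not TwisteRL's actual twist API; the left-right mirror of the 8 puzzle from the training example above is used as the symmetry.

// Conceptual illustration of a "twist" (not TwisteRL's actual API):
// a permutation of observation slots plus the matching permutation of actions.
fn main() {
    // 8-puzzle board stored row-major, 0 = empty slot.
    let obs: [usize; 9] = [8, 7, 5, 3, 2, 0, 4, 6, 1];
    // Left-right mirror: column 0 <-> column 2 in every row.
    let obs_perm: [usize; 9] = [2, 1, 0, 5, 4, 3, 8, 7, 6];
    // Actions 0..4 = slide a tile left / right / up / down into the empty slot;
    // mirroring swaps left and right and leaves up and down unchanged.
    let act_perm: [usize; 4] = [1, 0, 2, 3];

    let mirrored: Vec<usize> = obs_perm.iter().map(|&i| obs[i]).collect();
    println!("original board: {:?}", obs);
    println!("mirrored board: {:?}", mirrored);
    println!("action 0 (left) maps to action {}", act_perm[0]);
}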

🏗️ Current State (PoC)

  • Hybrid Rust/Python implementation:
    • Data collection and inference in Rust
    • Training in Python (PyTorch)
  • Supported algorithms:
    • PPO (Proximal Policy Optimization)
    • AlphaZero
  • Focus on discrete observation and action spaces
  • Support for native Rust environments and for Python environments through a wrapper

🚧 Roadmap

Upcoming Features (Alpha Version)

  • Full training in Rust
  • Extended support for:
    • Continuous observation spaces
    • Continuous action spaces
    • Custom policy architectures
  • Native WebAssembly environment support
  • Streamlined policy+environment bundle export to WebAssembly
  • Comprehensive Python interface
  • Enhanced documentation and test coverage

💎 Future Possibilities

  • WebAssembly environment repository
  • Browser-based environment and agent visualization
  • Interactive web demonstrations
  • Serverless distributed training

🎮 Use Cases

Currently used in:

Perfect for:

  • Puzzle-like optimization problems
  • Any scenario requiring fast, production-grade RL inference

🔧 Current Limitations

  • Limited to discrete observation and action spaces
  • Python environments may create performance bottlenecks
  • Documentation and test coverage are currently minimal
  • WebAssembly support is in development

🤝 Contributing

We're in early development stages and welcome contributions! Stay tuned for more detailed contribution guidelines.

📄 Note

This project is currently in PoC stage. While functional, it's under active development and the API may change significantly.

📜 License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
