TwisteRL

A minimalistic, high-performance Reinforcement Learning framework implemented in Rust. The focus of the framework is efficiency and portability: it lets you train RL models to solve problems and export them so that they are easy to use and very efficient at inference time.

The current version is a Proof of Concept; stay tuned for future releases!

Install

pip install .

Use

Training

python -m twisterl.train --config examples/ppo_puzzle8_v1.json

This example trains a model to play the popular "8 puzzle":

|8|7|5|
|3|2| |
|4|6|1|

where the numbered tiles are slid through the empty slot until they are in order.

This model can be trained on a single CPU in under 1 minute (no GPU required!). A larger version (4x4) is available: examples/ppo_puzzle15_v1.json.

Inference

See the inference notebook example included in the repository.

Creating your own environment

The examples/grid_world custom environment shows how to implement an environment in Rust and expose it to Python with PyO3. You can use it as a template:

  1. Create a new crate

    cargo new --lib examples/my_env
  2. Add dependencies in examples/my_env/Cargo.toml:

    [package]
    name = "my_env"
    version = "0.1.0"
    edition = "2021"
    
    [lib]
    name = "my_env"
    crate-type = ["cdylib"]
    
    [dependencies]
    pyo3 = { version = "0.20", features = ["extension-module"] }
    twisterl = { path = "path/to/twisterl/rust", features = ["python_bindings"] }
    # Or using the official crate:
    # twisterl = { version = "a.b.c", features = ["python_bindings"] }
  3. Implement the environment by defining a struct and implementing twisterl::rl::env::Env for it, providing logic for reset, step, observe, reward, etc. A minimal sketch is shown after the note below.

In inference, twisterRL algorithms track the actions applied to the environment externally. If you need the environment itself to track them, implement the track_solution and solution methods in the Env trait.

  4. Expose it to Python using PyBaseEnv:

    use pyo3::prelude::*;
    use twisterl::python_interface::env::PyBaseEnv;

    // Python-visible subclass of PyBaseEnv that wraps the Rust environment.
    #[pyclass(name = "MyEnv", extends = PyBaseEnv)]
    struct PyMyEnv;

    #[pymethods]
    impl PyMyEnv {
        #[new]
        fn new(...) -> (Self, PyBaseEnv) {
            let env = MyEnv::new(...);
            (PyMyEnv, PyBaseEnv { env: Box::new(env) })
        }
    }

    // Standard PyO3 module registration so `import my_env` finds the class;
    // the function name must match the [lib] name in Cargo.toml.
    #[pymodule]
    fn my_env(_py: Python, m: &PyModule) -> PyResult<()> {
        m.add_class::<PyMyEnv>()?;
        Ok(())
    }
  5. Add a pyproject.toml describing the Python package so maturin can build a wheel; a minimal example is sketched below.
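     A minimal maturin configuration might look like this (the package name and version below are placeholders, not requirements of TwisteRL):

    [build-system]
    requires = ["maturin>=1.0,<2.0"]
    build-backend = "maturin"

    [project]
    name = "my_env"
    version = "0.1.0"
    requires-python = ">=3.8"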

  6. Build and install the module:

    pip install .
  7. Use it from Python:

    import my_env
    env = my_env.MyEnv(...)
    obs = env.reset()

Refer to grid_world for a complete working example.

Checkpoint Format

TwisteRL uses safetensors as the default checkpoint format for model weights. Safetensors provides:

  • Security: No arbitrary code execution (unlike pickle-based .pt files)
  • Speed: Zero-copy loading for faster model initialization
  • HuggingFace compatibility: Standard format for Hub models

Legacy .pt checkpoints are still supported for backward compatibility but will log a warning. To convert existing checkpoints:

from twisterl.utils import convert_pt_to_safetensors

convert_pt_to_safetensors("model.pt")  # Creates model.safetensors

Documentation

🚀 Key Features

  • High-Performance Core: RL episode loop implemented in Rust for faster training and inference
  • Inference-Ready: Easy compilation and bundling of models with environments into portable binaries for inference
  • Modular Design: Support for multiple algorithms (PPO, AlphaZero) with interchangeable training and inference
  • Language Interoperability: Core in Rust with Python interface
  • Symmetry-Aware Training via Twists: Environments can expose observation/action permutations (“twists”) so policies automatically exploit device or puzzle symmetries for faster learning (see the sketch below).
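
To make the idea concrete, the snippet below illustrates what a twist is conceptually: a permutation of observation slots paired with the matching permutation of actions. This is illustrative only and not TwisteRL's actual twist API; the left-right mirror of the 8 puzzle from the training example above is used as the symmetry.

// Conceptual illustration of a "twist" (not TwisteRL's actual API):
// a permutation of observation slots plus the matching permutation of actions.
fn main() {
    // 8-puzzle board stored row-major, 0 = empty slot.
    let obs: [usize; 9] = [8, 7, 5, 3, 2, 0, 4, 6, 1];
    // Left-right mirror: column 0 <-> column 2 in every row.
    let obs_perm: [usize; 9] = [2, 1, 0, 5, 4, 3, 8, 7, 6];
    // Actions 0..4 = slide a tile left / right / up / down into the empty slot;
    // mirroring swaps left and right and leaves up and down unchanged.
    let act_perm: [usize; 4] = [1, 0, 2, 3];

    let mirrored: Vec<usize> = obs_perm.iter().map(|&i| obs[i]).collect();
    println!("original board: {:?}", obs);
    println!("mirrored board: {:?}", mirrored);
    println!("action 0 (left) maps to action {}", act_perm[0]);
}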

🏗️ Current State (PoC)

  • Hybrid Rust/Python implementation:
    • Data collection and inference in Rust
    • Training in Python (PyTorch)
  • Supported algorithms:
    • PPO (Proximal Policy Optimization)
    • AlphaZero
  • Focus on discrete observation and action spaces
  • Support for native Rust environments and for Python environments through a wrapper

🚧 Roadmap

Upcoming Features (Alpha Version)

  • Full training in Rust
  • Extended support for:
    • Continuous observation spaces
    • Continuous action spaces
    • Custom policy architectures
  • Native WebAssembly environment support
  • Streamlined policy+environment bundle export to WebAssembly
  • Comprehensive Python interface
  • Enhanced documentation and test coverage

💎 Future Possibilities

  • WebAssembly environment repository
  • Browser-based environment and agent visualization
  • Interactive web demonstrations
  • Serverless distributed training

🎮 Use Cases

Currently used in:

Perfect for:

  • Puzzle-like optimization problems
  • Any scenario requiring fast, production-grade RL inference

🔧 Current Limitations

  • Limited to discrete observation and action spaces
  • Python environments may create performance bottlenecks
  • Documentation and test coverage are currently minimal
  • WebAssembly support is in development

🤝 Contributing

We're in early development stages and welcome contributions! Stay tuned for more detailed contribution guidelines.

📄 Note

This project is currently in PoC stage. While functional, it's under active development and the API may change significantly.

📜 License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
