Work in progress. Feedback welcome.
Dyna-Q Notebook
Dyna-Q is a Q-learning algorithm with a planning component: between real environment steps, it performs additional Q-learning updates using previously encountered (state, action, reward, next-state) transitions stored in a learned model. We implement Tabular Dyna-Q from Chapter 8 of Sutton & Barto (Example 8.1) on a gridworld environment.
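The planning loop can be sketched in a few lines of plain Python. This is a minimal illustration, not the notebook's code: the environment is abstracted as an `env_step(state, action)` callable, and the model simply memorizes observed transitions, as in the tabular case.

```python
import random
from collections import defaultdict

def dyna_q(env_step, start_state, actions, episodes=30, planning_steps=10,
           alpha=0.1, gamma=0.95, epsilon=0.3, max_steps=1000, seed=0):
    """Tabular Dyna-Q: one real Q-learning update per environment step,
    plus `planning_steps` simulated updates replayed from a learned model."""
    rng = random.Random(seed)
    Q = defaultdict(float)   # Q[(state, action)]
    model = {}               # model[(state, action)] = (reward, next_state, done)

    def greedy(s):
        return max(actions, key=lambda a: Q[(s, a)])

    for _ in range(episodes):
        s = start_state
        for _ in range(max_steps):
            a = rng.choice(actions) if rng.random() < epsilon else greedy(s)
            r, s2, done = env_step(s, a)
            target = r if done else r + gamma * max(Q[(s2, b)] for b in actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            model[(s, a)] = (r, s2, done)
            # Planning: replay random previously observed transitions.
            for _ in range(planning_steps):
                (ps, pa), (pr, ps2, pdone) = rng.choice(list(model.items()))
                ptarget = pr if pdone else pr + gamma * max(Q[(ps2, b)] for b in actions)
                Q[(ps, pa)] += alpha * (ptarget - Q[(ps, pa)])
            s = s2
            if done:
                break
    return Q
```

With `planning_steps=0` this reduces to plain Q-learning; the planning updates are what let Dyna-Q propagate a reward along a path after seeing it only once.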
DQN Notebook
DQN trains a neural network to predict Q-values in a supervised fashion, regressing on targets computed from previously seen transitions sampled from a replay buffer.
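The two ingredients named above can be sketched independently of any network library: a fixed-size replay buffer, and the TD-target computation that turns sampled transitions into regression targets. This is a simplified sketch with illustrative names (`ReplayBuffer`, `td_targets`), not the notebook's actual code; `q_next` stands in for the network's Q-value predictions at the next state.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size FIFO buffer of (state, action, reward, next_state, done)."""
    def __init__(self, capacity, seed=0):
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement within the batch.
        return self.rng.sample(list(self.buffer), batch_size)

def td_targets(batch, q_next, gamma=0.99):
    """Regression targets r + gamma * max_a Q(s', a),
    with bootstrapping cut off at terminal states."""
    targets = []
    for _, _, r, s2, done in batch:
        targets.append(r if done else r + gamma * max(q_next(s2)))
    return targets
```

The network is then fit by gradient descent on the squared error between its prediction for the taken action and these targets; the Nature 2015 paper computes `q_next` with a separate, periodically updated target network.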
| Environment | Random policy | Trained policy | Training details |
|---|---|---|---|
| Pong | ![]() | ![]() Random action during evaluation with 0% vs. 5% probability; scores 5:21 vs. 8:21. | 10 million frames |
MCTS Notebook
Monte-Carlo tree search is a search algorithm that, at each step, selects the most promising action by trading off how good actions are expected to be against how uncertain those estimates are. New actions are evaluated by querying either the environment or a model of it, while existing Q-value estimates are reused within the tree. We implement a naive Python version of the MCTS algorithm used by MuZero, and compare its output with MCTX, the faster JAX implementation released by DeepMind.
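The value-vs-uncertainty trade-off is easiest to see in a minimal UCT-style sketch. This is an illustration under simplifying assumptions, not MuZero's variant: MuZero uses pUCT with learned priors and a value network, whereas this sketch uses plain UCB1 selection and random rollouts against a deterministic `step(state, action)` model.

```python
import math
import random

class Node:
    def __init__(self):
        self.children = {}     # action -> Node, holding Q(s, a) statistics
        self.visits = 0
        self.value_sum = 0.0

    def value(self):
        return self.value_sum / self.visits if self.visits else 0.0

def select_action(node, actions, c=1.4):
    """UCB1: exploit high average value, explore rarely tried actions."""
    def ucb(a):
        child = node.children.get(a)
        if child is None or child.visits == 0:
            return float("inf")  # untried actions go first
        return child.value() + c * math.sqrt(math.log(node.visits) / child.visits)
    return max(actions, key=ucb)

def rollout(state, step, actions, depth, gamma, rng):
    """Evaluate a leaf by a random playout of bounded depth."""
    ret, discount = 0.0, 1.0
    for _ in range(depth):
        r, state = step(state, rng.choice(actions))
        ret += discount * r
        discount *= gamma
    return ret

def simulate(node, state, step, actions, depth, gamma, rng):
    """One simulation: select down the tree, expand, evaluate, back up."""
    if depth == 0:
        return 0.0
    a = select_action(node, actions)
    r, next_state = step(state, a)
    if a not in node.children:
        node.children[a] = Node()
        ret = r + gamma * rollout(next_state, step, actions, depth - 1, gamma, rng)
    else:
        ret = r + gamma * simulate(node.children[a], next_state, step,
                                   actions, depth - 1, gamma, rng)
    node.children[a].visits += 1
    node.children[a].value_sum += ret
    node.visits += 1
    return ret

def mcts(root_state, step, actions, simulations=100, depth=5, gamma=1.0, seed=0):
    rng = random.Random(seed)
    root = Node()
    for _ in range(simulations):
        simulate(root, root_state, step, actions, depth, gamma, rng)
    # Act by root visit counts, as MuZero does.
    return max(actions,
               key=lambda a: root.children[a].visits if a in root.children else -1)
```

Each simulation walks down the tree by the UCB rule, expands one new node, evaluates it, and backs the return up along the visited path; the visit counts at the root then summarize where the search spent its effort.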
Work in progress: as a first step, I'm implementing changes to the DQN code based only on the MuZero paper, with the goal of a scaled-down version of MuZero that demonstrates improvements over DQN on Ms Pacman. I won't look at the published pseudocode for now; afterwards, I'll review my implementation against it.
Reinforcement Learning: An Introduction (Sutton & Barto)
Playing Atari with Deep Reinforcement Learning (DQN Arxiv 2013)
Human-level control through deep reinforcement learning (DQN Nature 2015)
Mastering the game of Go with deep neural networks and tree search (AlphaGo)
Mastering the Game of Go without Human Knowledge (AlphaGo Zero)
Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm (AlphaZero)
Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model (MuZero)
Monte Carlo tree search in JAX (MCTX)
Deep Reinforcement Learning and the Deadly Triad (function approximation, off-policy learning, and bootstrapping)
- Dyna-Q
- DQN
- Replay buffer
- Atari environment
- Neural network, stochastic gradient descent
- Training loop
- Signs of life :)
- GPU
- Debug!
- Remaining details from both DQN papers
- Run for the full number of frames
- MuZero
- Monte-Carlo tree search
- Does it work with tensors / batches
- Other changes to DQN
- Different loss
- TD-targets
- Non-uniform sampling from replay buffer
- ...
- Monte-Carlo tree search



