Sokoban RL

A playable lesson inspired by gym-sokoban. Manual mode · PPO Agent mode.
Last Reward
0.0
Total Score
0.0
Steps
0
Manual mode. Use WASD or ↑↓←→ to move / push. R reset · U undo · N new level. Push all orange boxes onto the yellow targets.

Legend

Wall
Target
Box
Box on Target
Player

Rewards (gym-sokoban)

Per step−0.1
Box onto target+1.0
Box off target−1.0
All boxes solved+10.0

PPO Agent

Statusidle
Episode0
Avg return (last 20)
Best return
Policy entropy
Solve rate (last 20)
Training return per episode
Policy π(a | current state)
↑ push↓ push← push→ push
PPO with a small MLP policy (two hidden layers) trained live in your browser via clipped-surrogate updates. Training uses curriculum imitation from an A* teacher to bootstrap, then pure self-play refinement — this keeps in-browser training tractable (seconds, not hours).

Event Log

✓ Solved!

All boxes are on their targets.

0.0