Train with stable-baselines3

You can train agents on the environments with any OpenAI Gym-compatible library. This documentation explains how to use one of them: stable-baselines3 (SB3).

Install SB3

To install SB3, follow the instructions in its documentation: Install stable-baselines3.

Alternatively, you can install panda-gym and SB3 directly with a single command:

pip install panda-gym[extra]


If you use the zsh shell, quote the package name: pip install 'panda-gym[extra]'


Now that SB3 is installed, you can run the following code to train an agent. You can use any algorithm compatible with the Box action space (see stable-baselines3/RL Algorithm). In the following example, a DDPG agent is trained to solve the Reach task.

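import gym
import panda_gym  # registers the Panda environments with gym
from stable_baselines3 import DDPG

# Create the Reach environment and a DDPG agent that accepts dict observations.
env = gym.make("PandaReach-v2")
model = DDPG(policy="MultiInputPolicy", env=env)
# Train the agent; the timestep budget below is an illustrative value.
model.learn(total_timesteps=30_000)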


This is the canonical code for training with SB3. For information on setting hyperparameters, verbosity, saving the model, and more, please read the SB3 documentation.

Bonus: Train with RL Baselines3 Zoo

RL Baselines3 Zoo is the training framework associated with SB3. It provides scripts for training and evaluating agents, tuning hyperparameters, plotting results, and recording videos. It also contains tuned hyperparameters, including for some panda-gym environments.


The current version of RL Baselines3 Zoo provides hyperparameters for version 1 of panda-gym, but not for version 2. Before training with RL Baselines3 Zoo, you will have to set your own hyperparameters by editing hyperparameters/<ALGO>.yml. For more information, please read the README of RL Baselines3 Zoo.


To use it, follow its installation instructions, then run the following command.

python train.py --algo <ALGO> --env <ENV>

For example, to train an agent with TQC on PandaPickAndPlace-v2:

python train.py --algo tqc --env PandaPickAndPlace-v2
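If you want to train the same algorithm on several environments, the command above can be assembled from Python. The following is a small sketch using only the standard library; the helper names build_train_command and train_all are hypothetical, and it assumes the training script is named train.py and is run from the root of the RL Baselines3 Zoo repository.

```python
import subprocess
import sys


def build_train_command(algo, env_id, script="train.py"):
    """Return the argument list for one RL Baselines3 Zoo training run.

    The script name is an assumption; adjust it to your installation.
    """
    return [sys.executable, script, "--algo", algo, "--env", env_id]


def train_all(algo, env_ids, dry_run=True):
    """Run one training per environment, or only print the commands.

    With dry_run=True nothing is executed, which is useful to check the
    commands before launching long trainings.
    """
    commands = [build_train_command(algo, env_id) for env_id in env_ids]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True raises CalledProcessError if a training run fails.
            subprocess.run(command, check=True)
    return commands
```

Calling train_all("tqc", ["PandaReach-v2", "PandaPickAndPlace-v2"]) prints the two commands without running them; pass dry_run=False to launch the trainings one after the other.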


To visualize the trained agent, follow the instructions in the SB3 documentation. You must add --env-kwargs render:True when running the enjoy script.

python enjoy.py --algo <ALGO> --env <ENV> --folder <TRAIN_AGENT_FOLDER> --env-kwargs render:True