Train with stable-baselines3
Warning
SB3 is not compatible with panda-gym v3 for the moment (see SB3/PR#780), so the documentation below is not yet valid for v3. To use panda-gym with SB3, you will have to use panda-gym==2.0.0.
You can train the environments with any gymnasium compatible library. In this documentation we explain how to use one of them: stable-baselines3 (SB3).
Install SB3
To install SB3, follow the instructions from its documentation Install stable-baselines3.
Train
Now that SB3 is installed, you can run the following code to train an agent. You can use any algorithm compatible with the Box action space (see stable-baselines3/RL Algorithms). In the following example, a DDPG agent is trained to solve the Reach task.
import gym
import panda_gym
from stable_baselines3 import DDPG
env = gym.make("PandaReach-v2")
model = DDPG(policy="MultiInputPolicy", env=env)
model.learn(total_timesteps=30_000)
Note
Here we provide the canonical code for training with SB3. For any information on the setting of hyperparameters, verbosity, saving the model and more please read the SB3 documentation.
Bonus: Train with RL Baselines3 Zoo
RL Baselines3 Zoo is the training framework associated with SB3.
It provides scripts for training and evaluating agents, tuning hyperparameters, plotting results and recording videos. It also contains already-optimized hyperparameters, including for some panda-gym environments.
Warning
The current version of RL Baselines3 Zoo provides hyperparameters for version 1 of panda-gym, but not for version 2. Before training with RL Baselines3 Zoo, you will have to set your own hyperparameters by editing hyperparameters/<ALGO>.yml. For more information, please read the README of RL Baselines3 Zoo.
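An entry in hyperparameters/<ALGO>.yml follows the RL Baselines3 Zoo format: the environment id as the top-level key, with algorithm settings nested below it. The values in this sketch are purely illustrative, not tuned hyperparameters:

```yaml
PandaReach-v2:
  n_timesteps: !!python/float 1e5   # illustrative value, not tuned
  policy: 'MultiInputPolicy'
  buffer_size: 1000000
  batch_size: 256
  gamma: 0.95
  learning_rate: !!python/float 1e-3
```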
Train
To use it, follow its installation instructions, then run the following command.
python train.py --algo <ALGO> --env <ENV>
For example, to train an agent with TQC on PandaPickAndPlace-v3:
python train.py --algo tqc --env PandaPickAndPlace-v3
Enjoy
To visualize the trained agent, follow the instructions in the SB3 documentation. You must add --env-kwargs render_mode:human when running the enjoy script.
python enjoy.py --algo <ALGO> --env <ENV> --folder <TRAIN_AGENT_FOLDER> --env-kwargs render_mode:human