class panda_gym.envs.core.RobotTaskEnv(robot: PyBulletRobot, task: Task)

Robotic task goal env, as the junction of a task and a robot.

close() → None

Override close in your subclass to perform any necessary cleanup.

Environments will automatically close() themselves when garbage collected or when the program exits.

render(mode: str, width: int = 720, height: int = 480, target_position: Optional[ndarray] = None, distance: float = 1.4, yaw: float = 45, pitch: float = -30, roll: float = 0) → Optional[ndarray]


If mode is “human”, render in real time; all other arguments are ignored. If mode is “rgb_array”, return an RGB array of the scene.

  • mode (str) – “human” or “rgb_array”. If “human”, this method waits for the time necessary to achieve realistic real-time rendering and all other arguments are ignored. Otherwise, an RGB array is returned.

  • width (int, optional) – Image width. Defaults to 720.

  • height (int, optional) – Image height. Defaults to 480.

  • target_position (np.ndarray, optional) – Position the camera targets, as (x, y, z). Defaults to [0., 0., 0.].

  • distance (float, optional) – Distance of the camera. Defaults to 1.4.

  • yaw (float, optional) – Yaw of the camera. Defaults to 45.

  • pitch (float, optional) – Pitch of the camera. Defaults to -30.

  • roll (float, optional) – Roll of the camera. Defaults to 0.


Returns

An RGB array if mode is “rgb_array”, else None.

Return type

RGB np.ndarray or None
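The mode-dependent return contract can be sketched with a minimal stand-in function (the real method drives the PyBullet camera; this stub only mirrors the mode and shape behaviour, and `render_stub` is an illustrative name, not part of the library):

```python
import numpy as np
from typing import Optional

def render_stub(mode: str, width: int = 720, height: int = 480) -> Optional[np.ndarray]:
    """Mimic the render() contract: None for "human", an RGB array otherwise."""
    if mode == "human":
        # A real environment would wait here to keep rendering real-time.
        return None
    # PyBullet-style RGB frames come back as a (height, width, 3) uint8 array.
    return np.zeros((height, width, 3), dtype=np.uint8)

frame = render_stub("rgb_array")
print(frame.shape)          # (480, 720, 3)
print(render_stub("human")) # None
```

Note that the array is indexed (height, width, 3), so the default 720×480 image has shape (480, 720, 3).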

reset(seed: Optional[int] = None) → Dict[str, ndarray]

Resets the environment to an initial state and returns an initial observation.

This method should also reset the environment’s random number generator(s) if seed is an integer or if the environment has not yet initialized a random number generator. If the environment already has a random number generator and reset is called with seed=None, the RNG should not be reset. Moreover, reset should (in the typical use case) be called with an integer seed right after initialization and then never again.


Returns

the initial observation. info (optional dict): a dictionary containing extra information; only returned if return_info is set to True.

Return type

observation (object)
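The seeding rule above can be illustrated with a small stub (a hypothetical class standing in for a real RobotTaskEnv; only the RNG handling follows the documented contract, and all names are illustrative):

```python
import numpy as np
from typing import Dict, Optional

class SeededEnvStub:
    """Illustrates the documented reset() seeding contract."""

    def __init__(self) -> None:
        self._np_random: Optional[np.random.Generator] = None

    def reset(self, seed: Optional[int] = None) -> Dict[str, np.ndarray]:
        # Re-seed when an integer seed is given, or when no RNG exists yet;
        # a seed=None call on an already-seeded env keeps the current RNG.
        if seed is not None or self._np_random is None:
            self._np_random = np.random.default_rng(seed)
        goal = self._np_random.uniform(-1.0, 1.0, size=3)
        return {"observation": np.zeros(3), "desired_goal": goal}

# Typical use: seed once right after creation, then reset without a seed.
env = SeededEnvStub()
first_goal = env.reset(seed=42)["desired_goal"]

other = SeededEnvStub()
same_goal = other.reset(seed=42)["desired_goal"]
assert np.allclose(first_goal, same_goal)  # same seed, same first episode
```

Subsequent `reset()` calls without a seed draw fresh goals from the same RNG stream, which keeps a seeded run reproducible end to end.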

step(action: ndarray) → Tuple[Dict[str, ndarray], float, bool, Dict[str, Any]]

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).


  • action (np.ndarray) – an action provided by the agent


Returns

  • observation (object) – agent’s observation of the current environment

  • reward (float) – amount of reward returned after previous action

  • done (bool) – whether the episode has ended, in which case further step() calls will return undefined results

  • info (dict) – contains auxiliary diagnostic information (helpful for debugging, logging, and sometimes learning)

Return type

Tuple[Dict[str, ndarray], float, bool, Dict[str, Any]]
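A typical interaction loop over the (observation, reward, done, info) tuple looks like the sketch below; `InteractionEnvStub` is a hypothetical stand-in for a real robot/task pair that terminates after a fixed number of steps, not part of the library:

```python
import numpy as np
from typing import Any, Dict, Tuple

class InteractionEnvStub:
    """Hypothetical stand-in exposing the reset()/step() interface above."""

    def __init__(self, horizon: int = 5) -> None:
        self.horizon = horizon
        self.t = 0

    def reset(self) -> Dict[str, np.ndarray]:
        self.t = 0
        return {"observation": np.zeros(3)}

    def step(
        self, action: np.ndarray
    ) -> Tuple[Dict[str, np.ndarray], float, bool, Dict[str, Any]]:
        self.t += 1
        obs = {"observation": np.full(3, float(self.t))}
        reward = -1.0                  # e.g. a sparse -1 per step until done
        done = self.t >= self.horizon  # caller must call reset() after this
        info: Dict[str, Any] = {"is_success": done}
        return obs, reward, done, info

env = InteractionEnvStub()
obs = env.reset()
done = False
total_reward = 0.0
while not done:
    action = np.zeros(3)  # a real agent would choose this from obs
    obs, reward, done, info = env.step(action)
    total_reward += reward
print(total_reward)  # -5.0
```

As the documentation notes, once `done` is True the caller is responsible for calling `reset()` before stepping again.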