nip.utils.maths.mean_episode_reward

nip.utils.maths.mean_episode_reward(reward: Float[Tensor, '... step'], done_mask: Bool[Tensor, '... step']) → float

Compute the mean total episode reward for a batch of concatenated episodes.

The done_mask tensor specifies episode boundaries. The mean total reward per episode is computed by summing the rewards within each complete episode and dividing the overall total by the number of such episodes.

Note that the first episode is ignored, because it may be partly included in the previous batch. A trailing episode that has not terminated by the end of the batch is likewise excluded, since its total reward is incomplete (in the example below, the final reward of 5.0 is not counted).

Parameters:
  • reward (Float["... step"]) – The reward tensor. Multiple episodes are concatenated along the last dimension.

  • done_mask (Bool["... step"]) – A boolean mask that is True at the last step of each episode.

Returns:

mean_total_reward (float) – The mean total reward per episode.

Examples

>>> import torch
>>> from nip.utils.maths import mean_episode_reward
>>> reward = torch.tensor([[1.0, 2.0, 3.0, 4.0, 5.0]])
>>> mask = torch.tensor([[True, True, False, True, False]])
>>> mean_episode_reward(reward, mask)
4.5
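
For reference, the computation described above can be sketched as follows. This is an illustrative reimplementation, not the library's source code: the function name mean_episode_reward_sketch is hypothetical, and the jaxtyping annotations and boolean mask type are assumptions inferred from the signature and the example above.

import torch
from jaxtyping import Bool, Float
from torch import Tensor

def mean_episode_reward_sketch(
    reward: Float[Tensor, "... step"],
    done_mask: Bool[Tensor, "... step"],
) -> float:
    # Flatten all batch dimensions so each row is one stream of
    # concatenated episodes.
    reward = reward.reshape(-1, reward.shape[-1])
    done_mask = done_mask.reshape(-1, done_mask.shape[-1])

    total_reward = 0.0
    num_episodes = 0
    for row_reward, row_done in zip(reward, done_mask):
        # Indices of the steps at which an episode ends.
        done_indices = row_done.nonzero(as_tuple=True)[0]
        # At least two episode ends are needed: the first marks the
        # boundary of the ignored first episode; later ones close
        # complete episodes.
        if done_indices.numel() < 2:
            continue
        first, last = done_indices[0].item(), done_indices[-1].item()
        # Rewards strictly after the first episode end and up to the
        # last one cover exactly the complete episodes; a trailing
        # partial episode is excluded.
        total_reward += row_reward[first + 1 : last + 1].sum().item()
        num_episodes += done_indices.numel() - 1

    if num_episodes == 0:
        # No complete episode beyond the (ignored) first one.
        return float("nan")
    return total_reward / num_episodes

On the example above, this sketch sums the rewards between the first and last episode ends, (2.0 + 3.0 + 4.0), and divides by the 2 complete episodes, yielding 4.5, matching the documented output.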