nip.utils.hugging_face.count_tokens

Contents

nip.utils.hugging_face.count_tokens#

nip.utils.hugging_face.count_tokens(rollouts: NestedArrayDict, agent_id: int, model_name: str) TokenCounts[source]#

Count the number of tokens in the rollouts.

This function counts both the prompt and completion tokens for each rollout and round. It uses the Hugging Face tokenizer for the specified model.

For the prompt, it first takes the chat history and puts it into the chat template for the model.

Parameters:
  • rollouts (NestedArrayDict) –

    The rollouts nested array dictionary. Has keys:

    • (“agents”, “prompt”) (rollout round agent message field): The prompt messages passed to each model, as a chat history.

    • (“agents”, “raw_message”) (rollout round agent): The completion messages returned by each model.

  • agent_id (int) – The ID of the agent for which to count tokens. This is used to index into the rollouts dictionary.

  • model_name (str) – The name of the model to use for tokenization, typically a Hugging Face identifier.

Returns:

token_count (TokenCount) – A dataclass containing the token counts for prompts, completions, and total tokens. The shape of all elements is (rollout round).