nip.protocols.main_protocols.MerlinArthurProtocol

nip.protocols.main_protocols.MerlinArthurProtocol#

class nip.protocols.main_protocols.MerlinArthurProtocol(hyper_params: HyperParameters, settings: ExperimentSettings, *, verifier_name: str = 'verifier')[source]#

Implementation of the Merlin-Arthur Classifier (MAC) protocol [WSZP24].

The protocol consists of two provers and a verifier. One of the two provers sends a message to the verifier, who then makes a decision. Which prover sends the message is determined randomly. “prover0” attempts to convince the verifier of a negative answer, while “prover1” attempts to convince the verifier of a positive answer.

Parameters:: hyper_params (HyperParameters) – The parameters of the experiment.

Methods Summary

`__init__`(hyper_params, settings, *[, ...])
`_get_agent_decision_made_mask`(round_id, y, ...)	Get a mask indicating whether an agent has made a decision.
`_get_new_terminated_mask`(round_id, ...)	Get a mask indicating whether the episode has been newly terminated.
`_get_verifier_guess_reward_continuous`(...)	Compute the guess reward for the verifier, with `continuous_decision`.
`_get_verifier_guess_reward_discrete`(reward, ...)	Compute the guess reward for the verifier, without `continuous_decision`.
`_include_prover_rewards`(...)	Compute the rewards for the other agents and add them to the current reward.
`can_agent_be_active`(agent_name, round_id, ...)	Specify whether an agent can be active in a given round.
`can_agent_be_active_any_channel`(agent_name, ...)	Specify whether an agent can be active in any channel in a given round.
`can_agent_see_channel`(agent_name, channel_name)	Determine whether an agent can see a channel.
`get_active_agents_mask_from_rounds_and_seed`(...)	Get a boolean mask indicating which agents are active in a given round.
`get_agent_visible_channels`(agent_name)	Get the names of the channels visible to an agent.
`get_verifier_guess_mask_from_rounds_and_seed`(...)	Get a boolean mask indicating when the verifier can make a guess.
`max_reward`(agent_name)	Get the maximum possible reward for an agent.
`min_reward`(agent_name)	Get the minimum possible reward for an agent.
`reward_mid_point_estimate`(agent_name)	Get an estimate of the expected reward if all agents play randomly.
`step_interaction_protocol`(env_td)	Take a step in the interaction protocol.

Attributes

`agent_channel_visibility`
`agent_channel_visibility_mask`	A boolean mask indicating which agents can see which message channels.
`agent_first_active_round`	The first round in which each agent is or can be active.
`agent_names`
`can_be_zero_knowledge`
`default_stackelberg_sequence`	The default Stackelberg sequence for the protocol.
`max_message_rounds`
`max_verifier_questions`
`message_channel_names`
`min_message_rounds`
`num_agents`	The number of agents in the protocol.
`num_message_channels`	The number of message channels in the protocol.
`protocol_common`	The common protocol parameters.
`prover_indices`	The indices of the provers in the list of agent names.
`prover_names`	The names of the provers in the protocol.
`stackelberg_sequence`	The actual Stackelberg sequence used in this experiment.
`verifier_index`	The index of the verifier in the list of agent names.
`verifier_names`	The names of the verifiers in the protocol.
`verifier_neither_accept_nor_reject_reward`	The reward for the verifier when they neither accept nor reject.

Methods

__init__(hyper_params: HyperParameters, settings: ExperimentSettings, *, verifier_name: str = 'verifier')[source]#

_get_agent_decision_made_mask(round_id: Int[Tensor, '...'], y: Int[Tensor, '...'], guess_mask: Bool[Tensor, '...'], decision: Int[Tensor, '...'], *, follow_force_guess: bool = True) → Bool[Tensor, '...'][source]#

Get a mask indicating whether an agent has made a decision.

Parameters:

round_id (Int[Tensor, "..."]) – The round number.
y (Int[Tensor, "..."]) – The target value.
guess_mask (Bool[Tensor, "..."]) – A mask indicating whether the agent is allowed to make a guess.
decision (Int[Tensor, "..."]) – The decision output of the agent.
follow_force_guess (bool, default=True) – Whether to follow the force_guess parameter, which forces the agent to make a certain decision.

Returns:

decision_made (Bool[Tensor, “…”]) – A mask indicating whether the agent has made a decision.

_get_new_terminated_mask(round_id: Int[Tensor, '...'], verifier_decision_made: Bool[Tensor, '...']) → Bool[Tensor, '...'][source]#

Get a mask indicating whether the episode has been newly terminated.

“Newly terminated” means that the episode has been terminated this round. This happens when the max number of rounds has been reached and the verifier has not guessed.

Parameters:

round_id (Int[Tensor, "..."]) – The round number.
verifier_decision_made (Bool[Tensor, "..."]) – A mask indicating whether the verifier has made a decision.

Returns:

terminated (Bool[Tensor, “…”]) – A mask indicating whether the episode has been newly terminated.

_get_verifier_guess_reward_continuous(reward: Float[Tensor, '... agent'], y: Int[Tensor, '...'], verifier_decision_made: Bool[Tensor, '...'], verifier_float_decision: Float[Tensor, '...']) → Float[Tensor, '... agent'][source]#

Compute the guess reward for the verifier, with continuous_decision.

This computes the reward for the verifier when they make a guess, when the more fine-grained decision is used.

Parameters:

reward (Float[Tensor, "... agent"]) – The tensor of rewards for the agents, which is updated in place.
y (Int[Tensor, "..."]) – The target value.
verifier_decision_made (Bool[Tensor, "..."]) – A mask indicating whether the verifier has made a decision.
verifier_float_decision (Float[Tensor, "..."]) – The verifier’s (continuous) decision.

_get_verifier_guess_reward_discrete(reward: Float[Tensor, '... agent'], y: Int[Tensor, '...'], verifier_decision_made: Bool[Tensor, '...'], verifier_decision: Int[Tensor, '...']) → Float[Tensor, '... agent'][source]#

Compute the guess reward for the verifier, without continuous_decision.

This computes the reward for the verifier when they make a guess, when the more fine-grained decision is not used.

Parameters:

reward (Float[Tensor, "... agent"]) – The tensor of rewards for the agents, which is updated in place.
y (Int[Tensor, "..."]) – The target value.
verifier_decision_made (Bool[Tensor, "..."]) – A mask indicating whether the verifier has made a decision.
verifier_decision (Int[Tensor, "..."]) –
The verifier’s (discrete) decision. This has the following possible values:
- 0: reject
- 1: accept
- 2: no decision
- 3: end with neither accept nor reject (only relevant for text-based scenarios)

_include_prover_rewards(verifier_decision_made: Bool[Tensor, '...'], verifier_decision: Int[Tensor, '...'], verifier_float_decision: Float[Tensor, '...'] | None, reward: Float[Tensor, '... agent'], env_td: TensorDictBase | NestedArrayDict)[source]#

Compute the rewards for the other agents and add them to the current reward.

The default implementation is as follows:

If there is one prover, they are rewarded when the verifier guesses “accept”.
If there are two provers, the first is rewarded when the verifier guesses “reject” and the second is rewarded when the verifier guesses “accept”.

When the continuous_decision key is present in the environment tensor, a continuous version of this is used instead, where the reward is a linear transformation of the verifier’s decision.

Implement a custom method for protocols with more than two provers, or for protocols with different reward schemes.

The reward tensor is updated in place, adding in the rewards for the agents at the appropriate indices.

Parameters:

verifier_decision_made (Bool[Tensor, "..."]) – A boolean mask indicating whether the verifier has made a decision.
verifier_decision (Int[Tensor, "..."]) – The verifier’s discrete decision.
verifier_float_decision (Float[Tensor, "..."] | None) – The verifier’s continuous decision. This is only used if the continuous_decision key is present in the environment tensor. If not None, it is used to compute the reward for the provers instead of the discrete decision.
reward (Float[Tensor, "... agent"]) – The currently computed reward, which should include the reward for the verifier. This is updated in place.
env_td (TensorDictBase | NestedArrayDict) – The current observation and state. If a NestedArrayDict, it is converted to a TensorDictBase.

can_agent_be_active(agent_name: str, round_id: int, channel_name: str) → bool[source]#

Specify whether an agent can be active in a given round.

The provers can only be active in the first round, and the verifier can only be active in the second round.

Returns:: can_be_active (bool) – Whether the agent can be active in the given round.

can_agent_be_active_any_channel(agent_name: str, round_id: int) → bool[source]#

Specify whether an agent can be active in any channel in a given round.

For non-deterministic protocols, this is true if the agent has some probability of being active.

Returns:: can_be_active (bool) – Whether the agent can be active in the given round.

can_agent_see_channel(agent_name: str, channel_name: str) → bool[source]#

Determine whether an agent can see a channel.

Returns:: can_see_channel (bool) – Whether the agent can see the channel.

get_active_agents_mask_from_rounds_and_seed(round_id: Int[Tensor, '...'], seed: Int[Tensor, '...']) → Bool[Tensor, '... agent channel'][source]#

Get a boolean mask indicating which agents are active in a given round.

A random one of the two provers goes first, and the verifier goes second.

Parameters:

round_id (Int[Tensor, "..."]) – The round of the protocol.
seed (Int[Tensor, "..."]) – The per-environment seed.

Returns:

active_agents (Bool[Tensor, “… agent channel”]) – A boolean mask indicating which agents are active in the given round.

get_agent_visible_channels(agent_name: str) → list[str][source]#

Get the names of the channels visible to an agent.

Parameters:: agent_name (str) – The name of the agent.
Returns:: visible_channels (list[str]) – The names of the channels visible to the agent.

get_verifier_guess_mask_from_rounds_and_seed(round_id: Int[Tensor, '...'], seed: Int[Tensor, '...']) → Bool[Tensor, '...'][source]#

Get a boolean mask indicating when the verifier can make a guess.

Takes as input a tensor of rounds and returns a boolean mask indicating when the verifier can make a guess for each element in the batch.

Parameters:

round_id (Int[Tensor, "..."]) – The batch of rounds.
seed (Int[Tensor, "..."]) – The per-environment seed.

Returns:

verifier_turn (Bool[Tensor, “…”]) – Which batch items the verifiers can make a guess in.

max_reward(agent_name: str) → float[source]#

Get the maximum possible reward for an agent.

For the verifier, this is the maximum reward it gets for guessing plus the bonus for not guessing in each round (if positive).

For the prover, this is the reward it gets for being accepted by the verifier.

Parameters:: agent_name (str) – The name of the agent to get the maximum reward for.
Returns:: max_reward (float) – The maximum possible reward for the agent.

min_reward(agent_name: str) → float[source]#

Get the minimum possible reward for an agent.

For the verifier, this is the minimum reward it gets for guessing plus the bonus for not guessing in each round (if negative).

For the prover, this 0.

Parameters:: agent_name (str) – The name of the agent to get the maximum reward for.
Returns:: min_reward (float) – The minimum possible reward for the agent.

reward_mid_point_estimate(agent_name: str) → float[source]#

Get an estimate of the expected reward if all agents play randomly.

This is used to compute the mid-point of the reward range for the agent.

For example, if the agent gets reward -1 for a wrong guess and 1 for a correct guess, the mid-point estimate could be 0.

Parameters:: agent_name (str) – The name of the agent to get the reward mid-point for.
Returns:: reward_mid_point (float) – The expected reward for the agent if all agents play randomly.

step_interaction_protocol(env_td: TensorDictBase | NestedArrayDict) → tuple[Bool[Tensor, '...'], Bool[Tensor, '... agent'], Bool[Tensor, '...'], Float[Tensor, '... agent']][source]#

Take a step in the interaction protocol.

Computes the done signals and reward.

Used in the _step method of the environment.

Parameters:

env_td (TensorDictBase | NestedArrayDict) –

The current observation and state. If a NestedArrayDict, it is converted to a TensorDictBase. Has keys:

”y” (… 1): The target value.
”round” (…): The current round.
(“agents”, “decision”) (… agent): The decision of each agent.
(“agents”, “continuous_decision”) (… agent): (Optional) A more fine-grained version of the decision, which is a float between -1 and 1.
(“agents”, “valid_response”) (… agent): (Optional) A boolean mask indicating whether the agent’s response is valid.
”done” (…): A boolean mask indicating whether the episode is done.
(“agents”, “done”) (… agent): A boolean mask indicating whether each
agent is done.
”terminated” (…): A boolean mask indicating whether the episode has been
terminated.

Returns:

shared_done (Bool[Tensor, “…”]) – A boolean mask indicating whether the episode is done because all relevant agents have made a decision.
agent_done (Bool[Tensor, “… agent”]) – A boolean mask indicating whether each agent is done, because they have made a decision. This is the same as shared_done for agents which don’t make decisions.
terminated (Bool[Tensor, “…”]) – A boolean mask indicating whether the episode has been terminated because the max number of rounds has been reached and the verifier has not guessed.
reward (Float[Tensor, “… agent”]) – The reward for the agents.

nip.protocols.main_protocols.MerlinArthurProtocol

Contents

nip.protocols.main_protocols.MerlinArthurProtocol#