nip.graph_isomorphism.data.GraphIsomorphismDataset

class nip.graph_isomorphism.data.GraphIsomorphismDataset(hyper_params: HyperParameters, settings: ExperimentSettings, protocol_handler: ProtocolHandler, train: bool = True)

A dataset for the graph isomorphism experiments.

Uses a pre-generated set of graphs.

Parameters:
  • hyper_params (HyperParameters) – The parameters for the experiment.

  • settings (ExperimentSettings) – The settings for the experiment.

  • protocol_handler (ProtocolHandler) – The protocol handler for the experiment.

  • train (bool) – If True, load the training set; otherwise load the test set.

Methods Summary

__getitem__(index)

__getitems__(index)

__init__(hyper_params, settings, ...[, train])

__len__()

__repr__()

Return repr(self).

_download()

Download the raw data.

_get_pretrained_cache_dir(model_name)

Get the path to the directory with the cached pretrained embeddings.

_get_pretrained_metadata_path(model_name)

Get the path to the metadata file for the pretrained embeddings.

_get_pretrained_mmap_path(model_name)

Get the path to the memory-mapped tensor for the pretrained embeddings.

add_pretrained_embeddings(model_name, ...[, ...])

Add pretrained embeddings to the dataset and cache them.

build_tensor_dict()

Build the tensor dict dataset from the raw data.

build_torch_dataset(**kwargs)

Build the base PyTorch dataset, from which the tensordict is constructed.

get_pretrained_embedding_dtype(model_name)

Get the dtype of the embeddings for a pretrained model.

get_pretrained_embedding_feature_shape(...)

Get the feature shape of the embeddings for a pretrained model.

load_pretrained_embeddings(model_name)

Load cached embeddings for a pretrained model.

Attributes

adjacency_dtype

device

The device on which the dataset is stored.

instance_keys

The keys specifying the input instance.

keys

The keys (field names) in the dataset.

pretrained_embeddings_dir

The path to the directory containing cached pretrained model embeddings.

pretrained_model_names

The names of the pretrained models for which we have computed embeddings.

processed_dir

The path to the directory containing the processed data.

raw_dir

The path to the directory containing the raw data.

x_dtype

y_dtype

Methods

__getitem__(index: None | int | slice | str | Tensor | List[Any] | Tuple[Any, ...]) → TensorDict | Tensor

__getitems__(index: None | int | slice | str | Tensor | List[Any] | Tuple[Any, ...]) → TensorDict | Tensor

__init__(hyper_params: HyperParameters, settings: ExperimentSettings, protocol_handler: ProtocolHandler, train: bool = True)

__len__() → int

__repr__() → str

Return repr(self).

_download()

Download the raw data.

_get_pretrained_cache_dir(model_name: str) → Path

Get the path to the directory with the cached pretrained embeddings.

Parameters:

model_name (str) – The name of the pretrained model.

Returns:

cache_dir (Path) – The path to the cache directory.

_get_pretrained_metadata_path(model_name: str) → Path

Get the path to the metadata file for the pretrained embeddings.

Parameters:

model_name (str) – The name of the pretrained model.

Returns:

metadata_path (Path) – The path to the metadata file.

_get_pretrained_mmap_path(model_name: str) → Path

Get the path to the memory-mapped tensor for the pretrained embeddings.

Parameters:

model_name (str) – The name of the pretrained model.

Returns:

mmap_path (Path) – The path to the memory-mapped tensor.
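
The three path helpers above plausibly form a small hierarchy: one cache directory per model, holding a metadata file and a memory-mapped tensor file. The following is only a sketch of that idea under stated assumptions. The function names (`sketch_cache_dir`, `sketch_metadata_path`, `sketch_mmap_path`), the file names, and the handling of `/` in model names are all invented for illustration; the actual on-disk layout used by the class is not documented on this page.

```python
from pathlib import Path


def sketch_cache_dir(pretrained_embeddings_dir: Path, model_name: str) -> Path:
    """Return a per-model cache directory (hypothetical layout)."""
    # Assumption: model names may contain "/" (e.g. "org/model"), so it is
    # replaced to keep each model in a single directory level.
    safe_name = model_name.replace("/", "_")
    return pretrained_embeddings_dir / safe_name


def sketch_metadata_path(cache_dir: Path) -> Path:
    """Return the metadata file path (the file name is an assumption)."""
    return cache_dir / "metadata.json"


def sketch_mmap_path(cache_dir: Path) -> Path:
    """Return the memory-mapped tensor path (the file name is an assumption)."""
    return cache_dir / "embeddings.bin"


cache_dir = sketch_cache_dir(Path("data/pretrained_embeddings"), "org/model")
print(sketch_metadata_path(cache_dir))
print(sketch_mmap_path(cache_dir))
```

Keeping both files under one directory per model means a cache entry can be checked for, or removed, by looking at a single path.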

add_pretrained_embeddings(model_name: str, full_embeddings: Tensor, overwrite_cache: bool = False)

Add pretrained embeddings to the dataset and cache them.

Parameters:
  • model_name (str) – The name of the pretrained model.

  • full_embeddings (Tensor) – The embeddings generated from the full original dataset, before any rearrangement or filtering.

  • overwrite_cache (bool, default=False) – Whether to overwrite the cached embeddings if they already exist.

build_tensor_dict() → TensorDict

Build the tensor dict dataset from the raw data.

Returns:

dataset (TensorDict) – The dataset as a tensor dict.

build_torch_dataset(**kwargs) → Dataset

Build the base PyTorch dataset, from which the tensordict is constructed.

Implementing this method is optional, but it is required when using pretrained models, which need direct access to the raw dataset.

Parameters:

**kwargs – Additional keyword arguments to pass to the dataset class.

get_pretrained_embedding_dtype(model_name: str) → dtype

Get the dtype of the embeddings for a pretrained model.

Parameters:

model_name (str) – The name of the pretrained model.

Returns:

dtype (torch.dtype) – The dtype of the embeddings.

get_pretrained_embedding_feature_shape(model_name: str) → Size

Get the feature shape of the embeddings for a pretrained model.

The feature shape is the tuple of dimensions of the embeddings excluding the batch dimension.

Parameters:

model_name (str) – The name of the pretrained model.

Returns:

shape (torch.Size) – The feature shape of the embeddings, excluding the batch dimension.
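
To make "feature shape" concrete, the sketch below drops the leading batch dimension from a full embedding shape. It uses plain tuples so that it runs without PyTorch (`torch.Size` is a tuple subclass, so the same slicing applies). The helper name `feature_shape` and the example dimensions are assumptions for illustration, not part of the library.

```python
def feature_shape(full_shape):
    """Return the shape with the leading (batch) dimension removed."""
    if len(full_shape) == 0:
        raise ValueError("expected at least a batch dimension")
    return tuple(full_shape[1:])


# Hypothetical embeddings: 1000 items, each an 11 x 64 array.
full_embeddings_shape = (1000, 11, 64)
print(feature_shape(full_embeddings_shape))  # (11, 64)
```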

load_pretrained_embeddings(model_name: str)

Load cached embeddings for a pretrained model.

Parameters:

model_name (str) – The name of the pretrained model.

Raises:

CachedPretrainedEmbeddingsNotFound – If the cached embeddings are not found.
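
One possible way to combine `load_pretrained_embeddings` and `add_pretrained_embeddings` is a load-or-compute pattern: try the cache first, and compute the embeddings only on a miss. The sketch below is not the library's implementation. `StandInDataset` and `ensure_embeddings` are hypothetical, the exception class is redefined locally because its import path is not shown on this page, and the treatment of `overwrite_cache` in the stand-in is an assumption.

```python
class CachedPretrainedEmbeddingsNotFound(Exception):
    """Local stand-in for the exception named above."""


class StandInDataset:
    """Hypothetical stand-in exposing only the two methods used below."""

    def __init__(self):
        self._cache = {}  # plays the role of the on-disk cache
        self.loaded = {}  # embeddings currently attached to the dataset

    def load_pretrained_embeddings(self, model_name):
        if model_name not in self._cache:
            raise CachedPretrainedEmbeddingsNotFound(model_name)
        self.loaded[model_name] = self._cache[model_name]

    def add_pretrained_embeddings(self, model_name, full_embeddings, overwrite_cache=False):
        # Assumption: an existing cache entry is kept unless overwrite_cache is set.
        if overwrite_cache or model_name not in self._cache:
            self._cache[model_name] = full_embeddings
        self.loaded[model_name] = self._cache[model_name]


def ensure_embeddings(dataset, model_name, compute_full_embeddings):
    """Load cached embeddings, computing and caching them on a cache miss."""
    try:
        dataset.load_pretrained_embeddings(model_name)
        return "loaded"
    except CachedPretrainedEmbeddingsNotFound:
        dataset.add_pretrained_embeddings(model_name, compute_full_embeddings())
        return "computed"


dataset = StandInDataset()
first = ensure_embeddings(dataset, "example-model", lambda: [[0.1, 0.2], [0.3, 0.4]])
second = ensure_embeddings(dataset, "example-model", lambda: [[9.9, 9.9]])
print(first, second)  # computed loaded
```

Passing the computation as a callable means the potentially expensive embedding step runs only when the cache lookup fails.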