nip.graph_isomorphism.dataset_generation.generate_gi_dataset

nip.graph_isomorphism.dataset_generation.generate_gi_dataset(config: GraphIsomorphicDatasetConfig | dict, name: str, batch_size: int = 800000, split_name: str = 'train', device: device | str | int = 'cpu')

Generate a dataset of pairs of graphs with WL scores.

Graphs are generated using the Erdős-Rényi model. The dataset is generated in three steps:

1. Generate non-isomorphic graph pairs. The pairs are divided equally between the different graph sizes and edge probabilities. The pairs with a score of 1, a score of 2 and a score greater than 2 are apportioned according to the proportions non_iso_prop_score_1 and non_iso_prop_score_2.
2. Generate isomorphic graph pairs by sampling from the non-isomorphic graph pairs and shuffling the nodes.
3. Generate new isomorphic graphs.
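
The two basic operations behind these steps, sampling an Erdős-Rényi graph and shuffling its nodes to obtain an isomorphic copy, can be sketched as follows. This is only an illustration in plain Python, not the library's implementation: the helper names erdos_renyi_adjacency and shuffle_nodes are hypothetical, the graph size and edge probability are arbitrary, and WL scoring and batching are omitted.

```python
import random


def erdos_renyi_adjacency(num_nodes, edge_prob, rng):
    """Sample a symmetric 0/1 adjacency matrix with no self-loops.

    Hypothetical helper: each of the possible undirected edges is
    included independently with probability ``edge_prob``.
    """
    adj = [[0] * num_nodes for _ in range(num_nodes)]
    for i in range(num_nodes):
        for j in range(i + 1, num_nodes):
            if rng.random() < edge_prob:
                adj[i][j] = adj[j][i] = 1
    return adj


def shuffle_nodes(adj, rng):
    """Relabel the nodes with a random permutation.

    Hypothetical helper: the result is isomorphic to ``adj`` by construction.
    """
    n = len(adj)
    perm = list(range(n))
    rng.shuffle(perm)
    # Node i of the new graph corresponds to node perm[i] of the original.
    return [[adj[perm[i]][perm[j]] for j in range(n)] for i in range(n)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
graph_a = erdos_renyi_adjacency(num_nodes=7, edge_prob=0.5, rng=rng)
graph_b = shuffle_nodes(graph_a, rng)  # (graph_a, graph_b) is an isomorphic pair
```

Because graph_b is only a relabelling of graph_a, the two graphs share every isomorphism invariant, such as the number of edges and the sorted degree sequence.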

Parameters:

config (GraphIsomorphicDatasetConfig or dict) – The configuration for the dataset.
name (str) – The name of the dataset to save. This will be the name of the directory in which the dataset is saved, under nip.constants.GI_DATA_DIR.
batch_size (int, default=800000) – The batch size to use when generating the graphs.
split_name (str, default="train") – The name of the split to save the dataset as.
device (TorchDevice, default="cpu") – The device to use for the computation.
