nip.graph_isomorphism.dataset_generation.generate_gi_dataset
- nip.graph_isomorphism.dataset_generation.generate_gi_dataset(config: GraphIsomorphicDatasetConfig | dict, name: str, batch_size: int = 800000, split_name: str = 'train', device: device | str | int = 'cpu')
Generate a dataset of pairs of graphs with WL scores.
Graphs are generated using the Erdős-Rényi model. The dataset is generated in three steps:
1. Generate non-isomorphic graphs. The pairs are divided equally between the different graph sizes and edge probabilities. The numbers of graphs with a score of 1, 2 and greater than 2 are set according to the proportions non_iso_prop_score_1 and non_iso_prop_score_2.
2. Generate isomorphic graphs by sampling from the non-isomorphic graph pairs and shuffling the nodes.
3. Generate new isomorphic graphs.
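The node-shuffling idea in the second step can be illustrated with a minimal, standard-library sketch. This is not the library's implementation: the helper names sample_erdos_renyi and shuffle_nodes are hypothetical, and the graph size and edge probability are arbitrary values chosen for illustration.

```python
import random


def sample_erdos_renyi(num_nodes, edge_prob, rng):
    """Sample a symmetric 0/1 adjacency matrix from G(n, p) without self-loops.

    Hypothetical helper for illustration only; not part of nip.
    """
    adj = [[0] * num_nodes for _ in range(num_nodes)]
    for i in range(num_nodes):
        for j in range(i + 1, num_nodes):
            # Each possible edge is included independently with probability p.
            if rng.random() < edge_prob:
                adj[i][j] = adj[j][i] = 1
    return adj


def shuffle_nodes(adj, rng):
    """Relabel the nodes with a random permutation, giving an isomorphic graph.

    Hypothetical helper for illustration only; not part of nip.
    """
    n = len(adj)
    perm = list(range(n))
    rng.shuffle(perm)
    # Node i of the new graph corresponds to node perm[i] of the original.
    shuffled = [[adj[perm[i]][perm[j]] for j in range(n)] for i in range(n)]
    return shuffled, perm


rng = random.Random(0)  # fixed seed so the sketch is reproducible
graph_a = sample_erdos_renyi(num_nodes=8, edge_prob=0.5, rng=rng)
graph_b, perm = shuffle_nodes(graph_a, rng)

# Isomorphic graphs must share the same sorted degree sequence.
degrees_a = sorted(sum(row) for row in graph_a)
degrees_b = sorted(sum(row) for row in graph_b)
```

Because graph_b is only a relabelling of graph_a, the pair is isomorphic by construction, so no isomorphism test is needed to label it.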
- Parameters:
config (GraphIsomorphicDatasetConfig or dict) – The configuration for the dataset.
name (str) – The name of the dataset to save. This will be the name of the directory in which the dataset is saved, under nip.constants.GI_DATA_DIR.
batch_size (int, default=800000) – The batch size to use when generating the graphs.
split_name (str, default="train") – The name of the split to save the dataset as.
device (TorchDevice, default="cpu") – The device to use for the computation.