gt4sd.frameworks.gflownet.dataloader.dataset module

Summary

Classes:

GFlowNetDataset

A dataset for gflownet.

GFlowNetTask

A task for gflownet.

Reference

class GFlowNetDataset(h5_file=None, target='gap', properties=[])[source]

Bases: Dataset

A dataset for gflownet.

__init__(h5_file=None, target='gap', properties=[])[source]

Initialize a gflownet dataset. A dataset stored in an h5-compatible format is loaded directly. A dataset stored as xyz files must first be converted to an h5 file.

Code adapted from: https://github.com/recursionpharma/gflownet/tree/trunk/src/gflownet/data.

Parameters
  • h5_file (Optional[str]) – data file in h5 format.

  • target (str) – reward target.

  • properties (List[str]) – relevant properties for the task.

Raises

ValueError – if the data file is not present or is not in a format compatible with h5.

set_indexes(ixs)[source]

Set the indexes of the dataset split (train/val/test).

Parameters

ixs (Tensor) – indexes of the dataset split.

get_len()[source]

Get the length of the full dataset (before splitting).

Returns

the length of the full dataset.

Return type

int
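The relationship between set_indexes, get_len and __len__ can be sketched with a toy class. This is only an illustration under stated assumptions: ToySplitDataset is a hypothetical name, plain Python lists stand in for the Tensor of indexes and the h5-backed data, and mapping __getitem__ through the split indexes is assumed rather than confirmed by this page.

```python
from typing import Any, List, Sequence, Tuple


class ToySplitDataset:
    """Hypothetical stand-in that mimics the split bookkeeping described above."""

    def __init__(self, items: Sequence[Any], rewards: Sequence[float]) -> None:
        self.items = list(items)
        self.rewards = list(rewards)
        # By default the split covers the full dataset.
        self.idcs: List[int] = list(range(len(self.items)))

    def set_indexes(self, ixs: Sequence[int]) -> None:
        # Restrict the dataset to one split (train/val/test).
        self.idcs = list(ixs)

    def get_len(self) -> int:
        # Length of the full dataset, independent of the current split.
        return len(self.items)

    def __len__(self) -> int:
        # Length of the current split only.
        return len(self.idcs)

    def __getitem__(self, idx: int) -> Tuple[Any, float]:
        # Assumption: positions are resolved through the split indexes.
        full_idx = self.idcs[idx]
        return self.items[full_idx], self.rewards[full_idx]


toy = ToySplitDataset(["C", "CC", "CCO", "CO"], [0.1, 0.4, 0.9, 0.3])
toy.set_indexes([2, 0])
```

With this split, toy.get_len() still reports the four original entries, while len(toy) reports the two selected ones.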

get_stats(percentile=0.95)[source]

Get the stats of the dataset.

Parameters

percentile (float) – percentile to compute, given as a fraction (the default 0.95 corresponds to the 95th percentile).

Return type

Tuple[float, float, Any]

Returns

the minimum, the maximum, and the value at the requested percentile.
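As a rough illustration of the three returned values, the sketch below computes them for a plain list of floats. The function name get_stats_sketch and the index-based percentile convention (sort, then pick the element at position int(n * percentile)) are assumptions made for the example, not a statement of how the library computes them.

```python
from typing import List, Tuple


def get_stats_sketch(values: List[float], percentile: float = 0.95) -> Tuple[float, float, float]:
    """Return (min, max, value at the given fractional percentile)."""
    ordered = sorted(values)
    # Assumed convention: index into the sorted values, clamped to the last element.
    idx = min(int(len(ordered) * percentile), len(ordered) - 1)
    return ordered[0], ordered[-1], ordered[idx]


stats = get_stats_sketch([0.1, 0.5, 0.3, 0.9, 0.7], percentile=0.5)
```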

static convert_xyz_to_h5(xyz_path='data/xyz', h5_path='data/qm9.h5', property_names=['rA', 'rB', 'rC', 'mu', 'alpha', 'homo', 'lumo', 'gap', 'r2', 'zpve', 'U0', 'U', 'H', 'G', 'Cv'])[source]

Convert the data from xyz to h5. Assumes that the xyz files are extracted in the xyz_path folder.

Parameters
  • xyz_path (str) – path to the folder containing the input xyz files.

  • h5_path (str) – path to the h5 file in output.

  • property_names (List[str]) – names of the properties we want to use/overwrite.

Return type

None

static _read_xyz(path)[source]

Reads the xyz files in the directory given by path.

Code adapted from: https://www.kaggle.com/code/rmonge/predicting-molecule-properties-based-on-its-smiles/notebook

Parameters

path (str) – the path to the folder.

Returns

a tuple (atoms, coordinates, smile, prop), where atoms is a list with the characters representing the atoms of a molecule, coordinates is a list with the Cartesian coordinates of each atom, smile is a list with the SMILES representation of a molecule, and prop is a list with the scalar properties.

Return type

tuple
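To make the four returned pieces concrete, here is a minimal sketch that parses the text of a single xyz record. It assumes a QM9-style layout (atom count, a property line, one line per atom, a frequency line, a SMILES line, an InChI line) and the "*^" exponent notation found in some QM9 files. The function name parse_xyz_text and every value in the example record are invented for illustration.

```python
from typing import List, Tuple

Coordinate = Tuple[float, ...]


def to_float(token: str) -> float:
    # Some QM9 files write exponents as "*^" instead of "e".
    return float(token.replace("*^", "e"))


def parse_xyz_text(text: str) -> Tuple[List[str], List[Coordinate], str, List[float]]:
    """Parse one QM9-style record (assumed layout) into atoms, coordinates, SMILES, properties."""
    lines = text.strip().splitlines()
    n_atoms = int(lines[0])
    # Second line: a tag, an identifier, then the scalar properties.
    prop = [to_float(token) for token in lines[1].split()[2:]]
    atoms: List[str] = []
    coordinates: List[Coordinate] = []
    for row in lines[2 : 2 + n_atoms]:
        fields = row.split()
        atoms.append(fields[0])
        coordinates.append(tuple(to_float(f) for f in fields[1:4]))
    # The SMILES line follows the atom block and the frequency line.
    smile = lines[n_atoms + 3].split()[0]
    return atoms, coordinates, smile, prop


# Invented record, for illustration only.
example_text = "\n".join(
    [
        "3",
        "gdb 0 1.5 2.5",
        "O 0.0 0.0 0.1",
        "H 0.0 7.5*^-1 -0.5",
        "H 0.0 -0.75 -0.5",
        "100.0 200.0 300.0",
        "O O",
        "InChI=1S/H2O/h1H2",
    ]
)
atoms, coordinates, smile, prop = parse_xyz_text(example_text)
```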

__len__()[source]

Dataset split (train/val/test) length.

Returns

length of the dataset.

__getitem__(idx)[source]

Retrieve an item from the dataset by index.

Parameters

idx – index of the item.

Return type

Tuple[Any, float]

Returns

a tuple (item, reward).

class GFlowNetTask(configuration, dataset, reward_model=None, wrap_model=None)[source]

Bases: object

A task for gflownet.

__init__(configuration, dataset, reward_model=None, wrap_model=None)[source]

Initialize a generic gflownet task. The task specifies the reward model for the trajectory. We consider the task as part of the dataset.

Code adapted from: https://github.com/recursionpharma/gflownet/tree/trunk/src/gflownet/tasks.

Parameters
  • configuration (Dict[str, Any]) – a dictionary with the task configuration.

  • dataset (GFlowNetDataset) – a dataset instance.

  • reward_model (Optional[Module]) – the model that is used to generate the conditional reward.

  • wrap_model (Optional[Callable[[Module], Module]]) – a wrapper function that is applied to the model.

load_task_models()[source]

Loads the task models.

Returns

model, a dictionary with the task models.

Return type

dict

sample_conditional_information(n)[source]

Samples conditional information for a minibatch.

Parameters

n (int) – number of samples.

Returns

cond_info, a dictionary with the sampled conditional information.

Return type

dict
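The shape of the returned dictionary can be sketched as follows. This is an assumption-heavy illustration: the key name "beta" (an inverse temperature), the uniform sampling range, and the function name are all hypothetical, and plain lists are used in place of tensors.

```python
import random
from typing import Dict, List


def sample_conditional_information_sketch(
    n: int, seed: int = 0, low: float = 0.5, high: float = 2.0
) -> Dict[str, List[float]]:
    """Sample one hypothetical inverse temperature per minibatch entry."""
    rng = random.Random(seed)  # seeded so the example is reproducible
    beta = [rng.uniform(low, high) for _ in range(n)]
    return {"beta": beta}


cond_info = sample_conditional_information_sketch(4)
```

Each minibatch entry gets its own conditioning value, so the dictionary values have length n.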

cond_info_to_reward(cond_info, flat_reward)[source]

Combines a minibatch of reward signal vectors and conditional information into a scalar reward.

Parameters
  • cond_info (Dict[str, Any]) – a dictionary with various pieces of conditional information (e.g. temperature).

  • flat_reward (FlatRewards) – a 2d tensor where each row represents a series of flat rewards.

Returns

reward, a 1d tensor with one scalar reward for each minibatch entry.

Return type

Tensor
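One common way to combine flat rewards with a temperature is to raise the reward to an inverse-temperature power. The sketch below shows that idea on plain lists. It is not taken from this page: the key "beta", the choice to sum each row before exponentiating, and the function name are assumptions for illustration.

```python
from typing import Dict, List


def cond_info_to_reward_sketch(
    cond_info: Dict[str, List[float]], flat_reward: List[List[float]]
) -> List[float]:
    """Collapse each row of flat rewards to a scalar and temper it with beta."""
    rewards = []
    for row, beta in zip(flat_reward, cond_info["beta"]):
        scalar = sum(row)  # assumed reduction of the flat reward vector
        rewards.append(scalar ** beta)  # larger beta sharpens the reward
    return rewards


tempered = cond_info_to_reward_sketch(
    {"beta": [2.0, 0.5]},
    [[0.5], [0.25]],
)
```

The result has one entry per row of flat_reward, matching the 1d output described above.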

compute_flat_rewards(x)[source]

Compute the flat rewards of the molecules according to the task's proxies.

Parameters

x – a list of RDKit molecules.

Returns

a tuple (reward, is_valid), where reward is a 1d tensor with a scalar reward for each molecule and is_valid is a 1d boolean tensor indicating whether each molecule is valid.

Return type

Tuple[Tensor, Tensor]
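The pairing of rewards with a validity mask can be illustrated with a small sketch. Everything here is hypothetical: SMILES strings (or None for an unparsable molecule) stand in for RDKit molecules, toy_proxy replaces the task's proxy model, lists replace tensors, and assigning 0.0 to invalid entries is an assumption.

```python
from typing import Callable, List, Optional, Tuple


def compute_flat_rewards_sketch(
    mols: List[Optional[str]], proxy: Callable[[str], float]
) -> Tuple[List[float], List[bool]]:
    """Score each molecule with a proxy and report which entries were valid."""
    is_valid = [mol is not None for mol in mols]
    # Assumption: invalid molecules receive a reward of 0.0.
    rewards = [proxy(mol) if mol is not None else 0.0 for mol in mols]
    return rewards, is_valid


def toy_proxy(smiles: str) -> float:
    # Hypothetical proxy: longer SMILES strings score higher.
    return float(len(smiles))


rewards, is_valid = compute_flat_rewards_sketch(["CCO", None, "C"], toy_proxy)
```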

flat_reward_transform(y)[source]

Transforms a reward with a generic structure to a flat vector.

Parameters

y (Union[float, Tensor]) – scalar reward for a trajectory.

Return type

FlatRewards

_wrap_model_mp(model)[source]

Wraps an nn.Module instance so that it can be shared with DataLoader workers.

Parameters

model (Module) – a nn.Module instance.

Return type

Union[Module, MPModelPlaceholder]