gt4sd.frameworks.gflownet.dataloader.dataset module

Summary

Classes:

`GFlowNetDataset`	A dataset for gflownet.
`GFlowNetTask`	A task for gflownet.

Reference

class GFlowNetDataset(h5_file=None, target='gap', properties=[])

Bases: Dataset

A dataset for gflownet.

__init__(h5_file=None, target='gap', properties=[])

Initialize a gflownet dataset. If the dataset is stored in h5 format, it is loaded directly; if it is in xyz format, it must first be converted to h5 (see convert_xyz_to_h5).

Code adapted from: https://github.com/recursionpharma/gflownet/tree/trunk/src/gflownet/data.

Parameters

h5_file (Optional[str, None]) – path to the data file in h5 format.
target (str) – name of the property used as the reward target.
properties (List[str]) – relevant properties for the task.

Raises

ValueError – if the dataset is missing or not in a format compatible with h5.

set_indexes(ixs)

Set the indexes of the dataset split (train/val/test).

Parameters: ixs (Tensor) – indexes of the dataset split.

get_len()

Get the length of the full dataset (before splitting).

Returns: the length of the full dataset.
Return type: int

get_stats(percentile=0.95)

Get summary statistics of the dataset.

Parameters: percentile (float) – percentile to compute, as a fraction in [0, 1].
Return type: Tuple[float, float, Any]
Returns: the minimum, the maximum and the value at the given percentile.
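As a rough illustration of what get_stats returns, the sketch below computes the minimum, maximum and a percentile value from a plain list of numbers. It assumes the percentile is read off the sorted values by index; the function mirrors the method's name but is a standalone stand-in working on a list, not the gt4sd implementation.

```python
from typing import Sequence, Tuple


def get_stats(values: Sequence[float], percentile: float = 0.95) -> Tuple[float, float, float]:
    """Return (min, max, value at the given percentile) of a non-empty sequence.

    Standalone sketch: the real method works on the dataset's own data.
    """
    if len(values) == 0:
        raise ValueError("values must be non-empty")
    if not 0.0 <= percentile <= 1.0:
        raise ValueError("percentile must be in [0, 1]")
    ordered = sorted(values)
    # Index into the sorted values; clamp so that percentile=1.0 stays in range.
    k = min(int(len(ordered) * percentile), len(ordered) - 1)
    return ordered[0], ordered[-1], ordered[k]


print(get_stats([5.0, 1.0, 3.0, 2.0, 4.0]))  # → (1.0, 5.0, 5.0)
```

With only five values, the 0.95 percentile lands on the largest element; on a realistic dataset size it sits just below the maximum.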

static convert_xyz_to_h5(xyz_path='data/xyz', h5_path='data/qm9.h5', property_names=['rA', 'rB', 'rC', 'mu', 'alpha', 'homo', 'lumo', 'gap', 'r2', 'zpve', 'U0', 'U', 'H', 'G', 'Cv'])

Convert the data from xyz format to h5. Assumes that the xyz files have already been extracted into the xyz_path folder.

Parameters

xyz_path (str) – path to the folder containing the input xyz files.
h5_path (str) – path to the output h5 file.
property_names (List[str]) – names of the properties to use (or overwrite).

Return type

None

static _read_xyz(path)

Read the xyz files in the directory at path.

Code adapted from: https://www.kaggle.com/code/rmonge/predicting-molecule-properties-based-on-its-smiles/notebook

Parameters: path (str) – the path to the folder.
Returns: a tuple (atoms, coordinates, smile, prop), where atoms is a list with the characters representing the atoms of a molecule, coordinates is a list with the Cartesian coordinates of each atom, smile is a list with the SMILES representation of a molecule, and prop is a list with the scalar properties.
Return type: tuple
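The kind of values _read_xyz collects can be illustrated with a minimal parser for a single QM9-style xyz file. The layout assumed here is the usual QM9 one: the atom count on line 1; a tag, an index and the scalar properties on line 2; one "element x y z charge" line per atom; then a frequency line, a SMILES line and an InChI line. The `*^` replacement handles the Mathematica-style exponents found in some QM9 files. The names parse_qm9_xyz and _to_float are hypothetical, and unlike _read_xyz this sketch parses the text of one file rather than a whole directory.

```python
from typing import List, Tuple


def _to_float(value: str) -> float:
    # Some QM9 files write exponents Mathematica-style, e.g. "2.1997*^-6".
    return float(value.replace("*^", "e"))


def parse_qm9_xyz(text: str) -> Tuple[List[str], List[List[float]], str, List[float]]:
    """Parse one QM9-style xyz file (hypothetical helper, not the gt4sd API).

    Returns (atoms, coordinates, smiles, properties).
    """
    lines = text.strip().splitlines()
    n_atoms = int(lines[0])
    # Line 2: a tag such as "gdb", the molecule index, then the scalar properties.
    properties = [_to_float(v) for v in lines[1].split()[2:]]
    atoms: List[str] = []
    coordinates: List[List[float]] = []
    for line in lines[2 : 2 + n_atoms]:
        fields = line.split()
        atoms.append(fields[0])
        # Columns 1-3 are x, y, z; column 4 (partial charge) is ignored here.
        coordinates.append([_to_float(v) for v in fields[1:4]])
    # Skip the vibrational-frequency line; the next line holds the SMILES
    # strings, of which the first entry is used.
    smiles = lines[3 + n_atoms].split()[0]
    return atoms, coordinates, smiles, properties
```

Calling such a parser on every file in a folder and appending each result to four lists yields output shaped like the tuple _read_xyz is documented to return.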

__len__()

Get the length of the current dataset split (train/val/test).

Returns: length of the dataset.

__getitem__(idx)

Retrieve an item from the dataset by index.

Parameters: idx – index of the item in the current split.
Return type: Tuple[Any, float]
Returns: a tuple (item, reward).
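How set_indexes, get_len, __len__ and __getitem__ relate can be sketched with a toy class that keeps the full data together with a list of split indexes. SplitDataset is hypothetical: it uses plain lists in place of the h5-backed data and tensors, and it assumes that the split covers the whole dataset until set_indexes is called and that __getitem__ indexes into the current split.

```python
from typing import Any, List, Sequence, Tuple


class SplitDataset:
    """Toy stand-in for the split handling of GFlowNetDataset (hypothetical)."""

    def __init__(self, items: Sequence[Any], rewards: Sequence[float]) -> None:
        if len(items) != len(rewards):
            raise ValueError("items and rewards must have the same length")
        self.items = list(items)
        self.rewards = list(rewards)
        # Until a split is set, the split covers the whole dataset.
        self.idcs: List[int] = list(range(len(self.items)))

    def set_indexes(self, ixs: Sequence[int]) -> None:
        # Store the indexes (into the full dataset) that make up this split.
        self.idcs = [int(i) for i in ixs]

    def get_len(self) -> int:
        # Length of the full dataset, independent of the current split.
        return len(self.items)

    def __len__(self) -> int:
        # Length of the current split.
        return len(self.idcs)

    def __getitem__(self, idx: int) -> Tuple[Any, float]:
        # idx is a position within the split; map it back to the full dataset.
        full_idx = self.idcs[idx]
        return self.items[full_idx], self.rewards[full_idx]


ds = SplitDataset(["C", "CC", "CCC", "CCO"], [0.1, 0.2, 0.3, 0.4])
ds.set_indexes([3, 1])  # e.g. the indexes of a validation split
print(ds.get_len(), len(ds), ds[0])  # → 4 2 ('CCO', 0.4)
```

Keeping one copy of the data and swapping only the index list lets train, validation and test splits share the same underlying storage.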

class GFlowNetTask(configuration, dataset, reward_model=None, wrap_model=None)

Bases: object

A task for gflownet.

__init__(configuration, dataset, reward_model=None, wrap_model=None)

Initialize a generic gflownet task. The task specifies the reward model for the trajectories and is considered part of the dataset.

Code adapted from: https://github.com/recursionpharma/gflownet/tree/trunk/src/gflownet/tasks.

Parameters

configuration (Dict[str, Any]) – a dictionary with the task configuration.
dataset (GFlowNetDataset) – a dataset instance.
reward_model (Optional[Module, None]) – the model used to generate the conditional reward.
wrap_model (Optional[Callable[[Module], Module], None]) – a wrapper function that is applied to the model.

load_task_models()

Load the task models.

Returns: a dictionary with the task models.
Return type: dict

sample_conditional_information(n)

Sample conditional information for a minibatch.

Parameters: n (int) – number of samples.
Returns: a dictionary with the sampled conditional information.
Return type: dict
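For intuition, the sketch below samples one inverse temperature per minibatch entry, a common form of conditional information when the reward is conditioned on a temperature. The uniform distribution, its default bounds, the seed argument and the "beta" key are all assumptions made for illustration; in the actual task, the configuration determines what is sampled and how it is encoded.

```python
import random
from typing import Dict, List


def sample_conditional_information(
    n: int, low: float = 1.0, high: float = 32.0, seed: int = 0
) -> Dict[str, List[float]]:
    """Sample n inverse temperatures uniformly in [low, high] (hypothetical sketch)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    rng = random.Random(seed)  # local generator, so results are reproducible
    return {"beta": [rng.uniform(low, high) for _ in range(n)]}


cond_info = sample_conditional_information(4)
print(len(cond_info["beta"]))  # → 4
```

Returning a dictionary leaves room for extra entries (for instance an encoded version of beta fed to the model) without changing the call site.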

cond_info_to_reward(cond_info, flat_reward)

Combines a minibatch of reward signal vectors and conditional information into a scalar reward.

Parameters

cond_info (Dict[str, Any]) – a dictionary with conditional information (e.g. temperature).
flat_reward (FlatRewards) – a 2d tensor where each row represents a series of flat rewards.

Returns

a 1d tensor, a scalar reward for each minibatch entry.

Return type

Tensor
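One way to combine the two, assumed here rather than taken from gt4sd, is to raise each entry's flat reward to the power of its inverse temperature, R ** beta: a larger beta amplifies the ratio between two rewards and so sharpens the reward distribution. The sketch below takes plain lists instead of tensors, reads beta from a hypothetical "beta" key, and assumes a single positive flat reward per row; tasks with several objectives would need a different combination rule.

```python
from typing import Dict, List, Sequence


def cond_info_to_reward(
    cond_info: Dict[str, Sequence[float]], flat_reward: Sequence[Sequence[float]]
) -> List[float]:
    """Map each row of flat rewards to a scalar reward R ** beta (hypothetical sketch).

    Flat rewards are assumed positive, so fractional powers stay real.
    """
    betas = cond_info["beta"]
    if len(betas) != len(flat_reward):
        raise ValueError("need one beta per minibatch entry")
    rewards = []
    for row, beta in zip(flat_reward, betas):
        if len(row) != 1:
            raise ValueError("this sketch assumes one flat reward per row")
        rewards.append(row[0] ** beta)
    return rewards


print(cond_info_to_reward({"beta": [1.0, 2.0]}, [[0.5], [0.5]]))  # → [0.5, 0.25]
```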

compute_flat_rewards(x)

Compute the flat rewards of molecules according to the task's proxies.

Parameters: x – a list of RDKit molecules.
Returns: a tuple (reward, is_valid), where reward is a 1d tensor with a scalar reward for each molecule and is_valid is a 1d boolean tensor indicating whether each molecule is valid.
Return type: tuple
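The pairing of rewards with a validity mask can be sketched as follows. Strings stand in for RDKit molecules (with None marking a molecule that could not be built), proxy is a hypothetical scoring callable standing in for the task's reward model, and the lower clipping bound is an assumption that keeps rewards strictly positive, since GFlowNet objectives typically work with log-rewards. In this sketch only valid entries receive a reward row, while the mask records which inputs were valid.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple


def compute_flat_rewards(
    mols: Sequence[Optional[str]],
    proxy: Callable[[str], float],
    floor: float = 1e-4,
) -> Tuple[List[List[float]], List[bool]]:
    """Score valid molecules with a proxy (hypothetical sketch, not the gt4sd API).

    Returns (flat_rewards, is_valid): one [reward] row per valid molecule and
    one boolean per input molecule.
    """
    is_valid = [m is not None for m in mols]
    flat_rewards: List[List[float]] = []
    for mol in mols:
        if mol is None:
            continue
        value = float(proxy(mol))
        if math.isnan(value):
            value = floor  # treat failed predictions as minimal reward
        # Clip from below so that rewards stay strictly positive.
        flat_rewards.append([max(value, floor)])
    return flat_rewards, is_valid


rewards, valid = compute_flat_rewards(["C", None, "CCO"], proxy=len)
print(rewards, valid)  # → [[1.0], [3.0]] [True, False, True]
```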

flat_reward_transform(y)

Transforms a reward with a generic structure to a flat vector.

Parameters: y (Union[float, Tensor]) – scalar reward for a trajectory.
Return type: FlatRewards
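The target shape matches the flat_reward argument of cond_info_to_reward: a 2d structure with one row per trajectory. A minimal sketch, with nested lists standing in for FlatRewards tensors and assuming a single reward value per row; the behaviour is illustrative, not the gt4sd implementation.

```python
from numbers import Real
from typing import List, Sequence, Union


def flat_reward_transform(y: Union[float, Sequence[float]]) -> List[List[float]]:
    """Turn a scalar or a 1d sequence of rewards into rows of flat rewards (sketch)."""
    if isinstance(y, Real):
        values = [float(y)]  # a single trajectory's reward
    else:
        values = [float(v) for v in y]  # one reward per trajectory
    return [[v] for v in values]


print(flat_reward_transform(0.7))  # → [[0.7]]
print(flat_reward_transform([1, 2]))  # → [[1.0], [2.0]]
```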

_wrap_model_mp(model)

Wraps an nn.Module instance so that it can be shared with DataLoader workers.

Parameters: model (Module) – an nn.Module instance.
Return type: Union[Module, MPModelPlaceholder]