gt4sd.frameworks.gflownet.dataloader.dataset module
Summary
Classes:
GFlowNetDataset | A dataset for gflownet.
GFlowNetTask | A task for gflownet.
Reference
- class GFlowNetDataset(h5_file=None, target='gap', properties=[])
Bases: Dataset
A dataset for gflownet.
- __init__(h5_file=None, target='gap', properties=[])
Initialize a gflownet dataset. If the data is already stored as an h5 file, it is loaded directly. If it is stored as xyz files, it must first be converted to an h5 file (see convert_xyz_to_h5).
Code adapted from: https://github.com/recursionpharma/gflownet/tree/trunk/src/gflownet/data.
- Parameters
h5_file (Optional[str]) – data file in h5 format.
target (str) – reward target.
properties (List[str]) – relevant properties for the task.
- Raises
ValueError – if the data file is not present or is not in a format compatible with h5.
- set_indexes(ixs)
Set the indexes of the dataset split (train/val/test).
- Parameters
ixs (Tensor) – indexes of the dataset split.
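As a rough illustration of how split indexes might be produced before calling set_indexes, the sketch below shuffles the positions of a dataset and cuts them into train/val/test parts. The helper name split_indexes, the split fractions, and the use of plain Python lists instead of a Tensor are assumptions made for illustration; they are not part of gt4sd.

```python
import random
from typing import Dict, List


def split_indexes(
    n_items: int, val_frac: float = 0.1, test_frac: float = 0.1, seed: int = 0
) -> Dict[str, List[int]]:
    """Hypothetical helper: shuffle range(n_items) and cut it into three disjoint splits."""
    ixs = list(range(n_items))
    # A seeded generator keeps the split reproducible across runs.
    random.Random(seed).shuffle(ixs)
    n_val = int(n_items * val_frac)
    n_test = int(n_items * test_frac)
    return {
        "val": ixs[:n_val],
        "test": ixs[n_val:n_val + n_test],
        "train": ixs[n_val + n_test:],
    }


splits = split_indexes(100)
```

With a GFlowNetDataset, n_items would presumably come from get_len(), and each list would be converted to a tensor before being passed to set_indexes.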
- get_len()
Get the length of the full dataset (before splitting).
- Returns
the length of the full dataset.
- get_stats(percentile=0.95)
Get the stats of the dataset.
- Parameters
percentile (float) – percentile to report, given as a fraction (default 0.95).
- Return type
Tuple[float, float, Any]
- Returns
the minimum, the maximum, and the value at the given percentile.
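The following sketch shows one way to compute such statistics for a list of values. It assumes the percentile is given as a fraction and is read off the sorted values by index; whether get_stats uses exactly this rule is an assumption, and target_stats is a hypothetical name.

```python
from typing import List, Tuple


def target_stats(values: List[float], percentile: float = 0.95) -> Tuple[float, float, float]:
    """Return (min, max, value at the given fractional percentile) of `values`."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    # Position of the percentile in the sorted list, clamped to the last element.
    k = min(int(len(ordered) * percentile), len(ordered) - 1)
    return ordered[0], ordered[-1], ordered[k]


low, high, median_like = target_stats([3.0, 1.0, 2.0, 5.0, 4.0], percentile=0.5)
```

Statistics of this kind are typically used to rescale or clip a reward target, which is why the percentile is reported next to the raw minimum and maximum.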
- static convert_xyz_to_h5(xyz_path='data/xyz', h5_path='data/qm9.h5', property_names=['rA', 'rB', 'rC', 'mu', 'alpha', 'homo', 'lumo', 'gap', 'r2', 'zpve', 'U0', 'U', 'H', 'G', 'Cv'])
Convert the data from xyz to h5. Assumes that the xyz files are extracted in the xyz_path folder.
- Parameters
xyz_path (str) – path to the folder containing the input xyz files.
h5_path (str) – path to the output h5 file.
property_names (List[str]) – names of the properties we want to use/overwrite.
- Return type
None
- static _read_xyz(path)
Read the xyz files in the directory at 'path'.
Code adapted from: https://www.kaggle.com/code/rmonge/predicting-molecule-properties-based-on-its-smiles/notebook
- Parameters
path (str) – the path to the folder.
- Returns
atoms: list with the characters representing the atoms of a molecule.
coordinates: list with the cartesian coordinates of each atom.
smile: list with the SMILES representation of a molecule.
prop: list with the scalar properties.
- __getitem__(idx)
Retrieve an item from the dataset by index.
- Parameters
idx – index for the item.
- Return type
Tuple[Any, float]
- Returns
a tuple (item, reward).
- class GFlowNetTask(configuration, dataset, reward_model=None, wrap_model=None)
Bases: object
A task for gflownet.
- __init__(configuration, dataset, reward_model=None, wrap_model=None)
Initialize a generic gflownet task. The task specifies the reward model for the trajectory. We consider the task as part of the dataset.
Code adapted from: https://github.com/recursionpharma/gflownet/tree/trunk/src/gflownet/tasks.
- Parameters
configuration (Dict[str, Any]) – a dictionary with the task configuration.
dataset (GFlowNetDataset) – a dataset instance.
reward_model (Optional[Module]) – the model that is used to generate the conditional reward.
wrap_model (Optional[Callable[[Module], Module]]) – a wrapper function that is applied to the model.
- load_task_models()
Load the task models.
- Returns
model: a dictionary with the task models.
- sample_conditional_information(n)
Sample conditional information for a minibatch.
- Parameters
n (int) – number of samples.
- Returns
cond_info: a dictionary with the sampled conditional information.
- cond_info_to_reward(cond_info, flat_reward)
Combine a minibatch of reward signal vectors and conditional information into a scalar reward.
- Parameters
cond_info (Dict[str, Any]) – a dictionary with conditional information (e.g. temperature).
flat_reward (FlatRewards) – a 2d tensor where each row represents a series of flat rewards.
- Returns
reward: a 1d tensor, a scalar reward for each minibatch entry.
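One common way to combine a flat reward with a temperature is to raise the reward to an inverse-temperature power, so that a larger value sharpens the reward. The sketch below does this for a minibatch using plain Python lists rather than tensors. The key name "beta", the clamping constant, the use of only the first flat reward in each row, and the function name are all assumptions for illustration and may differ from what GFlowNetTask does.

```python
from typing import Dict, List


def tempered_reward(
    cond_info: Dict[str, List[float]], flat_reward: List[List[float]]
) -> List[float]:
    """Hypothetical sketch: R_i ** beta_i for each minibatch entry i."""
    betas = cond_info["beta"]  # assumed key holding one inverse temperature per entry
    if len(betas) != len(flat_reward):
        raise ValueError("one beta per minibatch entry is required")
    rewards = []
    for beta, row in zip(betas, flat_reward):
        # Use the first flat reward of the row, clamped away from zero.
        r = max(row[0], 1e-30)
        rewards.append(r ** beta)
    return rewards


rewards = tempered_reward({"beta": [1.0, 2.0]}, [[1.0], [2.0]])
```

Implementations of this idea often work in log space instead (multiplying the log reward by the inverse temperature), which is numerically safer for very small rewards.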
- compute_flat_rewards(x)
Compute the flat rewards of molecules according to the task's proxies.
- Parameters
x – a list of RDKit molecules.
- Returns
reward: a 1d tensor, a scalar reward for each molecule.
is_valid: a 1d tensor, a boolean indicating whether each molecule is valid.
- flat_reward_transform(y)
Transform a reward with a generic structure into a flat vector.
- Parameters
y (Union[float, Tensor]) – scalar reward for a trajectory.
- Return type
FlatRewards
- _wrap_model_mp(model)
Wrap an nn.Module instance so that it can be shared with DataLoader workers.
- Parameters
model (Module) – an nn.Module instance.
- Return type
Union[Module, MPModelPlaceholder]