gt4sd.algorithms.conditional_generation.paccmann_rl.implementation module

Implementation of PaccMann^RL conditional generators.

Summary

Classes:

ConditionalGenerator

Abstract interface for a conditional generator.

ProteinSequenceConditionalGenerator

Protein conditional generator as implemented in https://doi.org/10.1088/2632-2153/abe808 (originally https://arxiv.org/abs/2005.13285).

TranscriptomicConditionalGenerator

Transcriptomic conditional generator as implemented in https://doi.org/10.1016/j.isci.2021.102269 (originally https://doi.org/10.1007/978-3-030-45257-5_18, https://arxiv.org/abs/1909.05114).

Reference

class ConditionalGenerator

Bases: ABC

Abstract interface for a conditional generator.

device: device

device where the inference is running.

temperature: float

temperature for the sampling.

generated_length: int

maximum length of the generated molecules.

selfies_conditional_generator_params: dict

parameters for the SELFIES generator.

selfies_conditional_generator: TeacherVAE

SELFIES generator.

smiles_language: SMILESLanguage

SMILES language instance.

generator_latent_size: int
encoder_latent_size: int
get_smiles_from_latent(latent)

Take samples from the latent space.

Parameters

latent (Tensor) – latent vector tensor.

Return type

List[str]

Returns

SMILES list and indexes for the valid ones.

static validate_molecules(smiles)
Return type

Tuple[List[Mol], List[int]]
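
The return type above pairs the parsed molecules with the indexes of the inputs that parsed successfully. A minimal sketch of that pattern follows; it is not the library's implementation. The `parse` argument and the `toy_parse` helper are hypothetical, introduced here so the example runs without a chemistry toolkit. A parser in the style of RDKit's `Chem.MolFromSmiles`, which returns None for an invalid SMILES string, could be passed instead.

```python
from typing import Callable, List, Optional, Sequence, Tuple, TypeVar

Mol = TypeVar("Mol")


def validate_molecules(
    smiles: Sequence[str],
    parse: Callable[[str], Optional[Mol]],
) -> Tuple[List[Mol], List[int]]:
    """Return the parsed molecules and the indexes of the valid inputs."""
    molecules: List[Mol] = []
    valid_indexes: List[int] = []
    for index, candidate in enumerate(smiles):
        molecule = parse(candidate)
        # The parser is assumed to signal failure by returning None
        # instead of raising an exception.
        if molecule is not None:
            molecules.append(molecule)
            valid_indexes.append(index)
    return molecules, valid_indexes


def toy_parse(candidate: str) -> Optional[str]:
    """Hypothetical stand-in parser: accept non-empty, balanced strings."""
    depth = 0
    for char in candidate:
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth < 0:
                return None
    return candidate if candidate and depth == 0 else None
```

Keeping the indexes next to the molecules lets a caller map each valid molecule back to its position in the generated batch.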

abstract get_latent(condition)
Return type

Tensor

generate_batch(condition)
Return type

List[str]
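
The interface leaves only get_latent abstract, so a subclass supplies the encoding of the condition and inherits the rest. The sketch below illustrates that pattern with toy classes. `ToyConditionalGenerator` and `LengthConditionedGenerator` are hypothetical, the decoder is not a real model, and the flow inside generate_batch (encode, then decode) is an assumption rather than something this page states.

```python
from abc import ABC, abstractmethod
from typing import List


class ToyConditionalGenerator(ABC):
    """Hypothetical stand-in that mirrors the documented method names."""

    @abstractmethod
    def get_latent(self, condition: str) -> List[float]:
        """Map a condition to a latent representation."""

    def get_smiles_from_latent(self, latent: List[float]) -> List[str]:
        # Toy decoder: one carbon chain per latent value (not a real model).
        return ["C" * max(1, round(value)) for value in latent]

    def generate_batch(self, condition: str) -> List[str]:
        # Assumed flow: encode the condition, then decode the latent.
        return self.get_smiles_from_latent(self.get_latent(condition))


class LengthConditionedGenerator(ToyConditionalGenerator):
    """Hypothetical subclass that only implements the abstract method."""

    def get_latent(self, condition: str) -> List[float]:
        # Toy encoder: the latent is the condition length, repeated twice.
        return [float(len(condition))] * 2
```

Instantiating the base class directly raises TypeError because get_latent is abstract, while the subclass can be created and used through generate_batch.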

class ProteinSequenceConditionalGenerator(resources_path, temperature=1.4, generated_length=100, samples_per_protein=100, device=None)

Bases: ConditionalGenerator

Protein conditional generator as implemented in https://doi.org/10.1088/2632-2153/abe808 (originally https://arxiv.org/abs/2005.13285). It generates ligands with high binding affinity and low toxicity.

samples_per_protein

number of points sampled per protein. It has to be greater than 1.

protein_embedding_encoder_params

parameters for the protein embedding encoder.

protein_embedding_encoder

protein embedding encoder.

__init__(resources_path, temperature=1.4, generated_length=100, samples_per_protein=100, device=None)

Initialize the generator.

Parameters
  • resources_path (str) – directory where to find models and parameters.

  • temperature (float) – temperature for the sampling. Defaults to 1.4.

  • generated_length (int) – maximum length of the generated molecules. Defaults to 100.

  • samples_per_protein (int) – number of points sampled per protein. It has to be greater than 1. Defaults to 100.

  • device (Union[device, str, None]) – device where the inference runs, given either as a device instance or as a string. If not provided, it is inferred.
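
The temperature parameter above controls how peaked the sampling distribution is. The sketch below assumes the common convention of dividing the logits by the temperature before applying a softmax. How the underlying model applies the temperature is not stated on this page, and `softmax_with_temperature` is a hypothetical helper, not part of the library.

```python
import math
from typing import List, Sequence


def softmax_with_temperature(
    logits: Sequence[float], temperature: float
) -> List[float]:
    """Turn logits into probabilities, flattened or sharpened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [logit / temperature for logit in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    shift = max(scaled)
    weights = [math.exp(value - shift) for value in scaled]
    total = sum(weights)
    return [weight / total for weight in weights]
```

Under this convention a temperature above 1, such as the default 1.4, flattens the distribution and makes sampling more diverse, while a temperature below 1 concentrates the probability on the most likely tokens.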

get_latent(protein)

Given a protein, generate the latent representation.

Parameters

protein (str) – the protein used as context/condition.

Return type

Tensor

Returns

the latent representation for the given context. It contains self.samples_per_protein repeats.

generate_batch(protein)
Return type

List[str]

class TranscriptomicConditionalGenerator(resources_path, temperature=1.4, generated_length=100, samples_per_profile=100, device=None)

Bases: ConditionalGenerator

Transcriptomic conditional generator as implemented in https://doi.org/10.1016/j.isci.2021.102269 (originally https://doi.org/10.1007/978-3-030-45257-5_18, https://arxiv.org/abs/1909.05114). It generates small molecules that are highly effective against transcriptomic profiles.

samples_per_profile

number of points sampled per profile. It has to be greater than 1.

transcriptomic_encoder_params

parameters for the transcriptomic encoder.

transcriptomic_encoder

transcriptomic encoder.

__init__(resources_path, temperature=1.4, generated_length=100, samples_per_profile=100, device=None)

Initialize the generator.

Parameters
  • resources_path (str) – directory where to find models and parameters.

  • temperature (float) – temperature for the sampling. Defaults to 1.4.

  • generated_length (int) – maximum length of the generated molecules. Defaults to 100.

  • samples_per_profile (int) – number of points sampled per profile. It has to be greater than 1. Defaults to 100.

  • device (Union[device, str, None]) – device where the inference runs, given either as a device instance or as a string. If not provided, it is inferred.

get_latent(profile)

Given a profile, generate the latent representation.

Parameters

profile (Union[ndarray, Series, str]) – the profile used as context/condition.

Raises

ValueError – in case the profile has a size mismatch with the genes panel.

Return type

Tensor

Returns

the latent representation for the given context. It contains self.samples_per_profile repeats.
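
The size check and the repetition described above can be illustrated with plain lists. This is a sketch under stated assumptions, not the library's code: `prepare_profile`, its `genes` argument, and the choice to repeat the input rows (rather than an encoded latent tensor) are all hypothetical.

```python
from typing import List, Sequence


def prepare_profile(
    profile: Sequence[float],
    genes: Sequence[str],
    samples_per_profile: int,
) -> List[List[float]]:
    """Check the profile against the gene panel and repeat it per sample."""
    if len(profile) != len(genes):
        # Mirrors the documented ValueError on a size mismatch.
        raise ValueError(
            f"profile has {len(profile)} values, "
            f"but the gene panel has {len(genes)} genes"
        )
    if samples_per_profile <= 1:
        raise ValueError("samples_per_profile has to be greater than 1")
    # One identical row per requested sample.
    return [list(profile) for _ in range(samples_per_profile)]
```

Validating the size before any repetition means a mismatched profile fails early, with a message that names both lengths.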

generate_batch(profile)
Return type

List[str]
