Adding a new algorithm¶
Getting started¶
The general structure of a single conditional generation algorithm in gt4sd-core
is shown here
gt4sd-core
|gt4sd
| |algorithms
| | |conditional_generation
| | | |__init__.py
| | | |[My_Algorithm]
| | | | |__init__.py
| | | | |core.py
| | | | |implementation
At the time of writing these are the only files you will need to be aware of to add your own custom algorithm to gt4sd-core
. Here we will talk through the implementation of a template algorithm we have called Template
, this algorithm will take a string input and return a list with the single item Hello
+ input, i.e. input=World
outputs the list [Hello World]
.
Since Template
is a conditional generation algorithm, I have created the My_Algorithm
folder (template
) in the conditional_generation
folder, and inside added the 3 files __init__.py
, core.py
, and implementation.py
.
Implementation¶
Starting with the file implementation.py
we have the following code
class Generator:
"""Basic Generator for the template algorithm"""
def __init__(
self,
resources_path: str,
temperature: int
):
"""Initialize the Generator.
Args:
resources_path: directory where to find models and parameters.
"""
self.resources_path = resources_path
self.temperature = temperature
def hello_name(
self,
name: str,
) -> List[str]:
"""Validate a list of strings.
Args:
name: a string.
Returns:
a list containing salutation and temperature converted to fahrenheit.
"""
return [
f"Hello {str(name)} {random.randint(1, int(1e6))} times and, fun fact, {str(self.temperature)} celsius equals to {(self.temperature * (9/5) + 32)} fahrenheit."
]
Here we have created a class called Generator
with 2 functions:
___init__(self, resources_path: str, temperature: int)
which is used to initialise the generator, set addional parameters ( in this case temperature
is the addional parameter ) and the directory from where the model is located, and
hello_name(self, name: str) -> List[str]
which is the actual implementation of the algorithm. For this guide our algorithm takes in a string name
and temperature
and outputs a single string Hello name a random number of times and temperature in fahrenheit
in a list.
For your specific algorithm this second function will be your own code.
Core¶
Now we will look into the file core.py
import logging
from typing import ClassVar, Optional, TypeVar, Callable, Iterable, Any, Dict
from ...core import AlgorithmConfiguration, GeneratorAlgorithm # type: ignore
from ...registry import ApplicationsRegistry # type: ignore
from .implementation import Generator # type: ignore
logger = logging.getLogger(__name__)
logger.addHandler(logging.NullHandler())
T = TypeVar("T")
S = TypeVar("S")
Targeted = Callable[[T], Iterable[Any]]
class Template(GeneratorAlgorithm[S, T]):
"""Template Algorithm."""
def __init__(
self, configuration: AlgorithmConfiguration[S, T], target: Optional[T] = None
):
"""Template Generation
Args:
configuration: domain and application
specification, defining types and validations.
target: Optional, in this inistance we will convert to a string.
Example:
An example for using this temmplate::
target = 'World'
configuration = TemplateGenerator()
algorithm = Template(configuration=configuration, target=target)
items = list(algorithm.sample(1))
print(items)
"""
configuration = self.validate_configuration(configuration)
# TODO there might also be a validation/check on the target input
super().__init__(
configuration=configuration,
target=target, # type:ignore
)
def get_generator(
self,
configuration: AlgorithmConfiguration[S, T],
target: Optional[T],
) -> Targeted[T]:
"""Get the function to hello_name from generator.
Args:
configuration: helps to set up the application.
target: context or condition for the generation. Just an optional string here.
Returns:
callable generating a list of 1 item containing salutation and temperature converted to fahrenheit.
"""
logger.info("ensure artifacts for the application are present.")
self.local_artifacts = configuration.ensure_artifacts()
implementation: Generator = configuration.get_conditional_generator( # type: ignore
self.local_artifacts
)
return implementation.hello_name # type:ignore
def validate_configuration(
self, configuration: AlgorithmConfiguration
) -> AlgorithmConfiguration:
# TODO raise InvalidAlgorithmConfiguration
assert isinstance(configuration, AlgorithmConfiguration)
return configuration
@ApplicationsRegistry.register_algorithm_application(Template)
class TemplateGenerator(AlgorithmConfiguration[str, str]):
"""Configuration for specific generator."""
algorithm_type: ClassVar[str] = "conditional_generation"
domain: ClassVar[str] = "materials"
algorithm_version: str = "v0"
g
temperature: int = field(
default=36,
metadata=dict(
description="Temperature parameter ( in celsius )"
),
)
def get_target_description(self) -> Dict[str, str]:
"""Get description of the target for generation.
Returns:
target description.
"""
return {
"title": "Target name",
"description": "A simple string to define the name in the output [Hello name].",
"type": "string",
}
def get_conditional_generator(self, resources_path: str) -> Generator:
return Generator(
resources_path=resources_path,
temperature=self.temperature
)
core.py
will contain at least two classes. The first is named after your algorithm, in our example this class is called Template
, which is initialised with a GeneratorAlgorithm
object. The second is an AlgorithmConfiguration
, in this case called TemplateGenerator
, which is used to configure your algorithm.
Template¶
Your algorithm, Template
for us, needs to contain at least two functions.
__init__(self, configuration, target)
This is used to initialise the algorithm by passing in the algorithm configuration and an optional parameter. The configuration parameter is the object created from the TemplateGenerator
class and the target
parameter in this case will be string we are passing through to our algorithm.
get_generator(self, configuration, target)
This function is required get the implementation from the generator configuration. It then returns the function in the implementation with corresponds with your algorithm. In our case this is implementation.hello_name
.
validate_configuration(self, configuration)
This is a optional helper function to validate that a valid configuration is provided. A similar validation method could be created to check that a user has added a valid input or target
in our case.
TemplateGenerator¶
Finally you will need to create a specific configuration for your algorithm, In our case called TemplateGenerator
, note that in our implementation we have tagged this class with @ApplicationsRegistry.register_algorithm_application(Template)
. This decorator is needed to add the algorithm to the ApplicationRegistry,
you should add a similar decorator to your implementation of AlgorithmConfiguration
replacing the Template
name in the decorator with the name of your algorithm.
In this class there are three required strings algorithm_type
, domain
, and algorithm_version
which are all self explanatory:
algorithm_type
is the type of algorithm you are implementing, i.e.,generation
.domain
is the domain your algorithm is applied to, i.e.,materials
.algorithm_version
is the version of algorithm you are on, i.e.,v0
.
These strings will set the location for resource cache of the model.
Make sure you create the appropriate path in the S3 storage used (default bucket name algorithms
, algorithms/{algorithm_type}/{algorithm_name}/{algorithm_application}/{algorithm_version}
) where your artifacts will be uploaded: algorithms/conditional_generation/Template/TemplateGenerator/v0
.
There are two required functions for our configuration:
The first function needed is
get_target_description(self) -> Dict[str, str]
which returns a dictionary defining the type of target
, for our algorithm this is a string, and both a title and description of what that target
represents. This method is needed to populate documentation for the end user.
The final function needed is
get_conditional_generator(self, resources_path: str) -> Generator
which is used to return the Generator from the resource path.
Note that if we wish to implement specific configurations for this algorithm this can also be set by creating additional AlgorithmGenerator
s in core.py
and adding each parameter via a field
object i.e.
algorithm_type: ClassVar[str] = 'conditional_generation'
domain: ClassVar[str] = 'materials'
algorithm_version: str = 'v0'
batch_size: int = field(
default=32,
metadata=dict(description="Batch size used for the generative model sampling."),
)
temperature: float = field(
default=1.4,
metadata=dict(
description="Temperature parameter for the softmax sampling in decoding."
),
)
generated_length: int = field(
default=100,
metadata=dict(
description="Maximum length in tokens of the generated molcules (relates to the SMILES length)."
),
)
def get_target_description(self) -> Dict[str, str]:
"""Get description of the target for generation.
Returns:
target description.
"""
return {
"title": "Gene expression profile",
"description": "A gene expression profile to generate effective molecules against.",
"type": "list",
}
def get_conditional_generator(
self, resources_path: str
) -> ProteinSequenceConditionalGenerator:
"""Instantiate the actual generator implementation.
Args:
resources_path: local path to model files.
Returns:
instance with :meth:`generate_batch<gt4sd.algorithms.conditional_generation.paccmann_rl.implementation.ConditionalGenerator.generate_batch>` method for targeted generation.
"""
return ProteinSequenceConditionalGenerator(
resources_path=resources_path,
temperature=self.temperature,
generated_length=self.generated_length,
samples_per_protein=self.batch_size,
)
field
is used to set a default configuration and a description of the parameter which is used to populate the documentation returned to the end user similar to get_target_description
. Algorithm configuration parameters can be validated by adding the implementation of a __post_init__
method as described here.
Final steps¶
Finally to complete our implementation we need to import all the algorithms and configurations in our created __init__.py
folder like so
from .core import (
Template,
TemplateGenerator,
)
__all__ = [
'Template',
'TemplateGenerator',
]
and to automatically add the algorithm to the registry without any manual imports, we have to import the generator class which in our case is TemplateGenerator
to the outermost __init__.py
of the subdirectory algorithms
.
from .template.core import TemplateGenerator
Using a custom algorithm¶
Now that the new algorithm is implemented we can use it the same was as shown before
Explicitly¶
from gt4sd.algorithms.conditional_generation.template import (
TemplateGenerator, Template
)
target = 'World'
configuration = TemplateGenerator()
algorithm = Template(configuration=configuration, target=target)
items = list(algorithm.sample(1))
print(items)
Registry¶
from gt4sd.algorithms.registry import ApplicationsRegistry
target = 'World'
algorithm = ApplicationsRegistry.get_application_instance(
target=target,
algorithm_type='conditional_generation',
domain='materials',
algorithm_name='Template',
algorithm_application='TemplateGenerator',
)
items = list(algorithm.sample(1))
print(items)