gt4sd.algorithms.generation.pgt.implementation module
Implementation details for PGT algorithms.
Summary
Classes:
- CoherenceCheckGenerator: Implementation of coherence check generator.
- EditGenerator: Implementation of edit generator.
- Generator: Implementation of a generator.
- PartGenerator: Implementation of edit generator.
Functions:
- adjust_length_to_model: Adjust sequence length.
Reference
- adjust_length_to_model(length, maximum_sequence_length)
 Adjust sequence length.
- Parameters
 length (int) – target length.
 maximum_sequence_length (int) – maximum sequence length.
- Returns
 the adjusted length.
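As a rough illustration of what adjusting a length to a model can mean, the sketch below clamps a requested length to the model's limit and falls back to a fixed cap when neither value is usable. The name `clamp_length`, the constant `FALLBACK_LENGTH`, and the exact branching are assumptions made for this example; they are not taken from the library's source.

```python
FALLBACK_LENGTH = 10_000  # hypothetical cap, used only when no limit is known


def clamp_length(length: int, maximum_sequence_length: int) -> int:
    """Return a generation length the model can handle (illustrative only)."""
    if length < 0 and maximum_sequence_length > 0:
        # No explicit target: use the model's own limit.
        return maximum_sequence_length
    if 0 < maximum_sequence_length < length:
        # Target exceeds the model's limit: clamp it.
        return maximum_sequence_length
    if length < 0:
        # Neither value is usable: fall back to a fixed cap.
        return FALLBACK_LENGTH
    return length
```

Under these assumptions, a target that already fits is returned unchanged, while an oversized or negative target is replaced by the model's limit.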
- class Generator(resources_path, model_type, model_name, max_length, top_k, top_p, num_return_sequences, prompt='This is an interesting prompt', no_repeat_ngram_size=2, device=None)
 Bases: object
 Implementation of a generator.
- __init__(resources_path, model_type, model_name, max_length, top_k, top_p, num_return_sequences, prompt='This is an interesting prompt', no_repeat_ngram_size=2, device=None)
 PGT generation algorithm.
- Parameters
 resources_path (str) – path to the cache.
 model_type (str) – type of the model.
 model_name (str) – name of the model weights/version.
 prompt (str) – prompt for text generation.
 max_length (int) – maximum length of the generated text.
 top_k (int) – number of highest-probability tokens to keep (top-k sampling).
 top_p (float) – only the most probable tokens whose cumulative probability reaches this value are kept.
 num_return_sequences (int) – number of generated sequences.
 no_repeat_ngram_size (int) – size of n-grams that must not appear twice.
 device (Union[device, str, None]) – device where inference runs, given either as a dedicated class or as a string. If not provided, it is inferred.
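The `top_k` and `top_p` parameters describe two standard sampling filters. The sketch below shows the general idea on a plain dictionary of token probabilities; it is not the library's code, which delegates sampling to the underlying model. The function name `filter_top_k_top_p` is hypothetical, and applying top-k before top-p is an assumption of this example.

```python
from typing import Dict


def filter_top_k_top_p(
    probs: Dict[str, float], top_k: int, top_p: float
) -> Dict[str, float]:
    """Keep the top_k most likely tokens, then the smallest prefix reaching top_p."""
    # Rank tokens from most to least probable.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]

    # Accumulate probability mass until the top_p threshold is reached.
    kept = []
    cumulative = 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= top_p:
            break

    # Renormalize so the surviving probabilities sum to one.
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}


candidates = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
filtered = filter_top_k_top_p(candidates, top_k=3, top_p=0.7)
```

With these inputs, `d` is dropped by the top-k cut and `c` by the top-p cut, so sampling would choose between `a` and `b` only.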
- generate_case()
 Sample text snippets.
- Return type
 Union[List[str], List[Tuple[str, …]]]
- Returns
 generated text snippets.
- format_output(input_text, generated_sequences)
 Format output. In the general case, just return the generated sequences.
- Parameters
 input_text (Union[str, Tuple[str]]) – generation input.
 generated_sequences (List[str]) – generated sequences.
- Return type
 Union[List[str], List[Tuple[str, …]]]
- Returns
 formatted generated sequences.
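The return type allows either a list of strings or a list of tuples, which suggests that subclasses can override `format_output` to attach extra information to each sequence. The sketch below illustrates that override pattern with two hypothetical classes, `BaseFormatter` and `PairingFormatter`. Pairing each output with its input is an assumption chosen for illustration, not the documented behavior of any class in this module.

```python
from typing import List, Tuple, Union

Formatted = Union[List[str], List[Tuple[str, ...]]]


class BaseFormatter:
    """Hypothetical stand-in for a generator's formatting step."""

    def format_output(
        self,
        input_text: Union[str, Tuple[str, ...]],
        generated_sequences: List[str],
    ) -> Formatted:
        # General case: pass the sequences through unchanged.
        return generated_sequences


class PairingFormatter(BaseFormatter):
    """Hypothetical subclass that pairs each output with its input."""

    def format_output(self, input_text, generated_sequences):
        # Normalize a single string input to a one-element tuple.
        inputs = (input_text,) if isinstance(input_text, str) else tuple(input_text)
        return [inputs + (sequence,) for sequence in generated_sequences]
```

Keeping the formatting step in its own method lets a subclass change the output shape without touching the sampling logic.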
- class PartGenerator(resources_path, input_text, model_type, model_name, task, max_length, top_k, top_p, num_return_sequences, no_repeat_ngram_size=2, device=None)
 Bases: Generator
 Implementation of edit generator.
- __init__(resources_path, input_text, model_type, model_name, task, max_length, top_k, top_p, num_return_sequences, no_repeat_ngram_size=2, device=None)
 PGT generation algorithm.
- Parameters
 resources_path (str) – path to the cache.
 input_text (str) – input text for generation.
 task (str) – generation task.
 model_type (str) – type of the model.
 model_name (str) – name of the model weights/version.
 max_length (int) – maximum length of the generated text.
 top_k (int) – number of highest-probability tokens to keep (top-k sampling).
 top_p (float) – only the most probable tokens whose cumulative probability reaches this value are kept.
 num_return_sequences (int) – number of generated sequences.
 no_repeat_ngram_size (int) – size of n-grams that must not appear twice.
 device (Union[device, str, None]) – device where inference runs, given either as a dedicated class or as a string. If not provided, it is inferred.
- class EditGenerator(resources_path, input_text, model_type, model_name, max_length, top_k, top_p, num_return_sequences, no_repeat_ngram_size=2, device=None, input_type='abstract')
 Bases: Generator
 Implementation of edit generator.
- __init__(resources_path, input_text, model_type, model_name, max_length, top_k, top_p, num_return_sequences, no_repeat_ngram_size=2, device=None, input_type='abstract')
 PGT generation algorithm.
- Parameters
 resources_path (str) – path to the cache.
 input_text (str) – input text for generation.
 model_type (str) – type of the model.
 model_name (str) – name of the model weights/version.
 max_length (int) – maximum length of the generated text.
 top_k (int) – number of highest-probability tokens to keep (top-k sampling).
 top_p (float) – only the most probable tokens whose cumulative probability reaches this value are kept.
 num_return_sequences (int) – number of generated sequences.
 no_repeat_ngram_size (int) – size of n-grams that must not appear twice.
 device (Union[device, str, None]) – device where inference runs, given either as a dedicated class or as a string. If not provided, it is inferred.
 input_type (str) – part of the patent that the input text belongs to.
- format_output(input_text, generated_sequences)
 Format output for the patent editing task.
- Parameters
 input_text (Union[str, Tuple[str]]) – generation input.
 generated_sequences (List[str]) – generated sequences.
- Return type
 Union[List[str], List[Tuple[str, …]]]
- Returns
 formatted generated sequences.
- class CoherenceCheckGenerator(resources_path, input_a, input_b, model_type, model_name, max_length, top_k, top_p, num_return_sequences, no_repeat_ngram_size=2, device=None, coherence_type='title-abstract')
 Bases: Generator
 Implementation of coherence check generator.
- __init__(resources_path, input_a, input_b, model_type, model_name, max_length, top_k, top_p, num_return_sequences, no_repeat_ngram_size=2, device=None, coherence_type='title-abstract')
 PGT generation algorithm.
- Parameters
 resources_path (str) – path to the cache.
 input_a (str) – first input for the coherence check.
 input_b (str) – second input for the coherence check.
 model_type (str) – type of the model.
 model_name (str) – name of the model weights/version.
 max_length (int) – maximum length of the generated text.
 top_k (int) – number of highest-probability tokens to keep (top-k sampling).
 top_p (float) – only the most probable tokens whose cumulative probability reaches this value are kept.
 num_return_sequences (int) – number of generated sequences.
 no_repeat_ngram_size (int) – size of n-grams that must not appear twice.
 device (Union[device, str, None]) – device where inference runs, given either as a dedicated class or as a string. If not provided, it is inferred.
 coherence_type (str) – input types for the check.
- extract_coherence_types(coherence_type)
 Check the validity of the coherence type and extract the types of the input texts.
- Parameters
 coherence_type (str) – input types of the coherence check.
- Return type
 Tuple[str, str]
- Returns
 tuple containing the types of the two inputs.
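The default value `'title-abstract'` suggests that a coherence type names two patent parts joined by a hyphen. The sketch below shows one way such a string could be validated and split. The function name `split_coherence_type`, the set `ASSUMED_PARTS`, and the error handling are all assumptions made for illustration; the values the library actually accepts may differ.

```python
from typing import Tuple

# Assumed set of patent parts; the library's actual list may differ.
ASSUMED_PARTS = {"title", "abstract", "claim"}


def split_coherence_type(coherence_type: str) -> Tuple[str, str]:
    """Validate a 'first-second' string and return its two parts (illustrative only)."""
    parts = coherence_type.split("-")
    if len(parts) != 2:
        raise ValueError(f"expected '<part>-<part>', got {coherence_type!r}")
    for part in parts:
        if part not in ASSUMED_PARTS:
            raise ValueError(f"unknown input type {part!r}")
    first, second = parts
    return first, second
```

Validating the string once, before generation starts, lets a malformed coherence type fail early with a clear message.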
- format_output(input_text, generated_sequences)
 Format output for the patent coherence task.
- Parameters
 input_text (Union[str, Tuple[str]]) – generation input.
 generated_sequences (List[str]) – generated sequences.
- Return type
 Union[List[str], List[Tuple[str, …]]]
- Returns
 formatted generated sequences.