Generation Verifier

The GenerationVerifier computes rewards for de novo molecular generation tasks, evaluating generated molecules against property optimization criteria such as docking scores, QED, synthetic accessibility, and other molecular descriptors.

Overview

The Generation Verifier supports:

  • Multi-property Optimization: Optimize multiple properties simultaneously
  • Docking Score Computation: GPU-accelerated molecular docking with AutoDock
  • RDKit Descriptors: QED, SA score, LogP, molecular weight, etc.
  • SMILES Extraction: Robust parsing of SMILES from model completions

Supported Properties

| Property | Type | Description |
|---|---|---|
| Docking Targets | Slow (GPU) | Binding affinity to protein pockets |
| Physico-Chemical Properties | Fast | QED, SA score, Molecular Weight, ... |

SMILES Extraction

The verifier extracts SMILES from completions using:

  1. Answer Tags: Content between <answer> and </answer> tags
  2. Pattern Matching: Identifies candidate SMILES tokens: words whose characters all belong to the SMILES character set and that contain at least one uppercase C, or more than two lowercase c characters
  3. Validation: Verifies molecules with RDKit
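
The first two stages can be sketched with the standard library alone. This is a simplified illustration, not the verifier's exact implementation; step 3 (chemical validation with RDKit's `MolFromSmiles`) is omitted to keep the snippet dependency-free:

```python
import re

# Character set accepted inside a SMILES token (mirrors the verifier's pattern)
SMILES_CHARSET = re.compile(r"^[A-Za-z0-9=#:\+\-\[\]\(\)/\\@.%]+$")
ANSWER_TAGS = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)

def extract_candidate_smiles(completion: str) -> list[str]:
    """Steps 1-2 of the extraction: answer tags, then pattern matching."""
    m = ANSWER_TAGS.search(completion)
    if m is None:
        return []  # would be reported as the "no_answer" failure
    candidates = []
    for token in re.split(r"[\s.,:'`]+", m.group(1)):
        if len(token) < 3:
            continue
        # Keep tokens made only of SMILES characters that contain carbon:
        # at least one aliphatic C, or more than two aromatic c
        if SMILES_CHARSET.fullmatch(token) and ("C" in token or token.count("c") > 2):
            candidates.append(token)
    return candidates

print(extract_candidate_smiles("<answer>The molecule is CC(=O)Oc1ccccc1C(=O)O</answer>"))
# ['CC(=O)Oc1ccccc1C(=O)O']
```

In the real verifier the surviving candidates are then parsed with RDKit, and only chemically valid molecules are kept.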

Extraction Failures

| Failure Reason | Description |
|---|---|
| no_answer | No answer tags found |
| no_smiles | No SMILES-like strings in answer |
| no_valid_smiles | SMILES strings are invalid |
| multiple_smiles | Multiple valid SMILES found (ambiguous) |

DockingConfigModel

Bases: BaseModel

Pydantic model for docking configuration.

This model defines the configuration parameters for docking operations, providing validation and documentation for all docking options.

Attributes:

  • exhaustiveness (int): Docking exhaustiveness parameter.
  • n_cpu (int): Number of CPUs to use for docking.
  • docking_oracle (Literal['pyscreener', 'autodock_gpu']): Type of docking oracle to use ("pyscreener" or "autodock_gpu").

Source code in mol_gen_docking/reward/verifiers/generation_reward/generation_verifier_pydantic_model.py
class DockingConfigModel(BaseModel):
    """Pydantic model for docking configuration.

    This model defines the configuration parameters for docking operations,
    providing validation and documentation for all docking options.

    Attributes:
        exhaustiveness: Docking exhaustiveness parameter.
        n_cpu: Number of CPUs to use for docking.
        docking_oracle: Type of docking oracle to use ("pyscreener" or "autodock_gpu").
    """

    exhaustiveness: int = Field(
        default=8,
        gt=1,
        description="Docking exhaustiveness parameter",
    )

    n_cpu: int = Field(
        default=8,
        gt=1,
        description="Number of CPUs to use for docking",
    )

    docking_oracle: Literal["pyscreener", "autodock_gpu"] = Field(
        default="autodock_gpu",
        description='Type of docking oracle: "pyscreener" or "autodock_gpu"',
    )

DockingGPUConfigModel

Bases: DockingConfigModel

Pydantic model for AutoDock GPU docking configuration.

This model defines the configuration parameters specific to the AutoDock GPU docking software, providing validation and documentation for all options.

Attributes:

  • exhaustiveness (int): Docking exhaustiveness parameter.
  • n_cpu (int): Number of CPUs to use for docking.
  • docking_oracle (Literal['pyscreener', 'autodock_gpu']): Type of docking oracle to use (must be "autodock_gpu").
  • vina_mode (str): Command mode for AutoDock GPU.

Source code in mol_gen_docking/reward/verifiers/generation_reward/generation_verifier_pydantic_model.py
class DockingGPUConfigModel(DockingConfigModel):
    """Pydantic model for AutoDock GPU docking configuration.

    This model defines the configuration parameters specific to the AutoDock GPU
    docking software, providing validation and documentation for all options.

    Attributes:
        exhaustiveness: Docking exhaustiveness parameter.
        n_cpu: Number of CPUs to use for docking.
        docking_oracle: Type of docking oracle to use (must be "autodock_gpu").
        vina_mode: Command mode for AutoDock GPU.
    """

    vina_mode: str = Field(
        default="autodock_gpu_256wi",
        description="Command mode for AutoDock GPU",
    )

    @model_validator(mode="after")
    def check_vina_mode(self) -> "DockingGPUConfigModel":
        assert self.docking_oracle == "autodock_gpu", (
            "vina_mode is only valid for autodock_gpu docking_oracle"
        )
        return self
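
The cross-field check above can be exercised as follows. The models are re-declared here in minimal form (assuming pydantic v2 is installed) so the snippet is self-contained; the real classes live in mol_gen_docking.reward.verifiers.generation_reward.generation_verifier_pydantic_model:

```python
from typing import Literal
from pydantic import BaseModel, Field, ValidationError, model_validator

# Minimal stand-ins for DockingConfigModel / DockingGPUConfigModel
class DockingConfigModel(BaseModel):
    exhaustiveness: int = Field(default=8, gt=1)
    n_cpu: int = Field(default=8, gt=1)
    docking_oracle: Literal["pyscreener", "autodock_gpu"] = "autodock_gpu"

class DockingGPUConfigModel(DockingConfigModel):
    vina_mode: str = "autodock_gpu_256wi"

    @model_validator(mode="after")
    def check_vina_mode(self) -> "DockingGPUConfigModel":
        # Assertion errors raised in validators surface as ValidationError
        assert self.docking_oracle == "autodock_gpu", (
            "vina_mode is only valid for autodock_gpu docking_oracle"
        )
        return self

cfg = DockingGPUConfigModel()  # defaults pass the validator
print(cfg.vina_mode)           # autodock_gpu_256wi

try:
    DockingGPUConfigModel(docking_oracle="pyscreener")  # wrong oracle for GPU model
except ValidationError:
    print("rejected")
```

PyscreenerConfigModel below follows the same pattern, with its validator requiring docking_oracle == "pyscreener".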

PyscreenerConfigModel

Bases: DockingConfigModel

Pydantic model for PyScreener docking configuration.

This model defines the configuration parameters specific to the PyScreener docking software, providing validation and documentation for all options.

Attributes:

  • exhaustiveness (int): Docking exhaustiveness parameter.
  • n_cpu (int): Number of CPUs to use for docking.
  • docking_oracle (Literal['pyscreener', 'autodock_gpu']): Type of docking oracle to use (must be "pyscreener").
  • software_class (Literal['vina', 'qvina', 'smina', 'psovina', 'dock', 'dock6', 'ucsfdock']): Docking software class to use with PyScreener.

Source code in mol_gen_docking/reward/verifiers/generation_reward/generation_verifier_pydantic_model.py
class PyscreenerConfigModel(DockingConfigModel):
    """Pydantic model for PyScreener docking configuration.

    This model defines the configuration parameters specific to the PyScreener
    docking software, providing validation and documentation for all options.

    Attributes:
        exhaustiveness: Docking exhaustiveness parameter.
        n_cpu: Number of CPUs to use for docking.
        docking_oracle: Type of docking oracle to use (must be "pyscreener").
        software_class: Docking software class to use with PyScreener.
    """

    software_class: Literal[
        "vina",
        "qvina",
        "smina",
        "psovina",
        "dock",
        "dock6",
        "ucsfdock",
    ] = Field(
        default="vina",
        description="Docking software class to use with PyScreener",
    )

    @model_validator(mode="after")
    def check_software_class(self) -> "PyscreenerConfigModel":
        assert self.docking_oracle == "pyscreener", (
            "software_class is only valid for pyscreener docking_oracle"
        )
        return self

GenerationVerifierInputMetadataModel

Bases: BaseModel

Input metadata model for generation verifier.

Defines the verification criteria for molecular generation tasks, including properties to optimize, objectives for each property, and target values.

Attributes:

  • properties (List[str]): List of property names to verify (e.g., "QED", "SA", "docking_target_name"). Each property should be a valid molecular descriptor or a docking target name. Must have the same length as objectives and target.
  • objectives (List[GenerationObjT]): List of objectives for each property. Must have the same length as properties and target. Valid values:
      • "maximize": Reward increases with property value
      • "minimize": Reward increases as property value decreases
      • "above": Reward is 1.0 if property >= target, 0.0 otherwise
      • "below": Reward is 1.0 if property <= target, 0.0 otherwise
  • target (List[float]): List of target values for each property. Must have the same length as properties and objectives. For "maximize"/"minimize", used as the reference point for rescaling (when enabled); for "above"/"below", used as the threshold for binary reward computation.

Source code in mol_gen_docking/reward/verifiers/generation_reward/input_metadata.py
class GenerationVerifierInputMetadataModel(BaseModel):
    """Input metadata model for generation verifier.

    Defines the verification criteria for molecular generation tasks, including
    properties to optimize, objectives for each property, and target values.

    Attributes:
        properties: List of property names to verify (e.g., "QED", "SA", "docking_target_name").
            Each property should be a valid molecular descriptor or a docking target name.
            Must have the same length as objectives and target.

        objectives: List of objectives for each property.
            Must have the same length as properties and target.
            Valid values:
            - "maximize": Reward increases with property value
            - "minimize": Reward increases as property value decreases
            - "above": Reward is 1.0 if property >= target, 0.0 otherwise
            - "below": Reward is 1.0 if property <= target, 0.0 otherwise

        target: List of target values for each property.
            Must have the same length as properties and objectives.
            For "maximize"/"minimize": Used as reference point for rescaling (when enabled)
            For "above"/"below": Used as threshold for binary reward computation
    """

    properties: List[str] = Field(
        ...,
        description="List of property names to verify.",
    )
    objectives: List[GenerationObjT] = Field(
        ...,
        description="List of objectives for each property: maximize, minimize, above, or below.",
    )
    target: List[float] = Field(
        ...,
        description="List of target values for each property.",
    )

    @model_validator(mode="after")
    def validate_properties(self) -> "GenerationVerifierInputMetadataModel":
        """Validate that properties, objectives, and target have the same length."""
        if not (len(self.properties) == len(self.objectives) == len(self.target)):
            raise ValueError(
                "Length of properties, objectives, and target must be the same."
            )
        return self

validate_properties()

Validate that properties, objectives, and target have the same length.

Source code in mol_gen_docking/reward/verifiers/generation_reward/input_metadata.py
@model_validator(mode="after")
def validate_properties(self) -> "GenerationVerifierInputMetadataModel":
    """Validate that properties, objectives, and target have the same length."""
    if not (len(self.properties) == len(self.objectives) == len(self.target)):
        raise ValueError(
            "Length of properties, objectives, and target must be the same."
        )
    return self
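
As a reading aid for the objective semantics, here is a hypothetical helper for the binary "above"/"below" objectives. This is not the verifier's reward code, which also handles "maximize"/"minimize" with optional rescaling:

```python
def threshold_reward(value: float, objective: str, target: float) -> float:
    """Binary reward for threshold objectives, as documented above:
    "above" pays 1.0 when value >= target, "below" when value <= target."""
    if objective == "above":
        return 1.0 if value >= target else 0.0
    if objective == "below":
        return 1.0 if value <= target else 0.0
    raise ValueError(f"not a threshold objective: {objective}")

print(threshold_reward(0.85, "above", 0.7))  # QED 0.85 meets "above 0.7" -> 1.0
print(threshold_reward(4.2, "below", 3.0))   # SA 4.2 misses "below 3.0" -> 0.0
```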

GenerationVerifierOutputModel

Bases: VerifierOutputModel

Output model for generation verifier results.

Attributes:

  • reward (float): The computed reward for the generation verification.
  • parsed_answer (str): The parsed answer extracted from the model completion.
  • verifier_metadata (GenerationVerifierMetadataModel): Metadata related to the generation verification process.

Source code in mol_gen_docking/reward/verifiers/generation_reward/generation_verifier_pydantic_model.py
class GenerationVerifierOutputModel(VerifierOutputModel):
    """Output model for generation verifier results.

    Attributes:
        reward: The computed reward for the generation verification.
        parsed_answer: The parsed answer extracted from the model completion.
        verifier_metadata: Metadata related to the generation verification process.
    """

    reward: float = Field(
        ...,
        description="The computed reward for the generation verification.",
    )
    parsed_answer: str = Field(
        ..., description="The parsed answer extracted from the model completion."
    )
    verifier_metadata: GenerationVerifierMetadataModel = Field(
        ...,
        description="Metadata related to the generation verification process.",
    )

GenerationVerifierMetadataModel

Bases: BaseModel

Metadata model for generation verifier results.

Contains detailed information about the generation verification process, including all extracted SMILES, their individual rewards, and any extraction failures.

Attributes:

  • properties (List[str]): List of property names that were evaluated (e.g., "docking_score", "QED", "SA"). Each property corresponds to a molecular descriptor or docking target that was optimized.
  • individual_rewards (List[float]): List of individual rewards for each property in the properties list. Each value is typically in the [0.0, 1.0] range when rescaling is enabled, representing how well the molecule satisfies each property objective.
  • all_smi_rewards (List[float]): List of rewards for all SMILES found in the completion. When multiple SMILES are extracted, each gets its own reward. The final reward is typically the best among these values.
  • all_smi (List[str]): List of all SMILES strings extracted from the completion. May contain multiple SMILES if the model generated several molecules. Empty if SMILES extraction failed.
  • smiles_extraction_failure (str): Error message if SMILES extraction failed. Empty string if extraction was successful. Possible values:
      • "no_answer": No answer tags found in the completion
      • "no_smiles": No SMILES-like strings found in the answer
      • "no_valid_smiles": SMILES strings found but none chemically valid
      • "multiple_smiles": Multiple valid SMILES found (ambiguous)

Source code in mol_gen_docking/reward/verifiers/generation_reward/generation_verifier_pydantic_model.py
class GenerationVerifierMetadataModel(BaseModel):
    """Metadata model for generation verifier results.

    Contains detailed information about the generation verification process,
    including all extracted SMILES, their individual rewards, and any extraction failures.

    Attributes:
        properties: List of property names that were evaluated (e.g., "docking_score", "QED", "SA").
            Each property corresponds to a molecular descriptor or docking target that was optimized.

        individual_rewards: List of individual rewards for each property in the properties list.
            Each value is typically in [0.0, 1.0] range when rescaling is enabled, representing
            how well the molecule satisfies each property objective.

        all_smi_rewards: List of rewards for all SMILES found in the completion.
            When multiple SMILES are extracted, each gets its own reward. The final reward
            is typically the best among these values.

        all_smi: List of all SMILES strings extracted from the completion.
            May contain multiple SMILES if the model generated several molecules.
            Empty if SMILES extraction failed.

        smiles_extraction_failure: Error message if SMILES extraction failed.
            Empty string if extraction was successful. Common values include:

            - "no_answer": No answer tags found in the completion
            - "no_smiles": No SMILES-like strings found in the answer
            - "no_valid_smiles": SMILES strings found but none chemically valid
            - "multiple_smiles": Multiple valid SMILES found (ambiguous)
    """

    properties: List[str] = Field(
        default_factory=list,
        description="List of property names that were evaluated.",
    )
    individual_rewards: List[float] = Field(
        default_factory=list,
        description="List of individual rewards for each property.",
    )
    all_smi_rewards: List[float] = Field(
        default_factory=list,
        description="List of rewards for all SMILES in the completion.",
    )
    all_smi: List[str] = Field(
        default_factory=list,
        description="List of all SMILES strings in the completion.",
    )
    smiles_extraction_failure: str = Field(
        default="",
        description="Error message if there was a failure in extracting SMILES from the completion.",
        frozen=False,
    )

GenerationVerifierConfigModel

Bases: BaseModel

Pydantic model for generation verifier configuration.

This model defines the configuration parameters for the GenerationVerifier class, providing validation and documentation for all configuration options.

Attributes:

  • path_to_mappings (str): Path to the property mappings and docking targets configuration directory. Must contain 'names_mapping.json' and 'docking_targets.json' files.
  • reward (Literal['property', 'valid_smiles']): Type of reward to compute. Either "property" for property-based rewards or "valid_smiles" for validity-based rewards.
  • rescale (bool): Whether to rescale the rewards to a normalized range.
  • oracle_kwargs (DockingGPUConfigModel | PyscreenerConfigModel): Keyword arguments to pass to the docking oracle. Can include:
      • exhaustiveness: Docking exhaustiveness parameter
      • n_cpu: Number of CPUs for docking
      • docking_oracle: Type of docking oracle ("pyscreener" or "autodock_gpu")
      • vina_mode: Command mode for AutoDock GPU
  • docking_concurrency_per_gpu (int): Number of concurrent docking runs to allow per GPU. Default is 2 (each run uses ~1GB on an 80GB GPU).
  • parsing_method (Literal['none', 'answer_tags', 'boxed']): Method used to parse model completions for SMILES or property values. Default is "answer_tags".

Source code in mol_gen_docking/reward/verifiers/generation_reward/generation_verifier_pydantic_model.py
class GenerationVerifierConfigModel(BaseModel):
    """Pydantic model for generation verifier configuration.

    This model defines the configuration parameters for the GenerationVerifier class,
    providing validation and documentation for all configuration options.

    Attributes:
        path_to_mappings: Optional path to property mappings and docking targets configuration directory.
                         Should contain 'names_mapping.json' and 'docking_targets.json' files.
        reward: Type of reward to compute. Either "property" for property-based rewards or "valid_smiles"
                for validity-based rewards.
        rescale: Whether to rescale the rewards to a normalized range.
        oracle_kwargs: Dictionary of keyword arguments to pass to the docking oracle. Can include:

                       - exhaustiveness: Docking exhaustiveness parameter
                       - n_cpu: Number of CPUs for docking
                       - docking_oracle: Type of docking oracle ("pyscreener" or "autodock_gpu")
                       - vina_mode: Command mode for AutoDock GPU
        docking_concurrency_per_gpu: Number of concurrent docking runs to allow per GPU.
                                     Default is 2 (uses ~1GB per run on 80GB GPU).
    """

    path_to_mappings: str = Field(
        description="Path to property mappings and docking targets configuration directory (must contain names_mapping.json and docking_targets.json)",
    )

    reward: Literal["property", "valid_smiles"] = Field(
        default="property",
        description='Reward type: "property" for property-based or "valid_smiles" for validity-based rewards',
    )

    rescale: bool = Field(
        default=True,
        description="Whether to rescale rewards to a normalized range",
    )

    oracle_kwargs: DockingGPUConfigModel | PyscreenerConfigModel = Field(
        default_factory=DockingGPUConfigModel,
        description="Keyword arguments for the docking oracle (exhaustiveness, n_cpu, docking_oracle, vina_mode, etc.)",
    )

    docking_concurrency_per_gpu: int = Field(
        default=2,
        gt=0,
        description="Number of concurrent docking runs per GPU (each uses ~1GB on 80GB GPU)",
    )

    parsing_method: Literal["none", "answer_tags", "boxed"] = Field(
        default="answer_tags",
        description="Method to parse model completions for SMILES or property values.",
    )

    class Config:
        """Pydantic configuration."""

        arbitrary_types_allowed = True
        json_schema_extra = {
            "example": {
                "path_to_mappings": "data/molgendata",
                "reward": "property",
                "rescale": True,
                "oracle_kwargs": {
                    "exhaustiveness": 8,
                    "n_cpu": 8,
                    "docking_oracle": "autodock_gpu",
                    "vina_mode": "autodock_gpu_256wi",
                },
                "docking_concurrency_per_gpu": 2,
            }
        }

    @model_validator(mode="after")
    def check_mappings_path(self) -> "GenerationVerifierConfigModel":
        """Validate that the path_to_mappings exists and contains required files."""
        if self.path_to_mappings is not None:
            if not os.path.exists(self.path_to_mappings):
                raise ValueError(
                    f"Path to mappings {self.path_to_mappings} does not exist."
                )
            names_mapping_path = os.path.join(
                self.path_to_mappings, "names_mapping.json"
            )
            docking_targets_path = os.path.join(
                self.path_to_mappings, "docking_targets.json"
            )
            if not os.path.exists(names_mapping_path):
                raise ValueError(
                    f"names_mapping.json not found at {names_mapping_path}"
                )
            if not os.path.exists(docking_targets_path):
                raise ValueError(
                    f"docking_targets.json not found at {docking_targets_path}"
                )
        return self
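
What the validator expects on disk can be illustrated with a standalone mirror of the check. The file contents written below are placeholders, not real mappings:

```python
import json
import os
import tempfile

def check_mappings_path(path_to_mappings: str) -> None:
    """Standalone mirror of GenerationVerifierConfigModel.check_mappings_path,
    shown here to illustrate the directory layout the validator requires."""
    if not os.path.exists(path_to_mappings):
        raise ValueError(f"Path to mappings {path_to_mappings} does not exist.")
    for fname in ("names_mapping.json", "docking_targets.json"):
        if not os.path.exists(os.path.join(path_to_mappings, fname)):
            raise ValueError(f"{fname} not found in {path_to_mappings}")

with tempfile.TemporaryDirectory() as d:
    # A directory missing the JSON files is rejected ...
    try:
        check_mappings_path(d)
    except ValueError as e:
        print("rejected:", e)
    # ... and accepted once both files are present (placeholder contents).
    with open(os.path.join(d, "names_mapping.json"), "w") as f:
        json.dump({"QED": "qed"}, f)
    with open(os.path.join(d, "docking_targets.json"), "w") as f:
        json.dump(["fa7"], f)
    check_mappings_path(d)  # no exception
    print("accepted")
```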

Config

Pydantic configuration.

Source code in mol_gen_docking/reward/verifiers/generation_reward/generation_verifier_pydantic_model.py
class Config:
    """Pydantic configuration."""

    arbitrary_types_allowed = True
    json_schema_extra = {
        "example": {
            "path_to_mappings": "data/molgendata",
            "reward": "property",
            "rescale": True,
            "oracle_kwargs": {
                "exhaustiveness": 8,
                "n_cpu": 8,
                "docking_oracle": "autodock_gpu",
                "vina_mode": "autodock_gpu_256wi",
            },
            "docking_concurrency_per_gpu": 2,
        }
    }

check_mappings_path()

Validate that the path_to_mappings exists and contains required files.

Source code in mol_gen_docking/reward/verifiers/generation_reward/generation_verifier_pydantic_model.py
@model_validator(mode="after")
def check_mappings_path(self) -> "GenerationVerifierConfigModel":
    """Validate that the path_to_mappings exists and contains required files."""
    if self.path_to_mappings is not None:
        if not os.path.exists(self.path_to_mappings):
            raise ValueError(
                f"Path to mappings {self.path_to_mappings} does not exist."
            )
        names_mapping_path = os.path.join(
            self.path_to_mappings, "names_mapping.json"
        )
        docking_targets_path = os.path.join(
            self.path_to_mappings, "docking_targets.json"
        )
        if not os.path.exists(names_mapping_path):
            raise ValueError(
                f"names_mapping.json not found at {names_mapping_path}"
            )
        if not os.path.exists(docking_targets_path):
            raise ValueError(
                f"docking_targets.json not found at {docking_targets_path}"
            )
    return self

Generation verifier for de novo molecular generation tasks.

This module provides the GenerationVerifier class which computes rewards for molecular generation based on property optimization objectives such as docking scores, QED, synthetic accessibility, and other molecular descriptors.

GenerationVerifier

Bases: Verifier

Verifier for de novo molecular generation tasks.

This verifier computes rewards for generated molecules based on how well they meet specified property optimization criteria. It supports multiple property types including docking scores, QED, SA score, and RDKit descriptors.

The verifier uses Ray for parallel computation and supports GPU-accelerated docking calculations when configured with AutoDock GPU.

Attributes:

  • verifier_config (GenerationVerifierConfigModel): Configuration for the generation verifier.
  • property_name_mapping: Mapping of property names to oracle names.
  • docking_target_list: List of valid docking target names.
  • oracles (Dict[str, OracleWrapper]): Cache of oracle instances for property computation.
  • debug: If True, enables debug mode with additional logging.

Example
from mol_gen_docking.reward.verifiers import (
    GenerationVerifier,
    GenerationVerifierConfigModel,
    BatchVerifiersInputModel,
    GenerationVerifierInputMetadataModel
)

config = GenerationVerifierConfigModel(
    path_to_mappings="data/molgendata",
    reward="property"
)
verifier = GenerationVerifier(config)

inputs = BatchVerifiersInputModel(
    completions=["<answer>CCO</answer>"],
    metadatas=[GenerationVerifierInputMetadataModel(
        properties=["QED"], objectives=["maximize"], target=[0.0]
    )]
)
results = verifier.get_score(inputs)
Source code in mol_gen_docking/reward/verifiers/generation_reward/generation_verifier.py
class GenerationVerifier(Verifier):
    """Verifier for de novo molecular generation tasks.

    This verifier computes rewards for generated molecules based on how well
    they meet specified property optimization criteria. It supports multiple
    property types including docking scores, QED, SA score, and RDKit descriptors.

    The verifier uses Ray for parallel computation and supports GPU-accelerated
    docking calculations when configured with AutoDock GPU.

    Attributes:
        verifier_config: Configuration for the generation verifier.
        property_name_mapping: Mapping of property names to oracle names.
        docking_target_list: List of valid docking target names.
        oracles: Cache of oracle instances for property computation.
        debug: If True, enables debug mode with additional logging.

    Example:
        ```python
        from mol_gen_docking.reward.verifiers import (
            GenerationVerifier,
            GenerationVerifierConfigModel,
            BatchVerifiersInputModel,
            GenerationVerifierInputMetadataModel
        )

        config = GenerationVerifierConfigModel(
            path_to_mappings="data/molgendata",
            reward="property"
        )
        verifier = GenerationVerifier(config)

        inputs = BatchVerifiersInputModel(
            completions=["<answer>CCO</answer>"],
            metadatas=[GenerationVerifierInputMetadataModel(
                properties=["QED"], objectives=["maximize"], target=[0.0]
            )]
        )
        results = verifier.get_score(inputs)
        ```
    """

    def __init__(
        self,
        verifier_config: GenerationVerifierConfigModel,
    ):
        """Initialize the GenerationVerifier.

        Args:
            verifier_config: Configuration containing paths to mappings,
                reward type, and docking oracle settings.
        """
        super().__init__(verifier_config)
        self.verifier_config: GenerationVerifierConfigModel = verifier_config
        self.logger = logging.getLogger("GenerationVerifier")

        with open(
            os.path.join(verifier_config.path_to_mappings, "names_mapping.json")
        ) as f:
            property_name_mapping = json.load(f)
        with open(
            os.path.join(verifier_config.path_to_mappings, "docking_targets.json")
        ) as f:
            docking_target_list = json.load(f)

        self.property_name_mapping = property_name_mapping
        self.docking_target_list = docking_target_list
        self.slow_props = docking_target_list  # + ["GSK3B", "JNK3", "DRD2"]

        self.oracles: Dict[str, OracleWrapper] = {}
        self.debug = False  # Only for tests

    def get_smiles_from_completion(self, comp: str) -> Tuple[List[str], str]:
        """Extract SMILES strings from a model completion.

        This method parses a model completion to extract valid SMILES strings.
        It handles various formats including answer tags and markdown formatting.

        Args:
            comp: The model completion string to parse.

        Returns:
            Tuple containing:
                - List of valid SMILES strings found in the completion
                - Failure reason string (empty if successful, otherwise one of:
                  "no_answer", "no_smiles", "no_valid_smiles", "multiple_smiles")

        Example:
            ```python
            smiles, failure = verifier.get_smiles_from_completion("<answer>CCO</answer>")
            # smiles = ["CCO"], failure = ""
            ```
        """
        comp = comp.strip()
        reason: str = ""
        comp = self.parse_answer(comp)

        # Now we identify which elements are possibly SMILES
        # First we split the completion by newlines and spaces
        # Then we filter by removing any string that does not contain "C"
        valid_smiles_pattern = re.compile(r"^[A-Za-z0-9=#:\+\-\[\]\(\)/\\@.%]+$")
        mkd_pattern = re.compile(r"^(\*\*|[-*'])(.+)\1$")

        def filter_smiles(x: str) -> str:
            x = x.replace("<|im_end|>", "")
            if len(x) < 3:
                return ""
            # Check if the string is encapsulated in some kind of markdown
            m = mkd_pattern.match(x)
            x = m.group(2) if m else x
            if len(x) < 3:
                return ""
            # Parenthesized so the charset check applies to both carbon tests
            if (
                ("C" in x or x.count("c") > 2)
                and valid_smiles_pattern.fullmatch(x) is not None
            ):
                return x
            return ""

        # Finally we remove any string that is not a valid SMILES
        def test_is_valid_batch(smis: list[str]) -> list[bool]:
            RDLogger.DisableLog("rdApp.*")
            results = []
            for smi in smis:
                if len(smi) >= 130:
                    results.append(False)
                    continue
                try:
                    mol = Chem.MolFromSmiles(smi)
                    if mol is None:
                        results.append(False)
                        continue
                    if has_bridged_bond(mol):  ### WE REMOVE BRIDGED MOLS
                        results.append(False)
                        continue
                    Chem.MolToMolBlock(mol)
                    results.append(True)
                except Exception:
                    results.append(False)
            return results

        s_poss = [filter_smiles(x) for x in re.split("\n| |\\.|\t|:|`|'|,", comp)]
        s_poss = [x for x in s_poss if x != ""]
        s_poss = list(set(s_poss))

        if len(s_poss) == 0:
            if reason == "":
                reason = "no_smiles"
            return [], reason

        is_valid: List[bool] = test_is_valid_batch(s_poss)

        s_spl = [x for (x, val) in zip(s_poss, is_valid) if val]
        if s_spl == [] and reason == "":
            reason = "no_valid_smiles"
        elif len(s_spl) > 1:
            reason = "multiple_smiles"
        return s_spl, reason

    def get_all_completions_smiles(
        self, completions: List[str]
    ) -> Tuple[List[List[str]], List[str]]:
        """Extract SMILES from multiple completions.

        Args:
            completions: List of model completion strings.

        Returns:
            Tuple containing:
                - List of SMILES lists (one per completion)
                - List of failure reasons (one per completion)
        """
        smiles = []
        failures = []
        for completion in completions:
            if isinstance(completion, list):
                assert len(completion) == 1
                completion = completion[0]
            if isinstance(completion, dict):
                assert "content" in completion
                completion = completion["content"]
            smi, failure = self.get_smiles_from_completion(completion)
            smiles.append(smi)
            failures.append(failure)
        return smiles, failures

    def fill_df_properties(self, df_properties: pd.DataFrame) -> None:
        """Compute property values for all molecules in a DataFrame.

        This method fills in the 'value' column of the DataFrame with computed
        property values using the appropriate oracles. It uses Ray for parallel
        computation, with GPU resources allocated for docking calculations.

        Args:
            df_properties: DataFrame with columns ['smiles', 'property', 'value',
                'obj', 'target_value', 'id_completion']. The 'value' column will
                be filled with computed property values.
        """

        def _get_property(
            smiles: List[str],
            prop: str,
            rescale: bool = True,
            kwargs: Dict[str, Any] = {},
        ) -> List[float]:
            """
            Get property reward
            """
            oracle_fn = self.oracles.get(
                prop,
                get_oracle(
                    prop,
                    path_to_data=self.verifier_config.path_to_mappings
                    if self.verifier_config.path_to_mappings
                    else "",
                    docking_target_list=self.docking_target_list,
                    property_name_mapping=self.property_name_mapping,
                    **kwargs,
                ),
            )
            if prop not in self.oracles:
                self.oracles[prop] = oracle_fn
            property_reward: np.ndarray | float = oracle_fn(smiles, rescale=rescale)
            assert isinstance(property_reward, np.ndarray)

            return [float(p) for p in property_reward]

        _get_property_fast = ray.remote(num_cpus=0)(_get_property)
        _get_property_long = ray.remote(
            num_cpus=1,
            num_gpus=float("gpu" in self.verifier_config.oracle_kwargs.docking_oracle)
            / self.verifier_config.docking_concurrency_per_gpu,
        )(_get_property)

        all_properties = df_properties["property"].unique().tolist()
        prop_smiles = {
            p: df_properties[df_properties["property"] == p]["smiles"].unique().tolist()
            for p in all_properties
        }

        values_job = []
        for p in all_properties:
            # If the reward is long to compute, use ray
            smiles = prop_smiles[p]
            if p in self.slow_props:
                _get_property_remote = _get_property_long
            else:
                _get_property_remote = _get_property_fast

            values_job.append(
                _get_property_remote.remote(
                    smiles,
                    p,
                    rescale=self.verifier_config.rescale,
                    kwargs=self.verifier_config.oracle_kwargs.model_dump(),
                )
            )
        all_values = ray.get(values_job)
        for idx_p, p in enumerate(all_properties):
            values = all_values[idx_p]
            smiles = prop_smiles[p]
            for s, v in zip(smiles, values):
                df_properties.loc[
                    (df_properties["smiles"] == s) & (df_properties["property"] == p),
                    "value",
                ] = v

    def get_reward(self, row: pd.Series) -> float:
        """Compute reward for a single property-molecule pair.

        This method computes the reward based on the objective type:
        - "below": 1.0 if property <= target, else 0.0
        - "above": 1.0 if property >= target, else 0.0
        - "maximize": Returns the property value directly
        - "minimize": Returns 1 - property value
        - "equal": Returns clipped value based on squared error

        Args:
            row: DataFrame row containing 'obj', 'value', 'target_value', 'property'.

        Returns:
            Computed reward value (typically 0.0 to 1.0).
        """
        reward: float = 0
        obj = row["obj"]
        mol_prop = row["value"]
        target_value = row["target_value"]
        prop = row["property"]
        is_docking = prop in self.docking_target_list
        # A docking score of exactly 0.0 means docking failed; return the worst outcome
        if is_docking and mol_prop == 0.0:
            return 0.0
        if self.verifier_config.rescale:
            target_value = rescale_property_values(
                prop, target_value, docking=is_docking
            )
        if obj == "below":
            reward += float(mol_prop <= target_value)
        elif obj == "above":
            reward += float(mol_prop >= target_value)
        elif obj == "maximize":
            reward += mol_prop
        elif obj == "minimize":
            reward += 1 - mol_prop
        elif obj == "equal":
            reward += np.clip(1 - 100 * (mol_prop - target_value) ** 2, 0, 1)
        return float(reward)

    def _get_prop_to_smiles_dataframe(
        self,
        smiles_list_per_completion: List[List[str]],
        objectives: List[dict[str, Tuple[GenerationObjT, float]]],
    ) -> pd.DataFrame:
        """Create a DataFrame mapping properties to SMILES for batch processing.

        Args:
            smiles_list_per_completion: List of SMILES lists, one per completion.
            objectives: List of objective dictionaries mapping property names
                to (objective_type, target_value) tuples.

        Returns:
            DataFrame with columns: smiles, property, value, obj, target_value, id_completion.
        """
        df_properties = pd.DataFrame(
            [
                (s, p, None, obj, target_value, i)
                for i, (props, smiles_list) in enumerate(
                    zip(objectives, smiles_list_per_completion)
                )
                for s in smiles_list
                for p, (obj, target_value) in props.items()
            ],
            columns=[
                "smiles",
                "property",
                "value",
                "obj",
                "target_value",
                "id_completion",
            ],
        )
        return df_properties

    def get_score(
        self, inputs: BatchVerifiersInputModel
    ) -> List[GenerationVerifierOutputModel]:
        """Compute generation rewards for a batch of completions.

        This method extracts SMILES from completions, computes property values,
        and calculates rewards based on the specified objectives. The final reward
        is the geometric mean of per-property rewards.

        Args:
            inputs: Batch of completions and metadata for verification.

        Returns:
            List of GenerationVerifierOutputModel containing rewards and metadata
            for each completion.

        Notes:
            - If reward type is "valid_smiles", returns 1.0 for valid single SMILES
            - Multiple SMILES in a completion result in 0.0 reward
            - Uses geometric mean to aggregate multi-property rewards
        """
        smiles_per_completion, extraction_failures = self.get_all_completions_smiles(
            inputs.completions
        )
        if self.verifier_config.reward == "valid_smiles":
            return [
                GenerationVerifierOutputModel(
                    reward=float(len(smis) == 1),
                    parsed_answer=self.parse_answer("; ".join(smis)),
                    verifier_metadata=GenerationVerifierMetadataModel(
                        smiles_extraction_failure=fail
                    ),
                )
                for smis, fail in zip(smiles_per_completion, extraction_failures)
            ]
        assert all(
            isinstance(meta, GenerationVerifierInputMetadataModel)
            for meta in inputs.metadatas
        )
        metadatas: List[GenerationVerifierInputMetadataModel] = inputs.metadatas  # type: ignore

        objectives = []
        for m in metadatas:
            props = {}
            for p, obj, target in zip(m.properties, m.objectives, m.target):
                props[p] = (obj, float(target))
            objectives.append(props)

        df_properties = self._get_prop_to_smiles_dataframe(
            smiles_per_completion, objectives
        )
        self.fill_df_properties(df_properties)
        df_properties["reward"] = df_properties.apply(
            lambda x: self.get_reward(x), axis=1
        )

        output_models = []
        for id_completion, smiles in enumerate(smiles_per_completion):
            properties: List[str] = []
            individual_rewards: List[float] = []
            compl_reward: List[float] = []
            if len(smiles) > 0:
                for idx_s, s in enumerate(smiles):
                    rows_completion = df_properties[
                        (df_properties["id_completion"] == id_completion)
                        & (df_properties["smiles"] == s)
                    ]
                    rewards_l = rows_completion["reward"].to_numpy().clip(0, 1)
                    reward = np.power(
                        rewards_l.prod(), (1 / len(rewards_l))
                    )  # Geometric mean
                    if idx_s == 0:
                        for i in range(len(rows_completion["smiles"])):
                            properties.append(rows_completion["property"].iloc[i])
                            individual_rewards.append(rows_completion["reward"].iloc[i])

                    if self.verifier_config.rescale and not self.debug:
                        reward = np.clip(reward, 0, 1)
                    compl_reward.append(float(reward))
            else:
                reward = 0
                compl_reward = [0.0]

            if reward is None or np.isnan(reward):  # test None before isnan
                self.logger.warning(
                    f"Reward is None or NaN for completion id {id_completion} with smiles {smiles}"
                )
                reward = 0.0
            if len(smiles) > 1:
                reward = 0.0

            # Create the output model
            output_model = GenerationVerifierOutputModel(
                reward=float(reward),
                parsed_answer=self.parse_answer("; ".join(smiles)),
                verifier_metadata=GenerationVerifierMetadataModel(
                    properties=properties,
                    individual_rewards=individual_rewards,
                    all_smi_rewards=compl_reward,
                    all_smi=smiles,
                    smiles_extraction_failure=extraction_failures[id_completion],
                ),
            )
            output_models.append(output_model)

        return output_models

__init__(verifier_config)

Initialize the GenerationVerifier.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `verifier_config` | `GenerationVerifierConfigModel` | Configuration containing paths to mappings, reward type, and docking oracle settings. | *required* |
Source code in mol_gen_docking/reward/verifiers/generation_reward/generation_verifier.py
def __init__(
    self,
    verifier_config: GenerationVerifierConfigModel,
):
    """Initialize the GenerationVerifier.

    Args:
        verifier_config: Configuration containing paths to mappings,
            reward type, and docking oracle settings.
    """
    super().__init__(verifier_config)
    self.verifier_config: GenerationVerifierConfigModel = verifier_config
    self.logger = logging.getLogger("GenerationVerifier")

    with open(
        os.path.join(verifier_config.path_to_mappings, "names_mapping.json")
    ) as f:
        property_name_mapping = json.load(f)
    with open(
        os.path.join(verifier_config.path_to_mappings, "docking_targets.json")
    ) as f:
        docking_target_list = json.load(f)

    self.property_name_mapping = property_name_mapping
    self.docking_target_list = docking_target_list
    self.slow_props = docking_target_list  # + ["GSK3B", "JNK3", "DRD2"]

    self.oracles: Dict[str, OracleWrapper] = {}
    self.debug = False  # Only for tests
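As the constructor shows, `path_to_mappings` only needs to contain `names_mapping.json` and `docking_targets.json`. A minimal sketch of preparing such a directory; the JSON entries below are placeholder assumptions, not real mappings:

```python
import json
import os
import tempfile

# Build a throwaway mappings directory with the two files __init__ reads.
mappings_dir = tempfile.mkdtemp()
with open(os.path.join(mappings_dir, "names_mapping.json"), "w") as f:
    json.dump({"qed": "QED", "sa": "Synthetic accessibility"}, f)
with open(os.path.join(mappings_dir, "docking_targets.json"), "w") as f:
    json.dump(["DRD2_pocket", "JAK2_pocket"], f)

# Load them back the same way __init__ does.
with open(os.path.join(mappings_dir, "names_mapping.json")) as f:
    property_name_mapping = json.load(f)
with open(os.path.join(mappings_dir, "docking_targets.json")) as f:
    docking_target_list = json.load(f)
```

A config whose `path_to_mappings` points at `mappings_dir` can then be passed to `GenerationVerifier`.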

fill_df_properties(df_properties)

Compute property values for all molecules in a DataFrame.

This method fills in the 'value' column of the DataFrame with computed property values using the appropriate oracles. It uses Ray for parallel computation, with GPU resources allocated for docking calculations.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `df_properties` | `DataFrame` | DataFrame with columns `['smiles', 'property', 'value', 'obj', 'target_value', 'id_completion']`. The 'value' column will be filled with computed property values. | *required* |
Source code in mol_gen_docking/reward/verifiers/generation_reward/generation_verifier.py
def fill_df_properties(self, df_properties: pd.DataFrame) -> None:
    """Compute property values for all molecules in a DataFrame.

    This method fills in the 'value' column of the DataFrame with computed
    property values using the appropriate oracles. It uses Ray for parallel
    computation, with GPU resources allocated for docking calculations.

    Args:
        df_properties: DataFrame with columns ['smiles', 'property', 'value',
            'obj', 'target_value', 'id_completion']. The 'value' column will
            be filled with computed property values.
    """

    def _get_property(
        smiles: List[str],
        prop: str,
        rescale: bool = True,
        kwargs: Dict[str, Any] = {},
    ) -> List[float]:
        """
        Get property reward
        """
        oracle_fn = self.oracles.get(
            prop,
            get_oracle(
                prop,
                path_to_data=self.verifier_config.path_to_mappings
                if self.verifier_config.path_to_mappings
                else "",
                docking_target_list=self.docking_target_list,
                property_name_mapping=self.property_name_mapping,
                **kwargs,
            ),
        )
        if prop not in self.oracles:
            self.oracles[prop] = oracle_fn
        property_reward: np.ndarray | float = oracle_fn(smiles, rescale=rescale)
        assert isinstance(property_reward, np.ndarray)

        return [float(p) for p in property_reward]

    _get_property_fast = ray.remote(num_cpus=0)(_get_property)
    _get_property_long = ray.remote(
        num_cpus=1,
        num_gpus=float("gpu" in self.verifier_config.oracle_kwargs.docking_oracle)
        / self.verifier_config.docking_concurrency_per_gpu,
    )(_get_property)

    all_properties = df_properties["property"].unique().tolist()
    prop_smiles = {
        p: df_properties[df_properties["property"] == p]["smiles"].unique().tolist()
        for p in all_properties
    }

    values_job = []
    for p in all_properties:
        # If the reward is long to compute, use ray
        smiles = prop_smiles[p]
        if p in self.slow_props:
            _get_property_remote = _get_property_long
        else:
            _get_property_remote = _get_property_fast

        values_job.append(
            _get_property_remote.remote(
                smiles,
                p,
                rescale=self.verifier_config.rescale,
                kwargs=self.verifier_config.oracle_kwargs.model_dump(),
            )
        )
    all_values = ray.get(values_job)
    for idx_p, p in enumerate(all_properties):
        values = all_values[idx_p]
        smiles = prop_smiles[p]
        for s, v in zip(smiles, values):
            df_properties.loc[
                (df_properties["smiles"] == s) & (df_properties["property"] == p),
                "value",
            ] = v
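The write-back step at the end of `fill_df_properties` can be illustrated without Ray or real oracles: one result per (property, unique SMILES) pair is scattered back into the `value` column with a boolean mask. The property values below are made-up stand-ins:

```python
import pandas as pd

# One row per (smiles, property) pair, values to be filled in.
df = pd.DataFrame(
    {
        "smiles": ["CCO", "CCO", "c1ccccc1"],
        "property": ["QED", "SA", "QED"],
        "value": [None, None, None],
    }
)

# Hypothetical per-property oracle outputs keyed by SMILES.
oracle_results = {"QED": {"CCO": 0.41, "c1ccccc1": 0.44}, "SA": {"CCO": 0.80}}

# Scatter each value back into every matching row.
for p in df["property"].unique():
    for s, v in oracle_results[p].items():
        df.loc[(df["smiles"] == s) & (df["property"] == p), "value"] = v
```

Deduplicating SMILES per property before calling the oracle (as `prop_smiles` does above) avoids recomputing slow docking scores for repeated molecules.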

get_all_completions_smiles(completions)

Extract SMILES from multiple completions.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `completions` | `List[str]` | List of model completion strings. | *required* |

Returns:

| Type | Description |
|------|-------------|
| `Tuple[List[List[str]], List[str]]` | Tuple containing a list of SMILES lists (one per completion) and a list of failure reasons (one per completion). |

Source code in mol_gen_docking/reward/verifiers/generation_reward/generation_verifier.py
def get_all_completions_smiles(
    self, completions: List[str]
) -> Tuple[List[List[str]], List[str]]:
    """Extract SMILES from multiple completions.

    Args:
        completions: List of model completion strings.

    Returns:
        Tuple containing:
            - List of SMILES lists (one per completion)
            - List of failure reasons (one per completion)
    """
    smiles = []
    failures = []
    for completion in completions:
        if isinstance(completion, list):
            assert len(completion) == 1
            completion = completion[0]
        if isinstance(completion, dict):
            assert "content" in completion
            completion = completion["content"]
        smi, failure = self.get_smiles_from_completion(completion)
        smiles.append(smi)
        failures.append(failure)
    return smiles, failures
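The loop above accepts three completion shapes: a plain string, a single-element list, or a chat-style dict with a `"content"` key. Factored out as a standalone helper (a sketch for illustration, not part of the API):

```python
def normalize_completion(completion):
    # Unwrap a single-element list of messages.
    if isinstance(completion, list):
        assert len(completion) == 1
        completion = completion[0]
    # Unwrap a chat-style message dict.
    if isinstance(completion, dict):
        assert "content" in completion
        completion = completion["content"]
    return completion

normalize_completion("CCO")                  # plain string
normalize_completion([{"content": "CCO"}])   # list wrapping a message dict
```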

get_reward(row)

Compute reward for a single property-molecule pair.

This method computes the reward based on the objective type:

- `"below"`: 1.0 if property <= target, else 0.0
- `"above"`: 1.0 if property >= target, else 0.0
- `"maximize"`: returns the property value directly
- `"minimize"`: returns 1 - property value
- `"equal"`: returns clipped value based on squared error

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `row` | `Series` | DataFrame row containing 'obj', 'value', 'target_value', 'property'. | *required* |

Returns:

| Type | Description |
|------|-------------|
| `float` | Computed reward value (typically 0.0 to 1.0). |

Source code in mol_gen_docking/reward/verifiers/generation_reward/generation_verifier.py
def get_reward(self, row: pd.Series) -> float:
    """Compute reward for a single property-molecule pair.

    This method computes the reward based on the objective type:
    - "below": 1.0 if property <= target, else 0.0
    - "above": 1.0 if property >= target, else 0.0
    - "maximize": Returns the property value directly
    - "minimize": Returns 1 - property value
    - "equal": Returns clipped value based on squared error

    Args:
        row: DataFrame row containing 'obj', 'value', 'target_value', 'property'.

    Returns:
        Computed reward value (typically 0.0 to 1.0).
    """
    reward: float = 0
    obj = row["obj"]
    mol_prop = row["value"]
    target_value = row["target_value"]
    prop = row["property"]
    is_docking = prop in self.docking_target_list
    # A docking score of exactly 0.0 means docking failed; return the worst outcome
    if is_docking and mol_prop == 0.0:
        return 0.0
    if self.verifier_config.rescale:
        target_value = rescale_property_values(
            prop, target_value, docking=is_docking
        )
    if obj == "below":
        reward += float(mol_prop <= target_value)
    elif obj == "above":
        reward += float(mol_prop >= target_value)
    elif obj == "maximize":
        reward += mol_prop
    elif obj == "minimize":
        reward += 1 - mol_prop
    elif obj == "equal":
        reward += np.clip(1 - 100 * (mol_prop - target_value) ** 2, 0, 1)
    return float(reward)
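The objective-to-reward mapping can be exercised in isolation. The sketch below omits rescaling and the docking-failure special case:

```python
import numpy as np

def objective_reward(obj: str, value: float, target: float) -> float:
    """Standalone version of the per-objective reward branches."""
    if obj == "below":
        return float(value <= target)
    if obj == "above":
        return float(value >= target)
    if obj == "maximize":
        return float(value)
    if obj == "minimize":
        return float(1 - value)
    if obj == "equal":
        # Quadratic penalty around the target, clipped to [0, 1].
        return float(np.clip(1 - 100 * (value - target) ** 2, 0, 1))
    return 0.0

objective_reward("below", 0.3, 0.5)   # 1.0
objective_reward("equal", 0.55, 0.5)  # close to 0.75
```

The `"equal"` branch gives full reward at the target and drops to zero once the (rescaled) value deviates by 0.1 or more.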

get_score(inputs)

Compute generation rewards for a batch of completions.

This method extracts SMILES from completions, computes property values, and calculates rewards based on the specified objectives. The final reward is the geometric mean of per-property rewards.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `inputs` | `BatchVerifiersInputModel` | Batch of completions and metadata for verification. | *required* |

Returns:

| Type | Description |
|------|-------------|
| `List[GenerationVerifierOutputModel]` | List of GenerationVerifierOutputModel containing rewards and metadata for each completion. |

Notes:

- If reward type is "valid_smiles", returns 1.0 for valid single SMILES
- Multiple SMILES in a completion result in 0.0 reward
- Uses geometric mean to aggregate multi-property rewards
Source code in mol_gen_docking/reward/verifiers/generation_reward/generation_verifier.py
def get_score(
    self, inputs: BatchVerifiersInputModel
) -> List[GenerationVerifierOutputModel]:
    """Compute generation rewards for a batch of completions.

    This method extracts SMILES from completions, computes property values,
    and calculates rewards based on the specified objectives. The final reward
    is the geometric mean of per-property rewards.

    Args:
        inputs: Batch of completions and metadata for verification.

    Returns:
        List of GenerationVerifierOutputModel containing rewards and metadata
        for each completion.

    Notes:
        - If reward type is "valid_smiles", returns 1.0 for valid single SMILES
        - Multiple SMILES in a completion result in 0.0 reward
        - Uses geometric mean to aggregate multi-property rewards
    """
    smiles_per_completion, extraction_failures = self.get_all_completions_smiles(
        inputs.completions
    )
    if self.verifier_config.reward == "valid_smiles":
        return [
            GenerationVerifierOutputModel(
                reward=float(len(smis) == 1),
                parsed_answer=self.parse_answer("; ".join(smis)),
                verifier_metadata=GenerationVerifierMetadataModel(
                    smiles_extraction_failure=fail
                ),
            )
            for smis, fail in zip(smiles_per_completion, extraction_failures)
        ]
    assert all(
        isinstance(meta, GenerationVerifierInputMetadataModel)
        for meta in inputs.metadatas
    )
    metadatas: List[GenerationVerifierInputMetadataModel] = inputs.metadatas  # type: ignore

    objectives = []
    for m in metadatas:
        props = {}
        for p, obj, target in zip(m.properties, m.objectives, m.target):
            props[p] = (obj, float(target))
        objectives.append(props)

    df_properties = self._get_prop_to_smiles_dataframe(
        smiles_per_completion, objectives
    )
    self.fill_df_properties(df_properties)
    df_properties["reward"] = df_properties.apply(
        lambda x: self.get_reward(x), axis=1
    )

    output_models = []
    for id_completion, smiles in enumerate(smiles_per_completion):
        properties: List[str] = []
        individual_rewards: List[float] = []
        compl_reward: List[float] = []
        if len(smiles) > 0:
            for idx_s, s in enumerate(smiles):
                rows_completion = df_properties[
                    (df_properties["id_completion"] == id_completion)
                    & (df_properties["smiles"] == s)
                ]
                rewards_l = rows_completion["reward"].to_numpy().clip(0, 1)
                reward = np.power(
                    rewards_l.prod(), (1 / len(rewards_l))
                )  # Geometric mean
                if idx_s == 0:
                    for i in range(len(rows_completion["smiles"])):
                        properties.append(rows_completion["property"].iloc[i])
                        individual_rewards.append(rows_completion["reward"].iloc[i])

                if self.verifier_config.rescale and not self.debug:
                    reward = np.clip(reward, 0, 1)
                compl_reward.append(float(reward))
        else:
            reward = 0
            compl_reward = [0.0]

        if reward is None or np.isnan(reward):  # test None before isnan
            self.logger.warning(
                f"Reward is None or NaN for completion id {id_completion} with smiles {smiles}"
            )
            reward = 0.0
        if len(smiles) > 1:
            reward = 0.0

        # Create the output model
        output_model = GenerationVerifierOutputModel(
            reward=float(reward),
            parsed_answer=self.parse_answer("; ".join(smiles)),
            verifier_metadata=GenerationVerifierMetadataModel(
                properties=properties,
                individual_rewards=individual_rewards,
                all_smi_rewards=compl_reward,
                all_smi=smiles,
                smiles_extraction_failure=extraction_failures[id_completion],
            ),
        )
        output_models.append(output_model)

    return output_models
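The geometric-mean aggregation has an important consequence: a single zero per-property reward zeroes the whole completion's score. A quick illustration:

```python
import numpy as np

# Two per-property rewards, clipped to [0, 1] before aggregation.
per_property = np.array([1.0, 0.25])
combined = np.power(per_property.clip(0, 1).prod(), 1 / len(per_property))
# combined is sqrt(1.0 * 0.25) = 0.5

# One hard failure dominates: the product, and hence the mean, is 0.
all_or_nothing = np.array([1.0, 1.0, 0.0])
zeroed = np.power(all_or_nothing.prod(), 1 / 3)
```

This makes multi-property optimization strict: partial credit on one property cannot compensate for a complete failure on another.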

get_smiles_from_completion(comp)

Extract SMILES strings from a model completion.

This method parses a model completion to extract valid SMILES strings. It handles various formats including answer tags and markdown formatting.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `comp` | `str` | The model completion string to parse. | *required* |

Returns:

| Type | Description |
|------|-------------|
| `Tuple[List[str], str]` | Tuple containing the list of valid SMILES strings found in the completion and a failure reason string (empty if successful, otherwise one of "no_answer", "no_smiles", "no_valid_smiles", "multiple_smiles"). |

Example:

```python
smiles, failure = verifier.get_smiles_from_completion("<answer>CCO</answer>")
# smiles = ["CCO"], failure = ""
```
Source code in mol_gen_docking/reward/verifiers/generation_reward/generation_verifier.py
def get_smiles_from_completion(self, comp: str) -> Tuple[List[str], str]:
    """Extract SMILES strings from a model completion.

    This method parses a model completion to extract valid SMILES strings.
    It handles various formats including answer tags and markdown formatting.

    Args:
        comp: The model completion string to parse.

    Returns:
        Tuple containing:
            - List of valid SMILES strings found in the completion
            - Failure reason string (empty if successful, otherwise one of:
              "no_answer", "no_smiles", "no_valid_smiles", "multiple_smiles")

    Example:
        ```python
        smiles, failure = verifier.get_smiles_from_completion("<answer>CCO</answer>")
        # smiles = ["CCO"], failure = ""
        ```
    """
    comp = comp.strip()
    reason: str = ""
    comp = self.parse_answer(comp)

    # Now we identify which elements are possibly SMILES
    # First we split the completion by newlines and spaces
    # Then we filter by removing any string that does not contain "C"
    valid_smiles_pattern = re.compile(r"^[A-Za-z0-9=#:\+\-\[\]\(\)/\\@.%]+$")
    mkd_pattern = re.compile(r"^(\*\*|[-*'])(.+)\1$")

    def filter_smiles(x: str) -> str:
        x = x.replace("<|im_end|>", "")
        if len(x) < 3:
            return ""
        # Check if the string is encapsulated in some kind of markdown
        m = mkd_pattern.match(x)
        x = m.group(2) if m else x
        if len(x) < 3:
            return ""
        # "and" binds tighter than "or": parenthesize the carbon check
        if (
            ("C" in x or x.count("c") > 2)
            and valid_smiles_pattern.fullmatch(x) is not None
        ):
            return x
        return ""

    # Finally we remove any string that is not a valid SMILES
    def test_is_valid_batch(smis: list[str]) -> list[bool]:
        RDLogger.DisableLog("rdApp.*")
        results = []
        for smi in smis:
            if len(smi) >= 130:
                results.append(False)
                continue
            try:
                mol = Chem.MolFromSmiles(smi)
                if mol is None:
                    results.append(False)
                    continue
            if has_bridged_bond(mol):  # exclude bridged molecules
                    results.append(False)
                    continue
                Chem.MolToMolBlock(mol)
                results.append(True)
            except Exception:
                results.append(False)
        return results

    s_poss = [filter_smiles(x) for x in re.split("\n| |\\.|\t|:|`|'|,", comp)]
    s_poss = [x for x in s_poss if x != ""]
    s_poss = list(set(s_poss))

    if len(s_poss) == 0:
        if reason == "":
            reason = "no_smiles"
        return [], reason

    is_valid: List[bool] = test_is_valid_batch(s_poss)

    s_spl = [x for (x, val) in zip(s_poss, is_valid) if val]
    if s_spl == [] and reason == "":
        reason = "no_valid_smiles"
    elif len(s_spl) > 1:
        reason = "multiple_smiles"
    return s_spl, reason
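The candidate filter's documented heuristic, restated as a standalone predicate: a token is kept only if every character belongs to the SMILES charset *and* it shows a carbon signal (at least one uppercase `C`, or more than two aromatic `c`):

```python
import re

# Same charset as the verifier's valid_smiles_pattern.
valid_smiles_pattern = re.compile(r"^[A-Za-z0-9=#:\+\-\[\]\(\)/\\@.%]+$")

def looks_like_smiles(token: str) -> bool:
    if len(token) < 3:
        return False
    charset_ok = valid_smiles_pattern.fullmatch(token) is not None
    carbon_ok = "C" in token or token.count("c") > 2
    return charset_ok and carbon_ok

looks_like_smiles("CC(=O)O")   # True: acetic acid
looks_like_smiles("c1ccccc1")  # True: benzene, six aromatic c's
looks_like_smiles("hello")     # False: no carbon signal
looks_like_smiles("C<sub>2")   # False: characters outside the charset
```

Tokens that pass this cheap filter are still subject to RDKit parsing, the 130-character length cap, and the bridged-bond exclusion before counting as valid SMILES.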