Top-k Metric

Overview

The top-k metric is a standard evaluation metric for molecular generation tasks. It measures the average quality of the top k unique molecules from a set of generated candidates, ensuring that duplicate molecules are not counted multiple times.
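The definition above can be written as a single formula. The notation here is an assumption introduced for illustration and does not come from the library: $u$ is the number of unique molecules and $s_{(1)} \ge s_{(2)} \ge \dots \ge s_{(u)}$ are their scores sorted in descending order.

```latex
\[
\operatorname{top-}k \;=\; \frac{1}{k} \sum_{i=1}^{\min(k,\,u)} s_{(i)},
\qquad s_{(1)} \ge s_{(2)} \ge \dots \ge s_{(u)}
\]
```

The divisor is always $k$, so when $u < k$ the missing terms contribute 0.0 to the sum, which matches the padding rule described under Padding Mechanism. For example, with unique scores 8.5, 6.2 and 6.1 and $k = 2$, the value is $(8.5 + 6.2) / 2 = 7.35$.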

Uniqueness Constraint

The top-k metric enforces uniqueness by:

Converting all molecules to canonical SMILES representation
Removing duplicate molecules
Selecting the k molecules with the highest scores

Padding Mechanism

If fewer than k unique molecules are available (for example, because the model could not generate enough candidates), the remaining slots are padded with scores of 0.0, so the average is still taken over k values.
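The padding rule can be sketched in a few lines of plain Python. The helper name `padded_top_k_mean` is hypothetical and is not part of `mol_gen_docking`; it assumes the scores passed in already belong to unique molecules.

```python
from typing import List


def padded_top_k_mean(unique_scores: List[float], k: int) -> float:
    """Illustrative helper (hypothetical name): mean of the k best scores,
    with missing slots counted as 0.0."""
    # Append one 0.0 per missing slot when fewer than k scores are given.
    padded = list(unique_scores) + [0.0] * max(0, k - len(unique_scores))
    # Keep the k highest values and always divide by k.
    top = sorted(padded, reverse=True)[:k]
    return sum(top) / k


# Two unique molecules but k=4: two slots are padded with 0.0.
print(padded_top_k_mean([8.0, 6.0], k=4))  # 3.5

# Three unique molecules and k=2: no padding, only the two best count.
print(padded_top_k_mean([8.0, 6.0, 4.0], k=2))  # 7.0
```

Because the divisor stays at k, a model that produces too few unique molecules is penalized rather than rewarded for a short list.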

Usage Examples

Basic Usage with SMILES

from mol_gen_docking.evaluation.top_k import top_k

# List of generated molecules as SMILES
smiles = [
    "CC(C)Cc1ccc(cc1)C(C)C(O)=O",
    "c1ccccc1",
    "CCO",
    "CC(C)Cc1ccc(cc1)C(C)C(=O)O" # Ibuprofen duplicate but different smiles
]

# Docking scores for each molecule
scores = [8.5, 6.2, 6.1, 8.5]

# Calculate top-2 score
metric = top_k(smiles, scores, k=2)
print(f"Top-2 score: {metric}")
# Output:
# >>> Top-2 score: 7.35

Using RDKit Mol Objects

from mol_gen_docking.evaluation.top_k import top_k
from rdkit import Chem

# Convert to Mol objects
mols = [Chem.MolFromSmiles(smi) for smi in smiles]

# top_k automatically canonicalizes Mol objects
metric = top_k(mols, scores, k=2)
print(metric)

# Output:
# >>> 7.35

Without Canonicalization

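# Skip canonicalization (only appropriate when the SMILES strings are already canonical).
# The list above is not canonical, so the two ibuprofen strings are compared as plain text.
metric = top_k(smiles, scores, k=2, canonicalize=False)
print(metric)

# Output:
# >>> 8.5 # Both ibuprofen entries count as distinct molecules without canonicalization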

Function Reference

Top-k evaluation metric for molecular generation tasks.

This module implements the standard top-k metric for evaluating molecular generation models. The metric measures the average quality of the top k unique molecules from a set of generated candidates.

`top_k(mols, scores, k, canonicalize=True)`

Calculate the top-k metric for molecular generation.

This function computes the average score of the top k unique molecules from a set of candidates. It first deduplicates molecules using canonical SMILES representation, then selects the k molecules with the highest scores. This metric is useful for evaluating the quality of generated molecules.

Parameters:

Name	Type	Description	Default
`mols`	`List[str] | List[Mol]`	List of molecules as SMILES strings or RDKit Mol objects.	required
`scores`	`List[float]`	List of scores corresponding to each molecule (e.g., docking scores, binding affinity). Must have the same length as mols.	required
`k`	`int`	Number of top molecules to consider. If fewer than k unique molecules are provided, the remaining slots are filled with 0.0 scores.	required
`canonicalize`	`bool`	Whether to canonicalize SMILES strings before comparison. If True, molecules are converted to canonical SMILES for deduplication. Default is True.	`True`

Returns:

Type	Description
`float`	Average score of the top k unique molecules. If fewer than k unique molecules are available, the average includes padding with 0.0 scores.

Raises:

Type	Description
`AssertionError`	If mols and scores have different lengths.
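Assertions are skipped when Python runs with the `-O` flag, so callers who want a length mismatch to fail in every mode may prefer to validate the inputs before calling `top_k`. A minimal sketch, assuming a hypothetical helper named `check_same_length` that is not part of `mol_gen_docking`:

```python
from typing import Sequence


def check_same_length(mols: Sequence[object], scores: Sequence[float]) -> None:
    """Hypothetical pre-check: raise if the two inputs are not aligned."""
    if len(mols) != len(scores):
        raise ValueError(
            f"got {len(mols)} molecules but {len(scores)} scores"
        )


# Aligned inputs pass silently.
check_same_length(["CCO", "c1ccccc1"], [6.1, 6.2])

# Misaligned inputs raise a ValueError with a descriptive message.
try:
    check_same_length(["CCO"], [6.1, 6.2])
except ValueError as err:
    print(f"Rejected: {err}")
```

Raising `ValueError` here is a design choice of the sketch, not the library's behavior.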

Example

from mol_gen_docking.evaluation.top_k import top_k
from rdkit import Chem

# Using SMILES strings
smiles = ["CC(C)Cc1ccc(cc1)C(C)C(O)=O", "c1ccccc1"]
scores = [8.5, 7.2]
metric = top_k(smiles, scores, k=2)
print(f"Top-2 score: {metric}")

# Using RDKit Mol objects
mols = [Chem.MolFromSmiles(smi) for smi in smiles]
metric = top_k(mols, scores, k=2)

Notes

Duplicate molecules are automatically detected and removed using canonical SMILES
If k is larger than the number of unique molecules, remaining slots are filled with 0.0
The final metric is the average of the k highest scores
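One detail visible in the source listing on this page: the deduplication loop keeps the index of the first occurrence of each SMILES string, so if the same molecule appears several times with different scores, the first score is the one that counts. The sketch below mirrors that loop in plain Python. The helper name `dedupe_keep_first` is hypothetical, and the keys are assumed to be canonical already.

```python
from typing import List


def dedupe_keep_first(keys: List[str], scores: List[float]) -> List[float]:
    """Hypothetical helper: keep the score of the first occurrence of each key."""
    seen = set()
    kept = []
    for key, score in zip(keys, scores):
        if key not in seen:
            seen.add(key)
            kept.append(score)
    return kept


keys = ["CCO", "c1ccccc1", "CCO"]
scores = [1.0, 2.0, 9.0]

# The second "CCO" (score 9.0) is dropped because "CCO" was already seen.
first_kept = dedupe_keep_first(keys, scores)
print(first_kept)  # [1.0, 2.0]

# If the best score per molecule should win instead, one option is to sort
# by descending score before deduplicating.
order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
best_kept = dedupe_keep_first([keys[i] for i in order], [scores[i] for i in order])
print(best_kept)  # [9.0, 2.0]
```

When the scoring function is deterministic, duplicates carry identical scores and the two orderings give the same result.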

Source code in mol_gen_docking/evaluation/top_k.py

def top_k(
    mols: List[str] | List[Chem.Mol],
    scores: List[float],
    k: int,
    canonicalize: bool = True,
) -> float:
    """Calculate the top-k metric for molecular generation.

    This function computes the average score of the top k unique molecules from a set
    of candidates. It first deduplicates molecules using canonical SMILES representation,
    then selects the k molecules with the highest scores. This metric is useful for
    evaluating the quality of generated molecules.

    Args:
        mols: List of molecules as SMILES strings or RDKit Mol objects.
        scores: List of scores corresponding to each molecule (e.g., docking scores,
            binding affinity). Must have the same length as mols.
        k: Number of top molecules to consider. If fewer than k unique molecules are
            provided, the remaining slots are filled with 0.0 scores.
        canonicalize: Whether to canonicalize SMILES strings before comparison.
            If True, molecules are converted to canonical SMILES for deduplication.
            Default is True.

    Returns:
        Average score of the top k unique molecules. If fewer than k unique molecules
        are available, the average includes padding with 0.0 scores.

    Raises:
        AssertionError: If mols and scores have different lengths.

    Example:
        ```python
        from mol_gen_docking.evaluation.top_k import top_k
        from rdkit import Chem

        # Using SMILES strings
        smiles = ["CC(C)Cc1ccc(cc1)C(C)C(O)=O", "c1ccccc1"]
        scores = [8.5, 7.2]
        metric = top_k(smiles, scores, k=2)
        print(f"Top-2 score: {metric}")

        # Using RDKit Mol objects
        mols = [Chem.MolFromSmiles(smi) for smi in smiles]
        metric = top_k(mols, scores, k=2)
        ```

    Notes:
        - Duplicate molecules are automatically detected and removed using canonical SMILES
        - If k is larger than the number of unique molecules, remaining slots are filled with 0.0
        - The final metric is the average of the k highest scores
    """
    smi_list: List[str]
    if canonicalize or isinstance(mols[0], Chem.Mol):
        if isinstance(mols[0], str):
            mols_list = [Chem.MolFromSmiles(smi) for smi in mols]
        else:
            mols_list = mols
        smi_list = [Chem.MolToSmiles(mol, canonical=True) for mol in mols_list]
    else:
        smi_list = mols

    # Drop duplicates, keeping the index of each first occurrence
    seen = set()
    unique_idxs = []
    for idx, smi in enumerate(smi_list):
        if smi not in seen:
            seen.add(smi)
            unique_idxs.append(idx)
    unique_scores = [scores[idx] for idx in unique_idxs] + [
        0.0 for _ in range(len(unique_idxs), k)
    ]
    unique_scores = sorted(unique_scores, reverse=True)[:k]
    return sum(unique_scores) / k

Benchmark Context

In the MolGenDocking project, the top-k metric is used to evaluate molecular generation models on raw quality, without diversity constraints. It assesses how well a model can generate multiple high-scoring molecules, possibly from the same chemical series.

See Diversity-Aware Top-k for a variant that also enforces chemical diversity.