Top-k Metric

Overview

The top-k metric is a standard evaluation metric for molecular generation tasks. It measures the average quality of the top k unique molecules from a set of generated candidates, ensuring that duplicate molecules are not counted multiple times.
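Written as a formula, the description above can be restated as follows. The notation is introduced here for illustration only: $s_{(1)} \ge s_{(2)} \ge \dots \ge s_{(m)}$ are the scores of the $m$ unique molecules, sorted in descending order.

```latex
\text{top-}k \;=\; \frac{1}{k} \sum_{i=1}^{\min(k,\,m)} s_{(i)}
```

When $m < k$, the missing terms contribute 0 to the sum while the divisor stays $k$.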

Note

Uniqueness Constraint

The top-k metric enforces uniqueness by:

  • Converting all molecules to canonical SMILES representation
  • Removing duplicate molecules
  • Selecting the k molecules with the highest scores
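The steps above can be sketched in plain Python. This is a minimal illustration rather than the library's code: `dedupe_first_seen` and `best_k` are hypothetical helper names, and the input strings are assumed to be canonical SMILES already (for instance, produced with RDKit's `Chem.MolToSmiles`).

```python
from typing import List, Tuple


def dedupe_first_seen(
    smiles: List[str], scores: List[float]
) -> List[Tuple[str, float]]:
    """Keep the first (smiles, score) pair seen for each distinct string."""
    seen = set()
    unique = []
    for smi, score in zip(smiles, scores):
        if smi not in seen:
            seen.add(smi)
            unique.append((smi, score))
    return unique


def best_k(pairs: List[Tuple[str, float]], k: int) -> List[Tuple[str, float]]:
    """Return the k pairs with the highest scores, best first."""
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)[:k]


# Assumed to be canonical already; "CCO" appears twice.
canonical = ["CCO", "c1ccccc1", "CCO"]
scores = [6.0, 7.0, 6.0]

unique = dedupe_first_seen(canonical, scores)
top2 = best_k(unique, 2)
print(top2)  # [('c1ccccc1', 7.0), ('CCO', 6.0)]
```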

Padding Mechanism

If fewer than k unique molecules are available (i.e., the model could not generate enough distinct candidates), the remaining slots are padded with scores of 0.0.
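As a worked example of the padding rule, the sketch below averages over k slots even when fewer scores are available. `padded_mean` is a hypothetical helper written for illustration; it mirrors the rule described above and does not call the library.

```python
from typing import List


def padded_mean(unique_scores: List[float], k: int) -> float:
    """Average the k best scores, filling any missing slots with 0.0."""
    padding = [0.0] * max(0, k - len(unique_scores))
    top = sorted(unique_scores + padding, reverse=True)[:k]
    return sum(top) / k


# Two unique molecules but k=4: (8.0 + 6.0 + 0.0 + 0.0) / 4
result = padded_mean([8.0, 6.0], k=4)
print(result)  # 3.5
```

Because the divisor is always k, a model that produces few unique molecules is penalized when scores are positive and higher is better. With negative scores (such as raw docking energies), padding with 0.0 would raise the average instead, so it is worth checking the sign convention of the scores before comparing models.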

Usage Examples

Basic Usage with SMILES

from mol_gen_docking.evaluation.top_k import top_k

# List of generated molecules as SMILES
smiles = [
    "CC(C)Cc1ccc(cc1)C(C)C(O)=O",
    "c1ccccc1",
    "CCO",
    "CC(C)Cc1ccc(cc1)C(C)C(=O)O"  # Ibuprofen again, written as a different SMILES string
]

# Docking scores for each molecule
scores = [8.5, 6.2, 6.1, 8.5]

# Calculate top-2 score
metric = top_k(smiles, scores, k=2)
print(f"Top-2 score: {metric}")
# Output:
# >>> Top-2 score: 7.35

Using RDKit Mol Objects

from mol_gen_docking.evaluation.top_k import top_k
from rdkit import Chem

# Convert the SMILES from the previous example to Mol objects
# (`smiles` and `scores` are reused from that example)
mols = [Chem.MolFromSmiles(smi) for smi in smiles]

# top_k automatically canonicalizes Mol objects
metric = top_k(mols, scores, k=2)
print(metric)

# Output:
# >>> 7.35

Without Canonicalization

# Skip canonicalization (appropriate only when the SMILES strings are already canonical)
metric = top_k(smiles, scores, k=2, canonicalize=False)
print(metric)

# Output:
# >>> 8.5 # Without canonicalization, the two ibuprofen strings are treated as distinct molecules

Function Reference

Top-k evaluation metric for molecular generation tasks.

This module implements the standard top-k metric for evaluating molecular generation models. The metric measures the average quality of the top k unique molecules from a set of generated candidates.

top_k(mols, scores, k, canonicalize=True)

Calculate the top-k metric for molecular generation.

This function computes the average score of the top k unique molecules from a set of candidates. It first deduplicates molecules using canonical SMILES representation, then selects the k molecules with the highest scores. This metric is useful for evaluating the quality of generated molecules.

Parameters:

  • mols (List[str] | List[Mol], required): List of molecules as SMILES strings or RDKit Mol objects.
  • scores (List[float], required): List of scores corresponding to each molecule (e.g., docking scores, binding affinity). Must have the same length as mols.
  • k (int, required): Number of top molecules to consider. If fewer than k unique molecules are provided, the remaining slots are filled with 0.0 scores.
  • canonicalize (bool, default True): Whether to canonicalize SMILES strings before comparison. If True, molecules are converted to canonical SMILES for deduplication.

Returns:

  • float: Average score of the top k unique molecules. If fewer than k unique molecules are available, the average includes padding with 0.0 scores.

Raises:

  • AssertionError: If mols and scores have different lengths.
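The source listing on this page does not show an explicit length check, and it would also fail on an empty `mols` list (it indexes `mols[0]`) or on `k=0` (it divides by `k`). Callers who want a clear, early error may validate inputs themselves. The sketch below is one way to do that; `check_top_k_inputs` is a hypothetical helper, not part of `mol_gen_docking`, and raising `ValueError` is a choice made here for illustration.

```python
from typing import Sequence


def check_top_k_inputs(mols: Sequence, scores: Sequence[float], k: int) -> None:
    """Raise ValueError early for inputs that top_k is not meant to handle."""
    if len(mols) != len(scores):
        raise ValueError(
            f"mols and scores must have the same length, "
            f"got {len(mols)} and {len(scores)}"
        )
    if len(mols) == 0:
        raise ValueError("mols must contain at least one molecule")
    if k <= 0:
        raise ValueError(f"k must be a positive integer, got {k}")


# A mismatched call is rejected before any chemistry is done.
try:
    check_top_k_inputs(["CCO"], [1.0, 2.0], k=1)
    message = ""
except ValueError as err:
    message = str(err)
print(message)
```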

Example

from mol_gen_docking.evaluation.top_k import top_k
from rdkit import Chem

# Using SMILES strings
smiles = ["CC(C)Cc1ccc(cc1)C(C)C(O)=O", "c1ccccc1"]
scores = [8.5, 7.2]
metric = top_k(smiles, scores, k=2)
print(f"Top-2 score: {metric}")

# Using RDKit Mol objects
mols = [Chem.MolFromSmiles(smi) for smi in smiles]
metric = top_k(mols, scores, k=2)

Notes
  • Duplicate molecules are automatically detected and removed using canonical SMILES
  • If k is larger than the number of unique molecules, remaining slots are filled with 0.0
  • The final metric is the average of the k highest scores

Source code in mol_gen_docking/evaluation/top_k.py

def top_k(
    mols: List[str] | List[Chem.Mol],
    scores: List[float],
    k: int,
    canonicalize: bool = True,
) -> float:
    """Calculate the top-k metric for molecular generation.

    This function computes the average score of the top k unique molecules from a set
    of candidates. It first deduplicates molecules using canonical SMILES representation,
    then selects the k molecules with the highest scores. This metric is useful for
    evaluating the quality of generated molecules.

    Args:
        mols: List of molecules as SMILES strings or RDKit Mol objects.
        scores: List of scores corresponding to each molecule (e.g., docking scores,
            binding affinity). Must have the same length as mols.
        k: Number of top molecules to consider. If fewer than k unique molecules are
            provided, the remaining slots are filled with 0.0 scores.
        canonicalize: Whether to canonicalize SMILES strings before comparison.
            If True, molecules are converted to canonical SMILES for deduplication.
            Default is True.

    Returns:
        Average score of the top k unique molecules. If fewer than k unique molecules
        are available, the average includes padding with 0.0 scores.

    Raises:
        AssertionError: If mols and scores have different lengths.

    Example:
        ```python
        from mol_gen_docking.evaluation.top_k import top_k
        from rdkit import Chem

        # Using SMILES strings
        smiles = ["CC(C)Cc1ccc(cc1)C(C)C(O)=O", "c1ccccc1"]
        scores = [8.5, 7.2]
        metric = top_k(smiles, scores, k=2)
        print(f"Top-2 score: {metric}")

        # Using RDKit Mol objects
        mols = [Chem.MolFromSmiles(smi) for smi in smiles]
        metric = top_k(mols, scores, k=2)
        ```

    Notes:
        - Duplicate molecules are automatically detected and removed using canonical SMILES
        - If k is larger than the number of unique molecules, remaining slots are filled with 0.0
        - The final metric is the average of the k highest scores
    """
    smi_list: List[str]
    if canonicalize or isinstance(mols[0], Chem.Mol):
        if isinstance(mols[0], str):
            mols_list = [Chem.MolFromSmiles(smi) for smi in mols]
        else:
            mols_list = mols
        smi_list = [Chem.MolToSmiles(mol, canonical=True) for mol in mols_list]
    else:
        smi_list = mols

    # Drop duplicates, keeping the index of each first occurrence
    seen = set()
    unique_idxs = []
    for idx, smi in enumerate(smi_list):
        if smi not in seen:
            seen.add(smi)
            unique_idxs.append(idx)
    unique_scores = [scores[idx] for idx in unique_idxs] + [
        0.0 for _ in range(len(unique_idxs), k)
    ]
    unique_scores = sorted(unique_scores, reverse=True)[:k]
    return sum(unique_scores) / k
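Reading the listing above, the deduplication loop keeps the score of the first occurrence of each canonical SMILES, so a duplicate that appears later with a higher score is ignored. When duplicates can carry different scores, one option is to collapse them before calling `top_k`. The sketch below rests on stated assumptions: `keep_best_score` is a hypothetical helper, not part of `mol_gen_docking`, and it expects strings that are already canonical.

```python
from typing import Dict, List, Tuple


def keep_best_score(
    smiles: List[str], scores: List[float]
) -> Tuple[List[str], List[float]]:
    """Collapse repeated strings, keeping the highest score seen for each."""
    best: Dict[str, float] = {}
    for smi, score in zip(smiles, scores):
        if smi not in best or score > best[smi]:
            best[smi] = score
    # Dicts preserve insertion order, so first-seen order is retained.
    return list(best.keys()), list(best.values())


smis, best = keep_best_score(["CCO", "c1ccccc1", "CCO"], [5.0, 7.0, 9.0])
print(smis)  # ['CCO', 'c1ccccc1']
print(best)  # [9.0, 7.0]
```

The collapsed lists could then be passed on as `top_k(smis, best, k, canonicalize=False)`, since the strings no longer contain duplicates.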

Benchmark Context

In the MolGenDocking project, the top-k metric is used to evaluate molecular generation models on raw quality, without diversity constraints. It assesses how well a model can generate multiple high-scoring molecules, possibly from the same chemical series.

See Diversity-Aware Top-k for a variant that also enforces chemical diversity.