K-DREAM: Teaching AI Biology to Design Molecules

K-DREAM integrates biomedical knowledge graphs into generative models, allowing AI to design drug molecules with purpose, like an experienced chemist, instead of searching blindly.
The combination of generative AI and rapid scoring breaks through the efficiency bottleneck of blind docking, making large-scale, unbiased virtual screening possible.
The ADFLIP model, based on all-atom discrete flow matching, can design protein sequences that interact precisely with specific ligands, nucleic acids, or metal ions, and it can also handle dynamic conformations.

1. K-DREAM: Teaching AI Biology to Design Molecules

Drug discovery has to contend with a vast chemical space. Generative models can produce huge numbers of molecular structures, but they don’t understand biology. They just randomly combine numbers and vectors, so most of their output is useless.

The K-DREAM framework gives generative models a mentor: biomedical knowledge graphs.

How K-DREAM works

The core idea is to give the model biological context directly, so it doesn’t have to guess.

K-DREAM uses a diffusion model as its molecule-generating engine. This engine can start from noise and progressively build a chemical structure. The key is how to guide this process.

This is where the knowledge graph comes in. A knowledge graph contains a complex network of relationships between genes, proteins, diseases, and drugs—the accumulation of decades of biomedical research. K-DREAM compresses this information into mathematically manageable embeddings. Each target protein gets a unique “coordinate” in the knowledge graph.

A module in the framework called the Context Regressor Network (CRN) acts as a translator. The CRN connects “chemical language” (molecular structures) with “biological language” (knowledge graph embeddings). It learns through graph attention layers and can locate a molecule’s corresponding biological “coordinate” in the knowledge graph.

This brings the whole generation process to life. Once a biological goal is set (like a specific kinase target), K-DREAM uses its “coordinate” in the knowledge graph to guide the diffusion model. This way, the generated molecules account for both the target’s biological function and its corresponding chemical structure. The whole process becomes a guided, precise strike.

Solving the multi-target design problem

For many complex diseases, like cancer and autoimmune disorders, inhibiting a single target has limited effect. We need molecules that can regulate multiple targets at once.

Traditional methods struggle with this challenge. K-DREAM offers a solution through vector interpolation.

Each target has a vector “coordinate” in the knowledge graph. To design a molecule that acts on both target A and target B, you just find a midpoint between their vector coordinates and set this new “virtual coordinate” as the generation goal. The model will then generate a molecule with properties of both. This method makes it possible to design drugs with specific multi-target profiles, and it’s easy to do.

Avoiding the “docking score” trap

Many generative models fall into a trap: they over-optimize for docking scores. A docking score is a computational simulation value and doesn’t perfectly correlate with real biological activity. Over-optimizing it is like cramming for an exam—you get a high score but can’t solve real problems. The generated molecules might look perfect on paper but have poor activity when synthesized.

K-DREAM avoids this trap by using the broad, rich biological context from the knowledge graph instead of treating the docking score as the only standard. It guides the model to find molecules in “biologically meaningful” regions, not just ones that score high on a single metric. This strategy is closer to the intuition of a medicinal chemist and is more likely to find valuable drug candidates.

The framework still has room for improvement. Future versions could integrate more dimensions of knowledge, like pharmacokinetics (PK) or absorption, distribution, metabolism, excretion, and toxicity (ADMET) data. This would allow the model to design not just “effective” molecules, but “druggable” ones. K-DREAM shows a path forward: teaching AI to learn and use the biological knowledge humans have accumulated, helping it evolve from a “molecule generator” into a “drug design partner.”

📜Title: Augmenting generative models with biomedical knowledge graphs improves targeted drug discovery 🌐Paper: https://arxiv.org/abs/2510.09914

3. ADFLIP: All-Atom Protein Design for Ligands and Dynamic Structures

Traditional methods for inverse folding in protein design often have limitations. Most of them only look at the protein’s C-alpha backbone to predict the amino acid sequence, ignoring the key details that determine its function. This is like designing a room’s interior based only on the building frame, without considering the furniture (ligands) or utilities (cofactors). The result is often pretty but not functional.

A protein’s function depends on its fine-tuned interactions with other molecules. For example, an enzyme’s active site needs to bind a substrate precisely, and an antibody’s CDR region must recognize an antigen. These interactions happen at the atomic level, determined by amino acid side chains, small molecule ligands, and metal ions. Design models that ignore these critical parts are missing the full picture.

The ADFLIP model, developed by Yi et al., addresses this problem. It is an “all-atom” model.

ADFLIP uses a Discrete Flow Matching generation strategy. You can think of it like a sculptor who starts with a random pile of amino acid “clay” and a complete 3D blueprint that includes every atom (including ligands and ions). The sculptor gradually replaces the “clay” with the correct amino acids, with each decision based on the precise position and chemical properties of the surrounding atoms. During this process, the amino acid side chains also take shape, providing more detailed structural information for subsequent steps.

When designing the binding pocket for a drug target, ADFLIP can “see” every atom of the small molecule drug. This allows it to design an amino acid sequence that forms the best hydrogen bonds, hydrophobic interactions, or salt bridges, making it possible to “design proteins for ligands.”

ADFLIP can also handle protein dynamics. Many proteins aren’t rigid; they function by moving between different conformational states. Traditional methods, based on a single crystal structure, struggle with this dynamic nature. ADFLIP, however, can learn from an ensemble of conformations, like those determined by Nuclear Magnetic Resonance (NMR), and design a sequence that can accommodate all of them. This makes the designed protein more functionally stable in a real physiological environment.

ADFLIP also has a practical feature called “training-free classifier guidance sampling.” This is like adding a navigation system to the design process. Without retraining the main model, you can connect a small, pre-trained predictive model, like the affinity predictor DSMBind. At each step of sequence generation, this “navigator” guides the main model toward a specific goal, such as “improving binding affinity to the ligand.” This approach allows us to “customize” protein sequences for specific needs, like high affinity or high stability.

ADFLIP elevates protein design from backbone-level “sketching” to all-atom “precision engineering,” enabling us to better design functional proteins that can perform specific tasks in complex biological environments.

📜Title: All-atom inverse protein folding through discrete flow matching 🌐Paper: https://raw.githubusercontent.com/mlresearch/v267/main/assets/yi25a/yi25a.pdf

K-DREAM: Teaching AI Biology to Design Molecules

Contents

1. K-DREAM: Teaching AI Biology to Design Molecules

2. DiffDock+UniDock: Industrial-Scale Blind Docking Virtual Screening

3. ADFLIP: All-Atom Protein Design for Ligands and Dynamic Structures