In the quest for new therapeutics, one of the most critical challenges in computational drug discovery is accurately predicting how a small molecule (a ligand) interacts with a target protein. While structural biology provides high-resolution insights, obtaining these structures experimentally is slow and expensive. On the other hand, sequence-based deep learning offers speed and scale but often misses the crucial 3D spatial context of the binding site.
Cella Nova was designed to bridge this gap. By combining the scale of protein language models with the structural precision of the Boltz-2 framework, we have developed a system that predicts not just whether a molecule binds, but also how strongly it binds and what kind of interaction it forms.
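Traditionally, interaction prediction falls into two camps: structure-based methods, which capture the 3D context of the binding site but depend on structures that are slow and expensive to obtain, and sequence-based methods, which offer speed and scale but often miss that spatial context.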
Cella Nova asks: Can we train a fast sequence-based model to “think” like a structure-based model?
The foundation of Cella Nova is a sophisticated multi-modal architecture designed to capture the nuances of both the protein and the ligand.
We use the 650-million-parameter ESM-2 protein language model. Rather than relying on a single global embedding of the whole sequence, we incorporate binding-pocket attention, which lets the model focus on the specific residues likely to form the interaction site.
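To make the idea of binding-pocket attention concrete, here is a minimal sketch of attention pooling over per-residue embeddings. It is not Cella Nova's actual layer: the function name `pocket_attention_pool`, the single query vector `pocket_query`, and the toy 2-dimensional embeddings are all assumptions made for illustration, and plain Python lists stand in for tensors.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pocket_attention_pool(residue_embeddings, pocket_query):
    """Pool per-residue embeddings into one vector, weighted by relevance.

    residue_embeddings: list of vectors, one per residue, each of length d.
    pocket_query: hypothetical learned query vector of length d.
    Returns (pooled_vector, attention_weights).
    """
    d = len(pocket_query)
    # Scaled dot-product score between the query and every residue.
    scores = [
        sum(q * x for q, x in zip(pocket_query, emb)) / math.sqrt(d)
        for emb in residue_embeddings
    ]
    weights = softmax(scores)
    # Weighted sum of residue embeddings, one output value per dimension.
    pooled = [
        sum(w * emb[j] for w, emb in zip(weights, residue_embeddings))
        for j in range(d)
    ]
    return pooled, weights


# Toy input: 4 residues with 2-dimensional embeddings (illustrative values only).
toy_residues = [[0.1, 0.5], [3.0, 0.2], [0.0, 0.9], [2.5, 0.1]]
toy_query = [1.0, 0.0]
pooled, weights = pocket_attention_pool(toy_residues, toy_query)
print("most attended residue index:", weights.index(max(weights)))
```

Compared with mean pooling, which gives every residue the same weight, this kind of pooling lets a few high-scoring residues dominate the protein representation.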
Small molecules are processed through a dual-stage pipeline:
The protein and molecule representations are fused via a cross-attention mechanism that models the “hand-in-glove” fit of a ligand in a pocket. This fused representation feeds into a multi-task head that simultaneously predicts whether the molecule binds, how strongly it binds, and the nature of the interaction.
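The fusion step can be sketched as scaled dot-product cross-attention in which ligand vectors act as queries over protein residue vectors. This is an illustrative sketch only: the learned query, key, and value projections of a real attention layer are omitted, and the head weights `w_bind` and `w_affinity` are hypothetical names. Only two heads are shown; a further head for interaction type would follow the same pattern.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention.

    queries: ligand vectors; keys and values: protein residue vectors.
    Returns one fused vector per query.
    """
    d = len(keys[0])
    fused = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in keys])
        fused.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return fused


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def multi_task_head(fused, w_bind, w_affinity):
    """Mean-pool the fused vectors, then apply two hypothetical linear heads."""
    d = len(fused[0])
    pooled = [sum(v[j] for v in fused) / len(fused) for j in range(d)]
    return {
        "binding_probability": sigmoid(dot(pooled, w_bind)),  # classification
        "affinity": dot(pooled, w_affinity),                  # regression
    }


# Toy example: 2 ligand vectors attending over 3 protein residue vectors.
ligand = [[1.0, 0.0], [0.0, 1.0]]
protein = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
fused = cross_attention(ligand, protein, protein)
print(multi_task_head(fused, w_bind=[0.3, -0.2], w_affinity=[1.5, 0.5]))
```

Sharing one fused representation across heads means the binding and affinity tasks are trained on the same features, which is the usual motivation for a multi-task design.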
To move beyond the limitations of sequence-only learning, we introduced a Hybrid model trained with knowledge distillation.
We use Boltz-2, a state-of-the-art structural prediction model, as a “teacher.” Boltz-2 can predict the 3D geometry and affinity of a complex without an experimentally determined structure. However, it is computationally expensive and too slow for large-scale screening.
The Distillation Process: Instead of just training our student model on experimental data (which is often sparse), we train it to mimic the “soft labels” produced by Boltz-2. By balancing the loss between experimental ground truth and Boltz-2’s predictions, the student model internalizes the structural “intuition” of the teacher.
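The balance between experimental labels and teacher predictions can be written as a weighted sum of two loss terms. The sketch below assumes a squared-error loss on a single affinity value and a mixing weight `alpha`; both are illustrative choices, since the exact loss and weighting are not specified here. Falling back to the teacher term alone when no experimental label exists is likewise an assumption, motivated by the sparsity of experimental data.

```python
def distillation_loss(student_pred, teacher_pred, experimental=None, alpha=0.5):
    """Blend the error against the experimental label with the error against
    the teacher's soft label.

    alpha weights the experimental term and (1 - alpha) the teacher term.
    If no experimental label is available, only the teacher term is used
    (an assumption for this sketch).
    """
    teacher_term = (student_pred - teacher_pred) ** 2
    if experimental is None:
        return teacher_term
    hard_term = (student_pred - experimental) ** 2
    return alpha * hard_term + (1.0 - alpha) * teacher_term


def batch_loss(batch, alpha=0.5):
    """Mean distillation loss over (student, teacher, experimental) tuples."""
    losses = [
        distillation_loss(student, teacher, experimental, alpha)
        for student, teacher, experimental in batch
    ]
    return sum(losses) / len(losses)


# Toy batch: the second example has no experimental measurement.
toy_batch = [
    (6.0, 7.0, 5.0),
    (6.5, 6.0, None),
]
print(batch_loss(toy_batch, alpha=0.5))
```

Setting `alpha` close to 1 makes the student trust experimental data almost exclusively, while a value close to 0 makes it imitate the teacher; the right balance would have to be tuned on validation data.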
The result is a model that keeps the inference speed of a sequence-based network while benefiting from the teacher's structural guidance.
When tested on held-out human target data from ChEMBL, the Hybrid model outperformed the Full sequence-based model on every reported metric.
| Model | AUC-ROC (higher is better) |
|---|---|
| Full Model | 0.91 |
| Hybrid Model | 0.94 |

| Model | RMSE (lower is better) | Pearson r (higher is better) |
|---|---|---|
| Full Model | 0.81 | 0.87 |
| Hybrid Model | 0.68 | 0.91 |
The reduction in RMSE from 0.81 to 0.68, a drop of roughly 16%, is a substantial gain in the precision of affinity predictions and moves us closer to a reliable “digital assay.”
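For readers who want to reproduce these kinds of numbers on their own predictions, the three metrics can be computed as in the sketch below. The function names are our own, the AUC-ROC uses the simple pairwise-ranking definition rather than an optimized implementation, and the only figures taken from the tables above are the two RMSE values in the final line.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def auc_roc(labels, scores):
    """Probability that a random positive is scored above a random negative.

    Ties count as half a win. Quadratic in the number of examples, so this
    is only suitable for small illustrative inputs.
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def relative_reduction(before, after):
    """Fractional decrease from `before` to `after`."""
    return (before - after) / before


# RMSE values reported for the Full and Hybrid models.
print(f"{relative_reduction(0.81, 0.68):.1%}")  # prints 16.0%
```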
Cella Nova demonstrates that the dichotomy between sequence-based and structure-based modeling is a false one. By using knowledge distillation to transfer structural insights from a heavy teacher model (Boltz-2) into a lightweight student, we can achieve high-fidelity interaction predictions at a fraction of the computational cost.
This approach paves the way for more efficient virtual screening, allowing researchers to narrow down millions of potential drug candidates to a handful of high-probability leads with confidence, accelerating the journey from computer screen to clinic.