Story
JEPA-DNA: Grounding Genomic Foundation Models through Joint-Embedding Predictive Architectures
Key takeaway
Researchers developed JEPA-DNA, a new way to pre-train machine-learning models on DNA so they capture the functional meaning of sequences rather than just their raw letters. This could support advances in areas like genetics and personalized medicine.
Quick Explainer
JEPA-DNA is a novel technique for pre-training genomic foundation models. Rather than just learning to reconstruct DNA sequences, JEPA-DNA aims to capture the higher-level biological logic and functional semantics by predicting the latent representations of masked sequence segments. This "latent grounding" is achieved by integrating a Joint-Embedding Predictive Architecture (JEPA) into the pre-training process. The key components are a Context Encoder that processes the input, a Target Encoder that provides stable latent targets, and a Predictor Head that maps the context representations into the target latent space. This approach helps JEPA-DNA outperform prior genomic foundation models on a variety of supervised and zero-shot tasks.
Deep Dive
JEPA-DNA: Grounding Genomic Foundation Models through Joint-Embedding Predictive Architectures
Introduction
Genomic Foundation Models (GFMs) such as DNABERT-2, Nucleotide Transformer, HyenaDNA, and Evo aim to learn representations of DNA sequences using self-supervised techniques like Masked Language Modeling (MLM) and Next Token Prediction (NTP). While effective at capturing local sequence patterns, these models often fail to internalize the higher-level biological logic and functional semantics of the genome, a limitation the authors term the "granularity trap".
To address this gap, the authors introduce JEPA-DNA, which integrates the Joint-Embedding Predictive Architecture (JEPA) into genomic pre-training. Unlike generative objectives that operate in the raw token space, JEPA-DNA predicts the latent representations of masked segments, forcing the model to learn abstract, functional features. This "latent grounding" can be applied to train new GFMs from scratch or refine existing ones.
Methodology
The key components of JEPA-DNA are:
- Context Encoder ($E_\theta$): A sequence backbone (e.g., a Transformer) that processes the input sequence, with a learnable $[CLS]$ token prepended.
- Target Encoder ($E_{\bar{\theta}}$): A duplicate of the context encoder whose weights are updated via an Exponential Moving Average (EMA) of the context encoder's weights, providing stable latent targets.
- Predictor Head ($P_\phi$): A network that maps context representations into the target latent space, aiming to predict the $[CLS]$ representation of the target sequence (see the sketch after this list).
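To make these components concrete, here is a minimal PyTorch-style sketch. The backbone interface (a module returning per-token embeddings of shape `[batch, length, dim]`), the hidden size, the EMA decay value, and pooling the $[CLS]$ token at position 0 are illustrative assumptions, not the authors' exact implementation.

```python
import copy
import torch
import torch.nn as nn

class JEPADNAComponents(nn.Module):
    """Sketch of the three JEPA-DNA components; names and dimensions are assumptions."""

    def __init__(self, backbone: nn.Module, hidden_dim: int = 768, ema_decay: float = 0.996):
        super().__init__()
        # Context encoder E_theta: the trainable sequence backbone (e.g., a Transformer).
        self.context_encoder = backbone
        # Target encoder E_theta_bar: a copy of the backbone that receives no gradients
        # and is only updated via EMA of the context encoder's weights.
        self.target_encoder = copy.deepcopy(backbone)
        for p in self.target_encoder.parameters():
            p.requires_grad = False
        # Predictor head P_phi: maps the context [CLS] representation into the target latent space.
        self.predictor = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, hidden_dim),
        )
        self.ema_decay = ema_decay

    @torch.no_grad()
    def update_target_encoder(self):
        # EMA update: theta_bar <- m * theta_bar + (1 - m) * theta
        for p_t, p_c in zip(self.target_encoder.parameters(),
                            self.context_encoder.parameters()):
            p_t.mul_(self.ema_decay).add_(p_c.detach(), alpha=1.0 - self.ema_decay)

    def forward(self, context_tokens, target_tokens):
        # Assumes the backbone returns [batch, length, dim] with [CLS] at position 0.
        ctx_cls = self.context_encoder(context_tokens)[:, 0]    # context [CLS]
        with torch.no_grad():
            tgt_cls = self.target_encoder(target_tokens)[:, 0]  # stable target [CLS]
        pred_cls = self.predictor(ctx_cls)                       # predicted target representation
        return pred_cls, tgt_cls
```

The target encoder shares the context encoder's architecture but is never trained directly; it only tracks the context weights through the EMA update, which is what keeps the latent targets stable.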
The model is trained using a multi-objective loss:
- LLM Loss ($L_{llm}$): The standard generative token loss (MLM or NTP) on the masked or next-token targets.
- Latent Predictive Loss ($L_{jepa}$): A cosine-similarity objective that aligns the predicted $[CLS]$ with the target encoder's $[CLS]$ representation.
- Variance Loss ($L_{var}$) and Covariance Loss ($L_{cov}$): Regularization terms that prevent representation collapse (a combined loss sketch follows this list).
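Below is a hedged sketch of how these terms could be combined into a single training objective. The loss weights, the VICReg-style form of the variance and covariance regularizers, and the use of $1 - \cos$ for the latent predictive term are assumptions for illustration, not the paper's confirmed formulation.

```python
import torch
import torch.nn.functional as F

def jepa_dna_loss(pred_cls, tgt_cls, llm_loss,
                  w_jepa=1.0, w_var=1.0, w_cov=0.04, eps=1e-4):
    """Illustrative multi-objective loss; weights and regularizer forms are assumptions."""
    # Latent predictive loss L_jepa: maximize cosine similarity between the predicted
    # and target [CLS], i.e. minimize 1 - cos(pred, target), averaged over the batch.
    l_jepa = (1.0 - F.cosine_similarity(pred_cls, tgt_cls.detach(), dim=-1)).mean()

    # Variance loss L_var (VICReg-style): keep each embedding dimension's std above 1
    # so the representations cannot collapse to a constant vector.
    z = pred_cls - pred_cls.mean(dim=0)
    std = torch.sqrt(z.var(dim=0) + eps)
    l_var = F.relu(1.0 - std).mean()

    # Covariance loss L_cov (VICReg-style): push off-diagonal covariance entries toward
    # zero to decorrelate embedding dimensions.
    n, d = z.shape
    cov = (z.T @ z) / (n - 1)
    off_diag = cov - torch.diag(torch.diag(cov))
    l_cov = (off_diag ** 2).sum() / d

    # Total objective: generative token loss (MLM or NTP) plus the latent terms.
    return llm_loss + w_jepa * l_jepa + w_var * l_var + w_cov * l_cov
```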
JEPA-DNA is compatible with different GFM architectures and generative objectives (MLM, NTP).
Experiments
The authors evaluate JEPA-DNA using DNABERT-2 as the backbone, pre-training on 7.6B base pairs. They assess performance through:
- Linear Probing: Freezing the encoder and training only a linear classifier on the $[CLS]$ representation. Metrics include AUROC, AUPRC, and MCC.
- Zero-Shot Inference: Measuring cosine similarity between reference and variant sequence embeddings for various genomic tasks (a minimal evaluation sketch follows this list).
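The following sketch illustrates both evaluation protocols on precomputed, frozen $[CLS]$ embeddings. The probe choice (logistic regression), the binary-classification setup, and the sign convention for the zero-shot score are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, average_precision_score, matthews_corrcoef

def linear_probe(train_cls, train_labels, test_cls, test_labels):
    """Linear probing on frozen [CLS] embeddings; the encoder is never updated.
    Assumes a binary classification task with 0/1 labels."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(train_cls, train_labels)

    probs = clf.predict_proba(test_cls)[:, 1]
    preds = (probs >= 0.5).astype(int)
    return {
        "AUROC": roc_auc_score(test_labels, probs),
        "AUPRC": average_precision_score(test_labels, probs),
        "MCC": matthews_corrcoef(test_labels, preds),
    }

def zero_shot_variant_score(ref_cls: np.ndarray, var_cls: np.ndarray) -> np.ndarray:
    """Zero-shot score: cosine similarity between reference and variant embeddings.
    Interpreting lower similarity as a larger functional effect is an assumption."""
    ref = ref_cls / np.linalg.norm(ref_cls, axis=-1, keepdims=True)
    var = var_cls / np.linalg.norm(var_cls, axis=-1, keepdims=True)
    return (ref * var).sum(axis=-1)
```

Because the encoder stays frozen in both protocols, these scores reflect the quality of the pre-trained representations rather than any task-specific fine-tuning.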
JEPA-DNA consistently outperforms the DNABERT-2 baseline across a range of supervised and zero-shot tasks, including promoter prediction, variant effect prediction, and trait association.
Limitations and Future Work
The authors outline several directions for future work, including:
- Evaluating JEPA-DNA for training models from scratch vs. continual pre-training
- Exploring JEPA-DNA with other GFM architectures beyond DNABERT-2
- Investigating alternative masking and aggregation strategies
- Incorporating additional auxiliary tasks
- Conducting more comprehensive ablations and significance analysis
- Comparing JEPA-DNA to other self-supervised learning paradigms
Conclusion
JEPA-DNA demonstrates that shifting the pre-training focus from literal token reconstruction to latent feature prediction can yield more biologically relevant representations for genomic foundation models.