Genome Structure and Physics-Informed Machine Learning
Online (Zoom meeting)
Zoom meeting details:
Zoom link: https://zoom.us/j/95229199186?pwd=SUyIpc4yolNtYqZeKpHaR1dxnA9m8A.1
Meeting ID: 952 2919 9186; Passcode: 621708
Abstract
The vastness of genomic sequence space and the heterogeneity of mutational data require efficient computational frameworks to extract meaningful biological insights. In this talk, I present two complementary approaches addressing this challenge.
First, I introduce cgNA+, a physics-informed machine learning model of nucleic acids trained on atomistic molecular dynamics simulations. It predicts sequence-dependent DNA structural properties near-instantaneously, which makes genome-scale analyses feasible. These analyses investigate how sequence influences features such as groove geometry, DNA cyclization, and nucleosome wrapping, providing insights into how sequence encodes structure and function.
Second, I present an unsupervised machine learning framework to decode mutational patterns in the human germline using nearly 900,000 whole genomes. By analyzing single- and double-nucleotide variants along with insertions and deletions, this work establishes a comprehensive catalogue of germline mutational signatures and their putative underlying mechanisms. These findings inform our understanding of disease origins, ancestry-specific processes, and population stratification in drug development. Together, these projects show how machine learning enables genome-wide exploration, supporting improved diagnosis, risk stratification, and therapeutic development.