Physics-driven machine learning: from biomolecules to crystals
Abstract
Physical systems and their interactions are inherently equivariant: the quantities that describe them transform in a well-defined way under symmetry operations such as rotations and translations. In machine learning (ML), predicting quantities derived from these interactions typically follows one of two approaches: constructing invariant scalar features as inputs to invariant models, or employing equivariant models directly. This thesis focuses on the former, investigating feature extraction and data representation in the context of physics-driven machine learning (PDML). PDML leverages prior physical knowledge to construct descriptors that encode the symmetries inherent in the data, thereby reducing dimensionality, enhancing interpretability, and improving generalization performance. The research addresses four questions: the limitations of physics-informed descriptors, the feasibility of dimensionality reduction without compromising prediction accuracy, the performance of PDML compared with traditional ML methods, and the scalability of PDML in atomistic systems. Key investigations include:
- Copper-based alloys: Combining molecular simulations and active learning (AL) to discover stable precipitate phases and assess mechanical properties. This involves density functional theory (DFT) simulations and the development of machine learning interatomic potentials (MLIPs) using moment tensor potentials (MTPs), leveraging invariant polynomials to model multi-component alloys.
- Nanopore translocations: Improving DNA sequencing accuracy by training ML models on experimental ionic blockade data from DNA translocation through nanopores. The approach employs dimensionality reduction through a set of physical descriptors to efficiently classify nucleotide identities, with an emphasis on increasing readout accuracy and reducing model complexity.
- High-Tc superconductivity: Proposing an effective PDML model to predict critical temperatures of superconductors by extracting key electronic and atomic features. Despite the reduced feature space, the model achieves high accuracy, offering a streamlined approach to predicting superconductor properties with minimal computational overhead.

This work bridges the gap between machine learning and physics by embedding physical principles into ML feature representations, enhancing the ability to model, predict, and control complex physical systems with greater precision and efficiency. These advancements aim to unlock transformative applications and discoveries across a range of scientific and technological domains.
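
The distinction between invariant features and equivariant quantities drawn at the start of the abstract can be illustrated with a minimal sketch. It is not the thesis's method: the descriptor (`sorted_pair_distances`), the toy coordinates, and all helper names are hypothetical and chosen only for illustration. The sorted list of pairwise distances is unchanged when a configuration is rotated, translated, and relabelled (invariance), whereas the centroid rotates together with the configuration (equivariance).

```python
import math
import random


def rotate_z(points, angle):
    """Rotate 3D points about the z-axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]


def translate(points, shift):
    """Shift every point by the same 3D vector."""
    return [(x + shift[0], y + shift[1], z + shift[2]) for x, y, z in points]


def sorted_pair_distances(points):
    """Hypothetical invariant descriptor: all pairwise distances, sorted."""
    dists = [
        math.dist(p, q)
        for i, p in enumerate(points)
        for q in points[i + 1:]
    ]
    return sorted(dists)


def centroid(points):
    """Mean position; a simple example of an equivariant vector quantity."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def close(u, v, tol=1e-9):
    """Element-wise comparison of two equal-length sequences of floats."""
    return len(u) == len(v) and all(
        math.isclose(a, b, abs_tol=tol) for a, b in zip(u, v)
    )


# Toy four-atom configuration (made-up coordinates, arbitrary units).
atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.5, 0.5, 1.5)]
angle = 0.7

# Rotate, translate, and relabel the atoms.
moved = translate(rotate_z(atoms, angle), (3.0, -1.0, 2.0))
random.Random(0).shuffle(moved)

# Invariance: the descriptor is the same before and after the transformation.
invariant_ok = close(sorted_pair_distances(atoms), sorted_pair_distances(moved))

# Equivariance: rotating the atoms rotates the centroid in the same way.
equivariant_ok = close(
    centroid(rotate_z(atoms, angle)),
    rotate_z([centroid(atoms)], angle)[0],
)

print(invariant_ok, equivariant_ok)
```

An invariant model only ever sees inputs like the sorted distance list, so it cannot distinguish symmetry-equivalent configurations by construction. An equivariant model instead works on quantities like the centroid and has to preserve their transformation behaviour itself.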