MK. Mini-symposium: Machine Learning
Monday, 2022-06-20, 01:45 PM
Noyes Laboratory 217
SESSION CHAIR: Daniel P. Tabor (Texas A&M University, College Station, TX)
|
|
|
MK01 |
Invited Mini-Symposium Talk |
30 min |
01:45 PM - 02:15 PM |
P6216: ELUCIDATING, ANALYZING, AND DESIGNING SPECTROSCOPIES: LEVERAGING THEORY AND CHEMICAL INTUITION TO GET THE MOST OUT OF MACHINE LEARNING |
THOMAS E MARKLAND, Department of Chemistry, Stanford University, Stanford, CA, USA; |
IDEALS Archive (Abstract PDF / Presentation File) |
DOI: https://dx.doi.org/10.15278/isms.2022.MK01 |
Advances in machine learning are pushing the forefront of what can be simulated and understood about the nature of chemical systems, offering intriguing possibilities for developing new chemical insights in spectroscopies ranging from NMR to multidimensional electronic spectroscopies and the recently introduced impulsive nuclear x-ray scattering. In this talk I will present our latest developments showing how machine learning’s potential to simulate, analyze, and design spectroscopic experiments can be maximized by building chemical intuition and theoretical insights into the underlying frameworks.
|
|
MK02 |
Contributed Talk |
15 min |
02:21 PM - 02:36 PM |
P6261: LOW-FREQUENCY INFRARED SPECTRUM OF LIQUID WATER FROM MACHINE-LEARNING BASED PARTIAL ATOMIC CHARGES |
BOWEN HAN, CHRISTINE M ISBORN, LIANG SHI, Department of Chemistry and Biochemistry, University of California, Merced, Merced, CA, USA; |
IDEALS Archive (Abstract PDF / Presentation File) |
DOI: https://dx.doi.org/10.15278/isms.2022.MK02 |
Modeling water in condensed phases is an indispensable part of modern water research, and rigid non-polarizable water models, such as TIP4P/2005, have been very popular in molecular simulations due to their high efficiency. Although these water models can reproduce many properties of water, they fail to predict its dielectric properties, such as the dielectric constant and low-frequency infrared spectra. We propose to improve these models by re-assigning the partial atomic charges of water molecules according to their local environment using a machine-learning (ML) model that is trained on quantum chemical data. With the ML-based charges, the calculated low-frequency infrared spectrum of liquid water is in good agreement with experiment, showing a peak at about 200 cm⁻¹, which non-polarizable water models fail to reproduce. The effects of charge redistributions in liquid water and their dependence on the choice of the density functional are also discussed.
|
|
MK03 |
Contributed Talk |
15 min |
02:39 PM - 02:54 PM |
P6176: MULTI-FIDELITY DEEP LEARNING AND ACTIVE LEARNING FOR MOLECULAR OPTICAL PROPERTIES |
KEVIN P. GREENMAN, WILLIAM H. GREEN, Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA; RAFAEL GÓMEZ-BOMBARELLI, Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA; |
IDEALS Archive (Abstract PDF / Presentation File) |
DOI: https://dx.doi.org/10.15278/isms.2022.MK03 |
A variety of physics-based and statistical methods have been developed to guide molecular design based on optical properties. Each method has a trade-off between cost, accuracy, and generalizability. While methods such as time-dependent density functional theory (TD-DFT) are often generalizable across chemical space due to their foundations in physics, they are relatively slow and are less suitable for screening large libraries of molecules. Statistical or machine learning methods are fast, but their performance is highly dependent on the choice of training data and representation. This makes them useful for design within chemical families that already have large datasets available, but less useful for de novo design tasks that explore new parts of chemical space. We propose a new deep learning method that leverages a combination of low fidelity (TD-DFT) and high fidelity (experimental) data sets to predict molecular optical properties with improved accuracy and generalizability over existing statistical methods. We also illustrate the importance of non-random data splitting strategies to assess generalizability of predictions for spectra in condensed phase. Finally, we demonstrate the use of active learning for model improvement by gathering new experimental data in regions of high prediction uncertainty.
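One way to picture the non-random data splitting mentioned above is a group-based split, in which every molecule from a given chemical family lands entirely in either the training set or the test set. The sketch below is only an illustration under stated assumptions: the function name `group_split`, the `family` field, and the toy records are hypothetical, and the abstract does not say which splitting strategy the authors actually used.

```python
import random
from collections import defaultdict


def group_split(records, group_key, test_fraction=0.2, seed=0):
    """Split records so that no group appears in both train and test.

    `group_key` maps a record to its group label (hypothetical here:
    a chemical family name). The test fraction is applied to the
    number of groups, not the number of records.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[group_key(rec)].append(rec)

    keys = sorted(groups)                 # deterministic starting order
    random.Random(seed).shuffle(keys)     # reproducible shuffle of groups
    n_test = max(1, round(test_fraction * len(keys)))

    test = [r for k in keys[:n_test] for r in groups[k]]
    train = [r for k in keys[n_test:] for r in groups[k]]
    return train, test


# Toy records, invented purely for illustration.
records = [
    {"name": f"dye_{i}", "family": family}
    for i, family in enumerate(["A", "A", "B", "B", "C", "C", "D", "E", "E", "E"])
]
train, test = group_split(records, group_key=lambda r: r["family"])
```

Because whole families are held out, test-set error under such a split reflects performance on chemistry the model has not seen, which a random split can overstate.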
|
|
MK04 |
Contributed Talk |
15 min |
02:57 PM - 03:12 PM |
P6085: MULTIVARIATE ANALYSIS OF MOLECULAR SPECTROSCOPY DATA FOR COVID-19 DETECTION |
QIZHONG LIANG, YA-CHU CHAN, JUTTA TOSCANO, JILA and NIST, University of Colorado, Boulder, CO, USA; KRISTEN K. BJORKMAN, LESLIE A. LEINWAND, ROY PARKER, BioFrontiers Institute, University of Colorado Boulder, Boulder, CO, USA; DAVID NESBITT, JUN YE, JILA and NIST, University of Colorado, Boulder, CO, USA; |
IDEALS Archive (Abstract PDF / Presentation File) |
DOI: https://dx.doi.org/10.15278/isms.2022.MK04 |
In exhaled human breath, there exist hundreds of sparse molecular species, and many contain rich information about various health conditions or diseases. When associated with a specific medical response, a co-variation in concentrations for multiple molecular species can occur, thereby facilitating diagnosis. A recent technological improvement to cavity-enhanced direct frequency comb spectroscopy (CE-DFCS) has enabled broadband molecular spectra to be collected at parts-per-trillion detection sensitivity [1], allowing unambiguous and objective simultaneous detection of multiple molecular species. Here, we show how the breath spectroscopy data collected by CE-DFCS can realize non-invasive medical diagnostics [2]. The key to such realization comes from the use of supervised machine learning to process the comb spectroscopy data in parallel with extreme-dimensional data channel inputs. Using a total of 170 individual breath samples, we report cross-validated results with excellent discrimination capability for COVID-19. At the same time, significant differences are identified for several other personal attributes, including smoking, abdominal pain, and biological sex. Our demonstrated approach can be extended immediately to investigate the diagnostic potential for a number of other disease states, including breast cancer, asthma, and intestinal problems. We discuss how further development in machine learning and frequency comb-based breath analysis can benefit significantly from enriching the absorption database to include more molecular species.
Footnotes:
[1] Q. Liang, et al., “Ultrasensitive multispecies spectroscopic breath analysis for real-time health monitoring and diagnostics,” PNAS 118(40) (2021).
[2] Q. Liang, et al., “Frequency comb and machine learning-based breath analysis for COVID-19 classification,” arXiv:2202.02321 (2022).
|
|
|
|
|
03:15 PM |
INTERMISSION |
|
|
MK05 |
Invited Mini-Symposium Talk |
30 min |
03:54 PM - 04:24 PM |
P6332: CAPTURING, PREDICTING, AND UNDERSTANDING OPTICAL SIGNALS: HARNESSING MACHINE LEARNING TO TACKLE ENERGY DISSIPATION IN THE CONDENSED PHASE |
ANDRES MONTOYA-CASTILLO, Department of Chemistry, University of Colorado, Boulder, CO, USA; |
IDEALS Archive (Abstract PDF / Presentation File) |
DOI: https://dx.doi.org/10.15278/isms.2022.MK05 |
While optical spectroscopies provide an essential and ever-expanding toolbox for probing and elucidating how materials absorb, transport, and dissipate energy, accurately predicting their signals remains a formidable challenge to theory. By drastically expanding our ability to accurately and efficiently simulate complex systems and their dynamics, machine learning techniques are opening fascinating possibilities for the simulation and analysis of various spectroscopies. In this talk, I will focus on our latest advances showing how one can exploit chemical intuition to combine machine learning techniques with robust theoretical frameworks to faithfully capture and interpret energy transport pathways encoded in optical signals.
|
|
MK06 |
Contributed Talk |
15 min |
04:30 PM - 04:45 PM |
P6473: SYMMETRY-CONSTRAINED MOLECULAR DYNAMICS |
SAM COX, ANDREW WHITE, Chemical Engineering, University of Rochester, Rochester, NY, USA; |
IDEALS Archive (Abstract PDF / Presentation File) |
DOI: https://dx.doi.org/10.15278/isms.2022.MK06 |
Molecular dynamics is a popular tool for molecular structure prediction, but its application to crystal structures has been limited by the inability to treat point group symmetries. For this reason, many space groups are inaccessible in typical molecular dynamics, though the inaccessible space groups are often desirable. We propose symmetry-constrained molecular dynamics as a new approach to address these space groups. This method allows all point group symmetries to be accessible in molecular dynamics simulations. Because there is a small number of possible space groups, these can be enumerated, as shown in this work. Spectroscopy and molecular dynamics are mutually beneficial techniques to understand systems more fully, and spectroscopy has deep roots in symmetry, as symmetries give insight into chemical shift prediction for molecules and crystals. Therefore, this work bridges the gap between spectroscopy and structure prediction by molecular dynamics.
|
|
MK07 |
Contributed Talk |
15 min |
04:48 PM - 05:03 PM |
P6494: ACCELERATING MANY-BODY EXPANSION THEORY THROUGH GRAPH CONVOLUTIONAL NETWORKS |
YILI SHEN, College of Software Engineering, Tongji University, Shanghai, China; CHENGWEI JU, Pritzker School of Molecular Engineering, The University of Chicago, Chicago, IL, USA; JUN YI, ZHOU LIN, Department of Chemistry, University of Massachusetts, Amherst, MA, USA; HUI GUAN, College of Information and Computer Sciences, University of Massachusetts, Amherst, MA, USA; |
IDEALS Archive (Abstract PDF / Presentation File) |
DOI: https://dx.doi.org/10.15278/isms.2022.MK07 |
First-principles quantum mechanical modeling can potentially interpret and predict experimentally measurable properties of large molecules or systems, such as energies, provided that the difficulty in balancing its computational efficiency and accuracy is overcome. Many-body expansion theory (MBET) has been developed to resolve this issue: it approximates the total energy of a large system through a truncated expansion of one-, two-, …, n-body energies, but it still suffers from a computational bottleneck: expensive first-principles evaluations of all many-body energies. In the present study, we integrated the graph convolutional network (GCN), a state-of-the-art machine learning (ML) algorithm, into the existing first-principles workflow, and developed a novel scheme referred to as GCN-MBET. Operationally, we evaluated all one-body energies using conventional first-principles quantum mechanics, but obtained many-body energies based on their relationships with effortless molecular descriptors established by GCN. As the initial stage of the study, we provided a proof-of-concept of our GCN-MBET model using two- and three-body energies from representative van der Waals or hydrogen-bonded molecular aggregates, including the water cluster, the phenol cluster, and the water-phenol mixture. Given sufficient configurational diversity in the training set, we successfully reproduced first-principles two- and three-body energies in the test set to chemical accuracy (< 1 kcal/mol), but at a fractional computational cost (≅ 1%).
Our results indicated that GCN-MBET provides a promising, unique, and powerful tool to unlock the potential of first-principles quantum mechanical modeling of large molecules or systems.
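The truncated expansion of one-, two-, and three-body energies described above can be sketched in a few lines. This is a generic illustration of the standard many-body expansion, not the authors' GCN-MBET code: `toy_energy` is a hypothetical stand-in for a first-principles energy call, and the monomers are assumed to be single points in space. Each n-body correction is the energy of a subsystem minus all lower-order corrections of its proper subsets.

```python
from itertools import combinations
from math import dist


def toy_energy(monomers):
    """Hypothetical energy model (assumption, not a real quantum method).

    Contains one-, two-, and three-body terms only, so a three-body
    truncation of the expansion reproduces it exactly.
    """
    one_body = -1.0 * len(monomers)
    two_body = sum(-1.0 / dist(a, b) ** 2 for a, b in combinations(monomers, 2))
    three_body = sum(
        0.01 / (dist(a, b) * dist(b, c) * dist(a, c))
        for a, b, c in combinations(monomers, 3)
    )
    return one_body + two_body + three_body


def mbe_energy(monomers, energy_fn, max_order=3):
    """Total energy from a many-body expansion truncated at `max_order`."""
    corrections = {}  # index tuple -> n-body correction
    total = 0.0
    for order in range(1, max_order + 1):
        for idx in combinations(range(len(monomers)), order):
            e = energy_fn([monomers[i] for i in idx])
            # Remove every lower-order correction from proper subsets.
            for k in range(1, order):
                for sub in combinations(idx, k):
                    e -= corrections[sub]
            corrections[idx] = e
            total += e
    return total


# Four point-like monomers at arbitrary illustrative positions.
monomers = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
e_full = toy_energy(monomers)
e_mbe2 = mbe_energy(monomers, toy_energy, max_order=2)
e_mbe3 = mbe_energy(monomers, toy_energy, max_order=3)
```

In this sketch every subsystem energy comes from the same `energy_fn`. In a scheme like the one the abstract describes, the costly two- and three-body evaluations would presumably be replaced by a learned surrogate while the one-body terms stay first-principles.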
|
|