We show that a convolutional variational autoencoder (CVAE) model can quantitatively describe complex biophysical processes such as protein folding.
Significance and Impact
The resonance properties of dielectric materials are great assets in sensing and imaging applications. The ability to tune the magnitude and sensitivity of these materials is critical in applications where global use of a strong excitation field is not suitable. As a superior alternative to plasmonic field enhancement, the studied material can improve surface-enhanced spectroscopies and resonant sensing, where plasmons are traditionally invoked.
- We use a convolutional variational autoencoder (CVAE) to learn low-dimensional, biophysically relevant latent features from long time-scale protein folding simulations in an unsupervised manner.
- We demonstrate our approach on three model protein folding systems: Fs-peptide (14 μs aggregate sampling), villin headpiece (single trajectory of 125 μs), and β-β-α (BBA) protein (223 + 102 μs sampling across two independent trajectories).
- In these systems, we show that the learned CVAE latent features correspond to distinct conformational substates along the protein folding pathways.
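To make the idea of learning latent features more concrete, the following is a minimal sketch of the loss terms that a variational autoencoder typically optimizes: a reconstruction term plus a Kullback-Leibler (KL) penalty that keeps the latent distribution close to a standard normal. It is an illustration only, not the authors' implementation. The function names, the binary cross-entropy reconstruction term, and the assumption that each simulation frame is encoded as a flattened 0/1 matrix (for example, a residue contact map) are all hypothetical choices made here; the convolutional encoder and decoder themselves are omitted.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), one draw per latent dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def binary_cross_entropy(target, recon, eps=1e-7):
    """Reconstruction loss for a flattened 0/1 input and predicted probabilities.

    Assumption: the input frame is binary (e.g. a contact map)."""
    total = 0.0
    for t, p in zip(target, recon):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total


def vae_loss(target, recon, mu, log_var):
    """Negative evidence lower bound: reconstruction term plus KL penalty."""
    return binary_cross_entropy(target, recon) + kl_to_standard_normal(mu, log_var)


if __name__ == "__main__":
    rng = random.Random(0)
    mu, log_var = [0.3, -1.2, 0.8], [-0.5, 0.1, -2.0]  # hypothetical encoder output
    z = reparameterize(mu, log_var, rng)               # 3-D latent coordinates
    print("latent sample:", z)
    print("KL penalty:", kl_to_standard_normal(mu, log_var))
```

In a sketch like this, each simulation frame would map to one low-dimensional point `z`, and clustering those points is what would reveal groups of similar conformations.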
Publication/Citation and DOI:
Debsindhu Bhowmik, Shang Gao, Michael T. Young, and Arvind Ramanathan, “Deep Clustering of Protein Folding Simulations.” BMC Bioinformatics (2018), 19, 18, 484. doi: 10.1186/s12859-018-2507-5
To study the intermediate stages of protein folding, researchers at Oak Ridge National Laboratory adapted a deep-learning algorithm known as a convolutional variational autoencoder. Running on Summitdev, a small-scale precursor to the Summit supercomputer located at the Oak Ridge Leadership Computing Facility, a DOE Office of Science User Facility at ORNL, the algorithm automatically extracted relevant information about protein folding configurations. By studying the folding pathways of multiple proteins, the team uncovered intermediate stages that serve as “guideposts” to help navigate the folding process while observing latent facets of protein behavior.
By running machine-learning algorithms on NVIDIA DGX-2 boxes, computing systems designed for artificial intelligence applications, the researchers hope to develop more precise techniques to better understand the differences between correctly folded and “misfolded” proteins. Running multiple algorithms on the DGX-2s at once allowed the team to quickly compile data and develop HyperSpace, a specialized software package that simplifies and streamlines the process of optimizing hyperparameters, which are parameters set before an algorithm begins learning from data rather than learned during training.
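As a rough illustration of what optimizing hyperparameters involves, the sketch below runs a plain random search: it samples candidate settings, scores each one, and keeps the best. This is not HyperSpace's API or its search strategy, which the text above does not describe. The search space, the parameter names `latent_dim` and `learning_rate`, and the placeholder `validation_loss` function are all hypothetical; in practice the score would come from training a model with each setting and evaluating it on held-out data.

```python
import math
import random

# Hypothetical search space: a discrete choice and a log-uniform range.
SEARCH_SPACE = {
    "latent_dim": [2, 3, 5, 10],
    "learning_rate": (1e-4, 1e-2),
}


def sample_config(rng):
    """Draw one candidate hyperparameter setting from SEARCH_SPACE."""
    lo, hi = SEARCH_SPACE["learning_rate"]
    return {
        "latent_dim": rng.choice(SEARCH_SPACE["latent_dim"]),
        "learning_rate": 10 ** rng.uniform(math.log10(lo), math.log10(hi)),
    }


def validation_loss(config):
    """Placeholder objective standing in for 'train a model, then score it'.

    Assumption: lower is better; the minimum sits at latent_dim=3, lr=1e-3."""
    return ((config["latent_dim"] - 3) ** 2
            + (math.log10(config["learning_rate"]) + 3.0) ** 2)


def random_search(n_trials, seed=0):
    """Evaluate n_trials random settings and return the best (config, loss)."""
    rng = random.Random(seed)
    best_config, best_loss = None, float("inf")
    for _ in range(n_trials):
        config = sample_config(rng)
        loss = validation_loss(config)
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss


if __name__ == "__main__":
    config, loss = random_search(n_trials=50)
    print("best setting:", config, "loss:", loss)
```

Because each trial is independent, trials of this kind can run side by side on separate devices, which is one reason multi-GPU systems suit hyperparameter searches.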
Last Updated: May 28, 2020 - 4:02 pm