We present learnMSA, a novel statistical learning approach of profile hidden Markov models (pHMMs) based on batch gradient descent. Fundamentally different from popular aligners, we fit a custom recurrent neural network architecture for (p)HMMs to potentially millions of sequences with respect to a maximum a posteriori objective and decode an alignment. We rely on automatic differentiation of the log-likelihood, and thus, our approach is different from existing HMM training algorithms like Baum–Welch. Our method does not involve progressive, regressive, or divide-and-conquer heuristics. We use uniform batch sampling to adapt to large datasets in linear time without the requirement of a tree. When tested on ultra-large protein families with up to 3.5 million sequences, learnMSA is both more accurate and faster than state-of-the-art tools. On the established benchmarks HomFam and BaliFam with smaller sequence sets, it matches state-of-the-art performance. All experiments were done on a standard workstation with a GPU.

Profile hidden Markov models (pHMMs) are probabilistic models for protein families. One of their applications is remote homology search in large databases. Typically, an existing multiple sequence alignment (MSA) is turned into a pHMM, but pHMMs can also be trained on unaligned sequences, and an MSA can be decoded from the learned model.
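To make the training idea more concrete, here is a minimal sketch of fitting an HMM by gradient ascent on the forward-algorithm log-likelihood, with batches drawn uniformly at random. It is not learnMSA's code and rests on several simplifying assumptions: a toy four-letter alphabet, a generic two-state fully connected HMM instead of a pHMM topology, plain likelihood without the priors of a maximum a posteriori objective, and central finite differences standing in for automatic differentiation so that the example needs only the standard library. All function names (`log_likelihood`, `numeric_gradient`, `train`, and so on) are hypothetical, and no alignment is decoded.

```python
import math
import random

ALPHABET = "ACDE"  # toy alphabet (assumption); real protein alphabets are larger
NUM_STATES = 2     # generic fully connected HMM (assumption), not a pHMM topology
NUM_PARAMS = NUM_STATES + NUM_STATES ** 2 + NUM_STATES * len(ALPHABET)


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_softmax(logits):
    """Map unconstrained logits to log-probabilities that sum to one."""
    log_z = logsumexp(logits)
    return [v - log_z for v in logits]


def unpack(params):
    """Split the flat logit list into initial, transition and emission log-probs."""
    k, a = NUM_STATES, len(ALPHABET)
    init = log_softmax(params[:k])
    trans = [log_softmax(params[k + i * k: k + (i + 1) * k]) for i in range(k)]
    off = k + k * k
    emit = [log_softmax(params[off + i * a: off + (i + 1) * a]) for i in range(k)]
    return init, trans, emit


def log_likelihood(params, seq):
    """Forward algorithm in log space: log P(seq | params)."""
    init, trans, emit = unpack(params)
    x = ALPHABET.index(seq[0])
    alpha = [init[j] + emit[j][x] for j in range(NUM_STATES)]
    for ch in seq[1:]:
        x = ALPHABET.index(ch)
        alpha = [
            logsumexp([alpha[i] + trans[i][j] for i in range(NUM_STATES)]) + emit[j][x]
            for j in range(NUM_STATES)
        ]
    return logsumexp(alpha)


def mean_log_likelihood(params, seqs):
    return sum(log_likelihood(params, s) for s in seqs) / len(seqs)


def numeric_gradient(params, batch, eps=1e-4):
    """Central finite differences; a stand-in for automatic differentiation."""
    grad = []
    for i in range(len(params)):
        up = params[:i] + [params[i] + eps] + params[i + 1:]
        down = params[:i] + [params[i] - eps] + params[i + 1:]
        diff = mean_log_likelihood(up, batch) - mean_log_likelihood(down, batch)
        grad.append(diff / (2 * eps))
    return grad


def init_params(seed=0):
    """Small random logits so that the two states do not start out identical."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 0.1) for _ in range(NUM_PARAMS)]


def train(seqs, params, steps=40, batch_size=4, lr=0.2, seed=0):
    rng = random.Random(seed)
    for _ in range(steps):
        batch = rng.sample(seqs, batch_size)  # uniform batch sampling, no tree needed
        grad = numeric_gradient(params, batch)
        params = [p + lr * g for p, g in zip(params, grad)]  # gradient ascent step
    return params


if __name__ == "__main__":
    toy_family = ["ACA", "AACA", "ACAA", "AAC", "CAA", "ACAC", "AAA", "ACCA"]
    start = init_params()
    fitted = train(toy_family, start)
    print("mean log-likelihood before:", mean_log_likelihood(start, toy_family))
    print("mean log-likelihood after: ", mean_log_likelihood(fitted, toy_family))
```

Each step touches only one batch, so the cost per step does not depend on the total number of sequences. Finite differences need two likelihood evaluations per parameter, which is only workable for a toy model; a gradient obtained by automatic differentiation would replace `numeric_gradient` in any realistic setting.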