The klub 17 sequences

1/27/2024

Profile hidden Markov models (pHMMs) are probabilistic models for protein families. One of their applications is remote homology search in large databases. Typically, an existing multiple sequence alignment (MSA) is turned into a pHMM, but pHMMs can also be trained on unaligned sequences, and an MSA can be decoded from the learned model. We present learnMSA, a novel statistical learning approach for pHMMs based on batch gradient descent. Fundamentally different from popular aligners, we fit a custom recurrent neural network architecture for (p)HMMs to potentially millions of sequences with respect to a maximum a posteriori objective and decode an alignment. We rely on automatic differentiation of the log-likelihood, and thus our approach is different from existing HMM training algorithms like Baum–Welch. Our method does not involve progressive, regressive, or divide-and-conquer heuristics. We use uniform batch sampling to adapt to large datasets in linear time without the requirement of a tree. When tested on ultra-large protein families with up to 3.5 million sequences, learnMSA is both more accurate and faster than state-of-the-art tools. On the established benchmarks HomFam and BaliFam with smaller sequence sets, it matches state-of-the-art performance. All experiments were done on a standard workstation with a GPU.
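To make the idea of a batch log-likelihood objective more concrete, here is a minimal sketch in plain Python. It is not learnMSA's code. It assumes a generic two-state HMM over a toy alphabet `AB` rather than a real pHMM topology with match, insert, and delete states, and it leaves out the prior term of the maximum a posteriori objective. All function names and parameter values are made up for illustration. The sketch computes the log-likelihood of a sequence with the forward algorithm and averages it over a uniformly sampled batch. A framework with automatic differentiation would differentiate a quantity like this one with respect to the model parameters.

```python
import math
import random


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_likelihood(seq, log_init, log_trans, log_emit):
    """Return log P(seq) under an HMM using the forward algorithm in log space.

    log_init[i]     : log probability of starting in state i
    log_trans[i][j] : log probability of moving from state i to state j
    log_emit[i][c]  : log probability that state i emits symbol c
    Assumes seq is non-empty.
    """
    n_states = len(log_init)
    # Initialisation: start in state i and emit the first symbol.
    alpha = [log_init[i] + log_emit[i][seq[0]] for i in range(n_states)]
    # Recursion: sum over all predecessor states, then emit the next symbol.
    for symbol in seq[1:]:
        alpha = [
            logsumexp([alpha[i] + log_trans[i][j] for i in range(n_states)])
            + log_emit[j][symbol]
            for j in range(n_states)
        ]
    # Termination: sum over all states the sequence could end in.
    return logsumexp(alpha)


def sample_batch(sequences, batch_size, rng):
    """Draw a batch uniformly at random, with replacement."""
    return [rng.choice(sequences) for _ in range(batch_size)]


def batch_objective(batch, log_init, log_trans, log_emit):
    """Mean log-likelihood of a batch (the prior term is omitted here)."""
    total = sum(
        forward_log_likelihood(s, log_init, log_trans, log_emit) for s in batch
    )
    return total / len(batch)


# Hypothetical toy model: two states, alphabet {"A", "B"}.
LOG_INIT = [math.log(0.5), math.log(0.5)]
LOG_TRANS = [
    [math.log(0.9), math.log(0.1)],
    [math.log(0.2), math.log(0.8)],
]
LOG_EMIT = [
    {"A": math.log(0.8), "B": math.log(0.2)},
    {"A": math.log(0.3), "B": math.log(0.7)},
]

# Hypothetical unaligned toy sequences.
SEQUENCES = ["AAB", "ABBB", "BA", "AAAA", "BBAB", "AB"]

rng = random.Random(0)
batch = sample_batch(SEQUENCES, batch_size=4, rng=rng)
print("mean batch log-likelihood:",
      batch_objective(batch, LOG_INIT, LOG_TRANS, LOG_EMIT))
```

Because each batch is drawn uniformly and every sequence is scored independently, the cost of one step in this sketch depends on the batch size and sequence lengths, not on the size of the dataset, and no guide tree is needed.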
