Profile Hidden Markov Model (ProfileHMM) 



Profile analysis has long been a useful tool for finding and aligning distantly related sequences and for identifying known sequence domains in new sequences. A profile is a description of the consensus of a multiple sequence alignment. It uses a position-specific scoring system to capture how strongly each position in the multiple alignment is conserved. This makes it a much more sensitive and specific method for database searching than pairwise methods, such as those used by BLAST or FastA, which use position-independent scoring.
Hidden Markov modeling, a technique that has been used for years in speech recognition, is now being applied to many types of problems in molecular sequence analysis. In particular, it can produce profiles that improve on traditionally constructed profiles.
Profile hidden Markov models (HMMs) have several advantages over standard profiles. They have a formal probabilistic basis and a consistent theory behind gap and insertion scores, whereas standard profile methods set these scores heuristically. HMMs apply a statistical method to estimate the true frequency of a residue at a given position in the alignment from its observed frequency, while standard profiles use the observed frequency itself to assign the score for that residue. This means that a profile HMM derived from only 10 to 20 aligned sequences can be of equivalent quality to a standard profile created from 40 to 50 aligned sequences. In general, producing good profile HMMs requires less skill and manual intervention than producing good standard profiles.
A profile HMM is a linear state machine consisting of a series of nodes, each of which corresponds roughly to a position (column) in the alignment from which it was built. If we ignore gaps, the correspondence is exact: the profile HMM has a node for each column in the alignment, and each node has a single state, a match state.
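The ungapped case can be illustrated with a minimal Python sketch: one match state per alignment column, with each emission probability estimated from the observed residue counts in that column. The text does not say which statistical method is used to estimate the true frequencies, so the add-one pseudocount here is only an assumed stand-in; the function name, the toy DNA alphabet, and the example alignment are likewise hypothetical.

```python
from collections import Counter

ALPHABET = "ACGT"  # toy DNA alphabet, chosen only for illustration


def match_emissions(alignment, pseudocount=1.0):
    """Return one emission distribution (dict) per column of an ungapped alignment.

    Assumption: add-`pseudocount` smoothing stands in for the unspecified
    statistical estimator; it keeps unobserved residues from getting
    probability zero.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    total = len(alignment) + pseudocount * len(ALPHABET)
    nodes = []
    for col in range(length):
        # Observed residue counts in this column.
        counts = Counter(seq[col] for seq in alignment)
        # Smoothed estimate of the residue frequencies for this match state.
        nodes.append({r: (counts[r] + pseudocount) / total for r in ALPHABET})
    return nodes


# Hypothetical four-sequence alignment with four columns.
alignment = ["ACGT", "ACGA", "ACCT", "TCGT"]
nodes = match_emissions(alignment)
```

With a standard profile, a residue never seen in a column would be scored from an observed frequency of zero; the smoothed estimate leaves it a small nonzero probability, which is one way a model built from few sequences can remain usable.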
A profile HMM has several types of probabilities associated with it. One type is the transition probability: the probability of moving from one state to another. Each match state also has emission probabilities, based on the probability of a given residue occurring at that position in the alignment. If you follow a path through the model to generate a sequence consistent with the model, the probability of the generated sequence depends on the transition and emission probabilities at each node.
To model real sequences, we also need to allow for gaps (an insertion or a deletion in the sequence) when a model is aligned to a sequence. To handle these cases, each node in the profile HMM must have three states: the match state, an insert state, and a delete state. The model also needs more types of transition probabilities: match->match, match->insert, match->delete, insert->match, etc. (http://bip.weizmann.ac.il/education/materials/gcg/hmmanalysis.html)
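How transition and emission probabilities combine along a path can be sketched in Python as follows. Everything here is a simplified assumption for illustration: the parameter values are invented, the same transition table is reused at every node, begin and end states are left out, and insert->delete and delete->insert transitions are omitted.

```python
import math

# Hypothetical toy parameters; none of these numbers come from a real model.
# M = match, I = insert, D = delete.
TRANSITIONS = {
    ("M", "M"): 0.90, ("M", "I"): 0.05, ("M", "D"): 0.05,
    ("I", "M"): 0.70, ("I", "I"): 0.30,
    ("D", "M"): 0.80, ("D", "D"): 0.20,
}
MATCH_EMISSIONS = [  # one distribution per node (alignment column)
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]
INSERT_EMISSION = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def path_log_prob(path):
    """Log probability of a path given as (state, node_index, residue) steps.

    The residue is None for delete states, which emit nothing.
    """
    logp = 0.0
    prev = None
    for state, node, residue in path:
        if prev is not None:
            # Transition from the previous state into this one.
            logp += math.log(TRANSITIONS[(prev, state)])
        if state == "M":
            logp += math.log(MATCH_EMISSIONS[node][residue])
        elif state == "I":
            logp += math.log(INSERT_EMISSION[residue])
        prev = state
    return logp


# "ACG" through three match states.
all_match = [("M", 0, "A"), ("M", 1, "C"), ("M", 2, "G")]
# "ATCG": an extra T emitted by the insert state after node 0.
with_insert = [("M", 0, "A"), ("I", 0, "T"), ("M", 1, "C"), ("M", 2, "G")]
# "AG": node 1 is skipped through its delete state.
with_delete = [("M", 0, "A"), ("D", 1, None), ("M", 2, "G")]

for name, path in [("match", all_match), ("insert", with_insert), ("delete", with_delete)]:
    print(name, math.exp(path_log_prob(path)))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied along a long path. In this toy model the all-match path is far more probable than the paths that use an insert or a delete state, reflecting the low match->insert and match->delete transition probabilities chosen above.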