n-gram 



In the fields of computational linguistics and probability, an n-gram is a contiguous sequence of n items from a given sequence of text or speech. The items can be phonemes, syllables, letters, words or base pairs, depending on the application. n-grams are collected statistically from a text or speech corpus and are used to predict the most likely next item in a sequence when a particular subsequence is found. An n-gram of size 1 is referred to as a "unigram"; size 2 is a "bigram" (or, less commonly, a "digram"); size 3 is a "trigram"; size 4 is a "four-gram"; and size 5 or more is simply called an "n-gram". An n-gram model is a type of probabilistic model for predicting the next item in such a sequence. Some language models built from n-grams are "(n - 1)-order Markov models", in which the next item depends only on the preceding n - 1 items. n-gram models are used in various areas of statistical natural language processing and genetic sequence analysis (Martin 2011).
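As an illustration of the definition above, the following minimal Python sketch extracts n-grams from a word sequence and uses raw counts to guess the next item after a given context. The function names (`ngrams`, `build_model`, `predict_next`) and the sample sentence are hypothetical, chosen only for this example. It assumes whitespace-separated words as the items and applies no smoothing, so it should be read as a sketch rather than a practical language model.

```python
from collections import Counter, defaultdict


def ngrams(items, n):
    """Return all contiguous n-item windows of `items` as tuples."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


def build_model(items, n):
    """Count which item follows each (n - 1)-item context."""
    model = defaultdict(Counter)
    for gram in ngrams(items, n):
        context, following = gram[:-1], gram[-1]
        model[context][following] += 1
    return model


def predict_next(model, context):
    """Return the most frequent follower of `context`, or None if unseen."""
    followers = model.get(tuple(context))
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Illustrative corpus (an assumption of this example): words are the items.
words = "to be or not to be that is the question".split()

bigrams = ngrams(words, 2)      # size-2 n-grams, e.g. ("to", "be")
model = build_model(words, 2)   # bigram counts keyed by a 1-word context

print(bigrams[:3])
print(predict_next(model, ["to"]))
```

Because the context here is a single word, the bigram model behaves as a first-order Markov model: the guess for the next word depends only on the one word before it. Passing `n=3` to `build_model` would condition on the two preceding words instead.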