T-LAB 10.2 - ON-LINE HELP - T-LAB Tools for Text Analysis

In T-LAB an n-gram is a sequence of two (bi-gram) or more contiguous key words present within the same elementary context (i.e. sentence, text fragment or paragraph).

When used for computing word co-occurrences, n-gram segmentation ignores both stop-words and punctuation marks, so two key words count as contiguous even when stop-words or punctuation separate them in the text.

Let's consider the following example:

The Citizens of each State shall be entitled to all Privileges and Immunities of Citizens in the several States.

Assuming that the seven highlighted items (Citizens, State, entitled, Privileges, Immunities, Citizens, States) are included in our key-term list and that automatic lemmatization has been applied, a bi-gram segmentation produces the following co-occurrence contexts:

citizen & state
state & entitle
entitle & privilege
privilege & immunity
immunity & citizen
citizen & state.

By contrast, a tri-gram segmentation produces the following co-occurrence contexts (the final context contains only the two key words that remain at the end of the sentence):

citizen & state & entitle
state & entitle & privilege
entitle & privilege & immunity
privilege & immunity & citizen
immunity & citizen & state
citizen & state.
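
The two lists above can be reproduced with a short sliding-window sketch. This is only an illustration and not T-LAB's own implementation: the lemma map, the key-term list and the function names (key_term_sequence, ngram_contexts) are hypothetical, and keeping shorter trailing windows of at least two terms is an assumption chosen because it matches the last line of each list.

```python
import re

# Hypothetical lemma map and key-term list, valid for this example only.
LEMMAS = {
    "citizens": "citizen",
    "states": "state",
    "entitled": "entitle",
    "privileges": "privilege",
    "immunities": "immunity",
}
KEY_TERMS = {"citizen", "state", "entitle", "privilege", "immunity"}


def key_term_sequence(sentence):
    """Lowercase, drop punctuation, lemmatize, and keep only key terms."""
    words = re.findall(r"[a-z]+", sentence.lower())
    lemmas = [LEMMAS.get(w, w) for w in words]
    return [t for t in lemmas if t in KEY_TERMS]


def ngram_contexts(terms, n):
    """Slide a window of size n over the key terms.

    Assumption: shorter windows at the end of the sequence are kept
    as long as they still hold at least two terms.
    """
    windows = (terms[i:i + n] for i in range(len(terms)))
    return [tuple(w) for w in windows if len(w) >= 2]


sentence = ("The Citizens of each State shall be entitled to all Privileges "
            "and Immunities of Citizens in the several States.")
terms = key_term_sequence(sentence)
bigrams = ngram_contexts(terms, 2)
trigrams = ngram_contexts(terms, 3)

for context in trigrams:
    print(" & ".join(context))
```

Stop-words and punctuation never reach the window because they are filtered out before the n-grams are built, which is why "Citizens of each State" yields citizen & state.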

It is worth recalling that, when texts are segmented into elementary contexts, co-occurrences depend only on the presence (or absence) of key words within each context, whereas with an n-gram segmentation co-occurrences indicate a sequential relationship between words.
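
The difference can be made concrete with the example sentence. The sketch below is a simplified illustration under stated assumptions, not T-LAB's counting procedure: the context-based view is assumed to pair every two distinct key terms present in the sentence, while the bi-gram view keeps only adjacent, ordered pairs. The variable names are hypothetical.

```python
from itertools import combinations

# Key-term sequence of the example sentence after lemmatization.
terms = ["citizen", "state", "entitle", "privilege", "immunity",
         "citizen", "state"]

# Context-based view: any two distinct key terms present in the same
# elementary context co-occur, regardless of order or distance.
context_pairs = set(combinations(sorted(set(terms)), 2))

# Bi-gram view: only adjacent key terms co-occur, and order is preserved.
bigram_pairs = set(zip(terms, terms[1:]))

print(len(context_pairs), "unordered pairs in the context-based view")
print(len(bigram_pairs), "distinct ordered pairs in the bi-gram view")
print(("entitle", "immunity") in context_pairs)  # same sentence
print(("entitle", "immunity") in bigram_pairs)   # not adjacent
```

Under these assumptions, entitle and immunity co-occur in the context-based view because both appear in the sentence, but not in the bi-gram view because privilege stands between them.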
In T-LAB an n-gram-based co-occurrence analysis can be performed with the advanced options of the Word Association tool; moreover, a Markovian analysis of bi-grams can be performed with the Sequence Analysis tool.
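
As a rough idea of what a Markovian analysis of bi-grams involves, the sketch below estimates first-order transition probabilities from bi-gram counts. It is a generic illustration, not the Sequence Analysis tool's algorithm, and the key-term sequence is hypothetical, chosen so that one term has more than one successor.

```python
from collections import Counter, defaultdict

# Hypothetical key-term sequence, used only for illustration.
terms = ["citizen", "state", "citizen", "privilege", "citizen", "state"]

# Count how often each term is immediately followed by each other term.
counts = defaultdict(Counter)
for prev, nxt in zip(terms, terms[1:]):
    counts[prev][nxt] += 1

# Normalize each row so that the probabilities of the successors sum to 1.
transitions = {
    prev: {nxt: c / sum(following.values()) for nxt, c in following.items()}
    for prev, following in counts.items()
}

for prev, row in transitions.items():
    for nxt, p in row.items():
        print(f"{prev} -> {nxt}: {p:.2f}")
```

Here citizen is followed by state in two of its three occurrences and by privilege in the third, so the estimated probabilities for that row are 2/3 and 1/3.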