During the import phase, T-LAB segments the corpus into
elementary contexts, both to make the text easier for the user to
explore and, above all, to support the analyses that require
co-occurrence computation.
According to the user's choices, the elementary contexts can
be of four types:
1 - Sentences
Elementary contexts ending with punctuation marks (. ? !), whose
length ranges from 50 to 1,000 characters.
2 - Chunks
Elementary contexts of comparable length made up of one or more
sentences. In this case, T-LAB applies the following rules:
- it considers an elementary context to be every sequence of words
ending with a full stop followed by a carriage return, whose length
is below 400 characters;
- if no full stop occurs within the maximum length, it searches
for other punctuation marks in the following order (? ! ; : ,).
If none is found, it segments the text on the basis of a
statistical criterion, without splitting lexical units.
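The chunking rules above can be illustrated with a short Python sketch. This is not T-LAB's actual implementation: the function names `split_chunks` and `last_mark` are hypothetical, cutting at the last punctuation mark inside the window is an assumption, and the statistical criterion, which the text does not specify, is replaced here by a plain cut at the last space so that no word is split.

```python
import re

MAX_LEN = 400                              # maximum chunk length stated in the text
MARKS = [".", "?", "!", ";", ":", ","]     # full stop first, then the stated fallback order


def last_mark(block, mark, max_len):
    """Return the cut position just after the last `mark` + space in the window, or 0."""
    # Searching up to max_len + 1 lets a mark in the window's last position match.
    pos = block.rfind(mark + " ", 0, max_len + 1)
    return pos + 1 if pos >= 0 else 0


def split_chunks(text, max_len=MAX_LEN):
    """Hypothetical helper: split `text` into chunks of roughly max_len characters."""
    chunks = []
    # A full stop followed by a line break always closes a context.
    for block in re.split(r"(?<=\.)\s*\n+", text):
        block = " ".join(block.split())    # collapse inner whitespace and line breaks
        while len(block) > max_len:
            cut = 0
            for mark in MARKS:
                cut = last_mark(block, mark, max_len)
                if cut:
                    break
            if not cut:
                # Assumption: stand-in for the unspecified statistical criterion.
                cut = block.rfind(" ", 0, max_len + 1)
                if cut <= 0:
                    # No space in the window: keep the whole word rather than cut it.
                    nxt = block.find(" ", max_len)
                    cut = nxt if nxt >= 0 else len(block)
            chunks.append(block[:cut].strip())
            block = block[cut:].strip()
        if block:
            chunks.append(block)
    return chunks
```

For instance, calling `split_chunks("alpha beta. gamma delta, epsilon zeta eta theta", max_len=20)` cuts first at the full stop, then at the comma, and finally at a space, giving the four chunks `alpha beta.`, `gamma delta,`, `epsilon zeta eta` and `theta`.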
3 - Paragraphs
Elementary contexts ending with punctuation marks (. ? !) and a
carriage return, whose maximum length is 2,000 characters.
4 - Short Texts
This option is enabled only when the maximum length of the texts is
2,000 characters (e.g. responses to open-ended questions).
- The corpus_segments.dat file contains the result of corpus
segmentation.
- In T-LAB, the Concordances option lets the user check the
elementary contexts in which each word (or lemma) occurs.