www.tlab.it
Elementary Contexts
During the import phase, T-LAB segments the corpus into
elementary contexts. This makes the texts easier for the user to
explore and, above all, enables the analyses that require
co-occurrence computation.
According to the user's choices, the elementary contexts can be:
1 - Sentences
Elementary contexts ending with a punctuation mark (. ? !), whose
length ranges from 50 to 1,000 characters.
2 - Chunks
Elementary contexts of comparable length made up of one or more
sentences.
More precisely:
- T-LAB treats as an elementary context every sequence of words
ending with a full stop and a carriage return, provided its length
is less than 400 characters;
- if no full stop is found within the maximum length, it searches
for other punctuation marks in the following order (? ! ; : ,).
If none is found, it segments the text on the basis of a
statistical criterion, without splitting lexical units.
3 - Paragraphs
Elementary contexts ending with a punctuation mark (. ? !) and a
carriage return, whose maximum length is 2,000 characters.
4 - Short Texts
This option is enabled only when the texts in the corpus are at
most 2,000 characters long (e.g. responses to open-ended
questions).
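The fallback order described for the Chunks option can be illustrated with a minimal Python sketch. It is not T-LAB's actual implementation: the function name split_chunks and its parameters are hypothetical, the 400-character limit is treated as an upper bound, and the unspecified statistical criterion is replaced by a simple stand-in (cutting at the last whitespace, so that words are not split).

```python
def split_chunks(text, max_len=400, marks=".?!;:,"):
    """Split text into chunks of at most max_len characters.

    Illustrative only: mimics the documented priority order
    (full stop first, then ? ! ; : ,), not T-LAB's real code.
    """
    chunks = []
    rest = text.strip()
    while len(rest) > max_len:
        window = rest[:max_len]
        cut = -1
        # Try each mark in priority order; use its last occurrence in the window.
        for mark in marks:
            cut = window.rfind(mark)
            if cut != -1:
                break
        if cut != -1:
            end = cut + 1  # keep the punctuation mark with the chunk
        else:
            # Assumption: stand-in for the statistical criterion.
            # Cut at the last space so that no word is split.
            space = window.rfind(" ")
            # A single word longer than max_len is cut as a last resort.
            end = space if space > 0 else max_len
        chunks.append(rest[:end].strip())
        rest = rest[end:].strip()
    if rest:
        chunks.append(rest)
    return chunks


if __name__ == "__main__":
    sample = "one two, three four five six"
    for chunk in split_chunks(sample, max_len=15):
        print(repr(chunk))
```

With a 15-character limit, the sample string is first cut at the comma and then, since no punctuation mark remains in the next window, at the last space.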
N.B.:
- the corpus_segments.dat file contains the result of the corpus
segmentation;
- in T-LAB, the Concordances option allows the user to check the
elementary contexts in which each word (or lemma) occurs.