www.tlab.it
Analysis Unit
The analysis units used in T-LAB are of two types: lexical units and context
units.
A. - the lexical units are words and multi-words, recorded and classified
according to a given criterion. More precisely, in the T-LAB database each
lexical unit is a classified record with two fields: word and lemma. The first
field ("word") lists the words as they appear in the corpus, while the second
("lemma") lists the labels assigned to groups of lexical units, classified
either according to linguistic criteria (e.g. lemmatization) or by means of
dictionaries and semantic grids defined by the user.
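
The word/lemma record described above can be pictured with a minimal Python sketch. It is only an illustration and not the actual T-LAB database format: the `LexicalUnit` class, the `LEMMA_DICTIONARY` mapping and both helper functions are hypothetical names introduced here, and the dictionary stands in for a user-defined dictionary.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LexicalUnit:
    """Hypothetical two-field record: not the real T-LAB storage format."""
    word: str   # the form exactly as it appears in the corpus
    lemma: str  # the label assigned to the group the word belongs to


# Assumed, user-defined dictionary mapping lowercase word forms to lemma labels.
LEMMA_DICTIONARY = {
    "speaks": "speak",
    "spoke": "speak",
    "speaking": "speak",
    "women": "woman",
}


def build_lexical_units(tokens):
    """Create one record per token; unknown forms keep their lowercase form as lemma."""
    units = []
    for token in tokens:
        key = token.lower()
        units.append(LexicalUnit(word=token, lemma=LEMMA_DICTIONARY.get(key, key)))
    return units


def group_by_lemma(units):
    """Collect the word forms that share the same lemma label."""
    groups = defaultdict(set)
    for unit in units:
        groups[unit.lemma].add(unit.word)
    return {lemma: sorted(words) for lemma, words in groups.items()}


units = build_lexical_units(["Women", "spoke", "speaking", "speaks", "freely"])
lemma_groups = group_by_lemma(units)
```

In this sketch the "word" field keeps the corpus form untouched, while the "lemma" field is what groups several forms under one label, which is the distinction the paragraph draws.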
B. - the context units are portions of text into which the corpus can be
divided. More precisely, according to T-LAB logic, there are three types of
context units:
B.1 primary documents, which correspond to the "natural" subdivision of the
corpus (e.g. interviews, articles, answers to open-ended questions, etc.),
that is, the initial contexts defined by the user;
B.2 elementary contexts, which correspond to syntagmatic units (i.e.
fragments, sentences, paragraphs) into which each primary document can be
subdivided;
B.3 corpus subsets, which correspond to groups of primary documents that
belong to the same "category" (e.g. interviews with "men" or "women", articles
published in a specific year or in a particular magazine, and so on).
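
The three types of context units can be sketched in Python as follows. This is an illustration under stated assumptions, not a description of how T-LAB segments a corpus: the `PrimaryDocument` class, the `elementary_contexts` and `corpus_subsets` functions, the sample texts and the "gender" variable are all hypothetical, and the sentence splitter is deliberately naive.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class PrimaryDocument:
    """Hypothetical primary document with its category labels."""
    doc_id: str
    text: str
    categories: dict = field(default_factory=dict)


def elementary_contexts(document):
    """Split a primary document into sentences (naive rule: ., ! or ? then whitespace)."""
    parts = re.split(r"(?<=[.!?])\s+", document.text.strip())
    return [part for part in parts if part]


def corpus_subsets(documents, variable):
    """Group document ids by the value each document has for one category variable."""
    subsets = defaultdict(list)
    for doc in documents:
        subsets[doc.categories[variable]].append(doc.doc_id)
    return dict(subsets)


# Invented sample corpus: three interviews tagged with an assumed "gender" variable.
corpus = [
    PrimaryDocument("int01", "I work from home. It suits me well.", {"gender": "women"}),
    PrimaryDocument("int02", "I commute every day.", {"gender": "men"}),
    PrimaryDocument("int03", "Is remote work better? I think so!", {"gender": "women"}),
]

contexts = {doc.doc_id: elementary_contexts(doc) for doc in corpus}
subsets = corpus_subsets(corpus, "gender")
```

Here each `PrimaryDocument` plays the role of B.1, the sentences returned by `elementary_contexts` play the role of B.2, and the groups returned by `corpus_subsets` play the role of B.3.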