Words and Lemmas
Any text analysis software first identifies the so-called
raw forms, that is, the strings of letters separated by blank
spaces. Then, according either to its own algorithms or to the
categories used by specialists, the software recognizes lexemes,
keywords, etc.
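As a rough illustration of the first step, raw forms can be obtained by splitting a text on blank spaces. This is only a minimal sketch: the sample sentence and variable names are invented for the example, and it does not reproduce the procedure T-LAB actually uses.

```python
from collections import Counter

# Hypothetical sample text, invented for illustration only.
text = "the cat saw the other cat"

# Raw forms: strings of letters separated by blank spaces.
raw_forms = text.split()

# Count how many times each raw form occurs.
occurrences = Counter(raw_forms)

print(raw_forms)
print(occurrences["the"], occurrences["cat"])
```

Real software typically also has to deal with punctuation, upper and lower case, and apostrophes, which a plain split on spaces ignores.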
For all the lexical units present in the corpus database,
T-LAB tables provide two types of information:
· the first, named "word",
contains the transcription of the lexical units (single words or
multi-words) as the "strings" recognized by the
software;
· the second, named "lemma",
contains the labels (or tags) used for grouping and classifying the
lexical units.
Depending on the case, a lemma can
be:
- the result of the automatic
lemmatization process;
- an item of a "customized dictionary";
- a category grouping synonyms;
- a content analysis category;
- etc.
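The relationship between the two columns can be sketched as a list of (word, lemma) pairs in which several words share one label. The pairs and the function name below are hypothetical, chosen only to illustrate the cases listed above; they are not taken from an actual T-LAB table or dictionary.

```python
from collections import defaultdict

# Hypothetical (word, lemma) pairs, invented for illustration.
lexical_units = [
    ("went", "go"),             # lemma as a result of lemmatization
    ("going", "go"),
    ("car", "vehicle"),         # lemma as a category grouping synonyms
    ("automobile", "vehicle"),
    ("New_York", "New_York"),   # multi-word kept as its own label
]

def group_by_lemma(units):
    """Collect the words that share the same lemma label."""
    groups = defaultdict(list)
    for word, lemma in units:
        groups[lemma].append(word)
    return dict(groups)

groups = group_by_lemma(lexical_units)
print(groups)
```

In this sketch the "word" column keeps the strings as they were recognized, while the "lemma" column decides which of them are counted together.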