Frequency Threshold
During the pre-processing phase, T-LAB computes
a minimum frequency threshold used to select the words (or lemmas)
included in the automatic key-word list.
In any case, to guarantee the reliability
of all statistical computations, the threshold used by T-LAB
is never lower than 4.
This computation uses an algorithm
described by Bolasco (1999; see the Bibliography). It involves
the following steps:
- low frequency range detection: the
range starts from the minimum frequency ("1") and ends at the
first "jump" in the ascending sequence of occurrence values;
- threshold value choice: depending on the
corpus size, the threshold corresponds to the minimum value
in the first or in the second decile of this range (10% or
20%).
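
The two steps above can be illustrated with a minimal Python sketch. It is not T-LAB's actual code, and several points are assumptions made only for illustration: the function names low_frequency_upper_bound and frequency_threshold are hypothetical; a "jump" is read as the first missing value in the sequence of consecutive occurrence counts 1, 2, 3, ...; the decile value is read as the cut point at 10% or 20% of the low frequency range; and, since the text does not say which corpus sizes use which decile, the decile is left as a parameter.

```python
import math

MIN_THRESHOLD = 4  # floor stated in the text


def low_frequency_upper_bound(occurrences):
    """Return the last occurrence value before the first "jump".

    `occurrences` holds one occurrence count per word (or lemma).
    Assumption: a "jump" is the first gap in the consecutive
    occurrence values 1, 2, 3, ...
    """
    values = set(occurrences)
    upper = 0
    while upper + 1 in values:
        upper += 1
    return upper


def frequency_threshold(occurrences, decile=1):
    """Sketch of the threshold choice (decile=1 for 10%, decile=2 for 20%).

    Assumption: the threshold is the cut point of the chosen decile
    within the low frequency range, never lower than MIN_THRESHOLD.
    """
    if decile not in (1, 2):
        raise ValueError("decile must be 1 or 2")
    upper = low_frequency_upper_bound(occurrences)
    cut_point = math.ceil(upper * decile / 10)
    return max(cut_point, MIN_THRESHOLD)


if __name__ == "__main__":
    # Made-up counts: every value from 1 to 60 occurs, then a jump to 75.
    counts = list(range(1, 61)) + [75, 120]
    print(frequency_threshold(counts, decile=1))
    print(frequency_threshold(counts, decile=2))
```

With these made-up counts the low frequency range runs from 1 to 60, so the two calls select the cut points at 10% and 20% of that range. For a short range the floor of 4 applies instead.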