Frequency Threshold
During the preprocessing phase, TLAB computes a minimum frequency
threshold for selecting the words (or lemmas) included in the
automatic keywords list.
In any case, to guarantee the reliability of all statistical
computations, the TLAB threshold is never lower than 4.
The threshold is computed with an algorithm documented in one of the
works listed in the Bibliography (Bolasco, 1999). It involves the
following steps:
- low-frequency range detection: the range starts at the minimum
  frequency ("1") and ends at the first "jump" in the increasing
  sequence of occurrence values;
- threshold value choice: depending on the corpus size, the threshold
  corresponds to the minimum value in the first or the second decile
  of the range (10% or 20%).
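The two steps can be sketched in Python as follows. This is a minimal, hypothetical sketch, not the TLAB implementation. It assumes that the "first jump" is the first missing integer in the ascending list of distinct frequency values starting at 1, that the decile value is the range's upper bound times 10% or 20% rounded up, and that the caller chooses the decile (the exact corpus-size rule is not given here). The function names and the `decile` parameter are illustrative only; the floor of 4 comes from the text above.

```python
from collections import Counter


def low_frequency_range(freqs):
    """Upper bound of the low-frequency range (assumed interpretation).

    freqs: one occurrence count per distinct word.
    The range starts at frequency 1 and ends just before the first
    missing value ("jump") in the ascending distinct frequencies.
    Returns 0 if no word occurs exactly once.
    """
    values = sorted(set(freqs))
    if not values or values[0] != 1:
        return 0
    upper = 1
    for v in values[1:]:
        if v != upper + 1:  # first jump found: stop extending the range
            break
        upper = v
    return upper


def frequency_threshold(freqs, decile=1, floor=4):
    """Hypothetical threshold: 10% (decile=1) or 20% (decile=2) of the range.

    The result is rounded up and never lower than `floor` (4 in TLAB).
    """
    if decile not in (1, 2):
        raise ValueError("decile must be 1 (10%) or 2 (20%)")
    upper = low_frequency_range(freqs)
    candidate = (upper * decile + 9) // 10  # integer ceiling of upper * decile / 10
    return max(floor, candidate)


def select_keywords(tokens, decile=1):
    """Words whose frequency reaches the threshold, sorted alphabetically."""
    counts = Counter(tokens)
    threshold = frequency_threshold(counts.values(), decile)
    return sorted(w for w, c in counts.items() if c >= threshold)
```

For example, if the distinct frequencies run without gaps from 1 to 50 and then jump to 52, the range is 1 to 50 and the sketch yields a threshold of 5 (10%) or 10 (20%); a small corpus whose range ends at 3 falls back to the floor of 4.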
