| TF-IDF
 
 This measure, proposed by G. Salton (1989), allows
us to evaluate the weight of a term (lexical unit) within a
document (context unit).
 Its formula is the following:
 w_{i,j} = tf_{i,j} × idf_i   (Term Frequency × Inverse Document Frequency)
  
 Where:
 tf_{i,j} = number of occurrences of i (term) in j (document)
 idf_i = inverse document frequency of i, commonly computed as log(N / df_i)
 df_i = number of documents containing i
 N = total number of documents
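 As a rough illustration of the weight defined above, the following Python sketch computes w_{i,j} for a small made-up corpus. The function name tf_idf_weights, the sample documents, and the choice idf_i = log(N / df_i) with the natural logarithm are assumptions made for this example; the entry itself does not fix the logarithm base.

```python
import math
from collections import Counter

def tf_idf_weights(documents):
    """Return one {term: weight} dict per document, with w = tf * log(N / df)."""
    n_docs = len(documents)                        # N: total number of documents
    counts = [Counter(doc) for doc in documents]   # tf_{i,j}: raw occurrences of i in j
    df = Counter()                                 # df_i: number of documents containing i
    for c in counts:
        df.update(c.keys())                        # each document counts a term once
    return [
        {term: tf * math.log(n_docs / df[term]) for term, tf in c.items()}
        for c in counts
    ]

# Hypothetical toy corpus, already split into terms.
docs = [
    ["apple", "banana", "apple"],
    ["banana", "cherry"],
    ["apple", "cherry", "cherry", "date"],
]
weights = tf_idf_weights(docs)
```

 Under this assumed idf, a term that occurs in every document receives weight 0, because log(N / N) = 0.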
 The Term Frequency (tf_{i,j}) value can be normalized as follows:
 tf_{i,j} = tf_{i,j} / Max(f_{i,j})
 where the tf_{i,j} on the right-hand side is the raw count and Max(f_{i,j}) is the maximum frequency of any term in j (document).
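 The normalization can be sketched in a few lines of Python. The helper name normalized_tf and the sample document are hypothetical, and the input is assumed to be a document already split into terms.

```python
from collections import Counter

def normalized_tf(document):
    """Divide each term's count by the count of the most frequent term in the document."""
    counts = Counter(document)
    if not counts:                       # empty document: nothing to normalize
        return {}
    max_freq = max(counts.values())      # Max(f_{i,j}) taken over all terms in j
    return {term: freq / max_freq for term, freq in counts.items()}

doc = ["apple", "cherry", "cherry", "date"]   # hypothetical tokenized document
ntf = normalized_tf(doc)
```

 With this scaling the most frequent term in a document always gets the value 1, which makes term frequencies comparable across documents of different lengths.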
 
 
   |