www.tlab.it
Singular Value Decomposition
The Singular Value Decomposition (SVD; see
Wikipedia: https://en.wikipedia.org/wiki/Singular_value_decomposition)
is a technique for dimensionality reduction which, in Text Mining,
can be used to discover the latent dimensions (or
components) that determine semantic similarities between
words (i.e. lexical units) or between documents (i.e. context
units).
T-LAB allows us to
perform a Singular Value Decomposition of three types of
data tables. In the first case (see 'A' below), the data table
is a co-occurrence matrix whose rows and columns are keyterms. In
the second case (see 'B' below), an 'elementary contexts x
keyterms' data table is filled with presence/absence values (i.e. '1' and
'0'). In the third case (see 'C' below), a 'documents x
keyterms' data table is filled with occurrence values.
The analysis procedure consists of the following
steps:
1 - construction of the data table to be analysed (up to 300,000
rows x 5,000 columns);
2 - TF-IDF normalization and scaling of row vectors to unit length
(Euclidean norm);
3 - extraction of the first 20 'latent dimensions' through the Lanczos
algorithm.
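The three steps above can be sketched in Python on a toy table. This is only an illustration, not T-LAB's own code: the IDF variant (log(N / df)), the function names and the sample counts are assumptions, and NumPy's dense SVD is used as a stand-in for the Lanczos algorithm, which is what one would use on large sparse tables.

```python
import numpy as np

def tfidf_unit_rows(counts):
    """TF-IDF weight a (context units x keyterms) table, then scale rows to unit length."""
    counts = np.asarray(counts, dtype=float)
    n_rows = counts.shape[0]
    # Document frequency: number of rows in which each keyterm occurs.
    df = np.count_nonzero(counts, axis=0)
    # Assumed IDF variant: log(N / df); np.maximum avoids division by zero
    # for keyterms that never occur (their weighted values stay 0 anyway).
    idf = np.log(n_rows / np.maximum(df, 1))
    weighted = counts * idf
    # Euclidean norm of each row; all-zero rows are left unchanged.
    norms = np.linalg.norm(weighted, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    return weighted / norms

def latent_dimensions(matrix, k=20):
    """Return the first k singular triplets (dense SVD, not Lanczos)."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    k = min(k, s.size)
    return u[:, :k], s[:k], vt[:k].T

# Toy table: 4 context units x 3 keyterms (made-up occurrence values).
table = [[2, 0, 1],
         [0, 1, 1],
         [1, 1, 0],
         [0, 0, 3]]
weighted = tfidf_unit_rows(table)
row_coords, singular_values, term_coords = latent_dimensions(weighted, k=2)
```

Here row_coords holds one row per context unit and term_coords one row per keyterm, each with as many columns as the extracted dimensions.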
N.B.: In the case of a co-occurrence matrix (see 'A' above), data
normalization is performed through the cosine measure.
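As an illustration of this kind of normalization, the sketch below divides each co-occurrence count by the square root of the product of the two keyterms' occurrences. It assumes that the diagonal of the matrix holds the occurrences of each keyterm; this layout, the function name and the sample values are assumptions made for the example, not details taken from T-LAB.

```python
import math

def cosine_normalize(cooc):
    """Return cooc[i][j] / sqrt(occ_i * occ_j) for a square co-occurrence matrix."""
    n = len(cooc)
    # Assumption: the diagonal stores the total occurrences of each keyterm.
    occ = [cooc[i][i] for i in range(n)]
    return [
        [
            cooc[i][j] / math.sqrt(occ[i] * occ[j]) if occ[i] and occ[j] else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]

# Toy symmetric matrix for 3 keyterms (made-up values).
cooc = [[4, 2, 0],
        [2, 9, 3],
        [0, 3, 16]]
normalized = cosine_normalize(cooc)
```

With this layout every diagonal value becomes 1 and every other value falls between 0 and 1.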
The analysis results are displayed in tables and
charts.
In detail:
Two tables, the rows of which can be either lexical units
or context units, have as many columns as the extracted dimensions
(i.e. 20).
In the case of the LEMMAS (i.e. lexical units) table, a
further column is displayed, in which the importance scores are
reported (see below).
N.B.: The importance score of each lemma is
computed by summing the absolute values of its first 20 coordinates
(i.e. its entries in the eigenvectors), each one multiplied by its
corresponding eigenvalue.
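The computation described in this note can be written as a short Python function. The lemmas, coordinates and eigenvalues below are made-up values on three dimensions instead of 20, and the function name is hypothetical.

```python
def importance_score(coords, eigenvalues):
    """Sum of |coordinate| * eigenvalue over the extracted dimensions."""
    return sum(abs(c) * ev for c, ev in zip(coords, eigenvalues))

# Made-up example: two lemmas on three dimensions.
eigenvalues = [4.0, 2.0, 1.0]
lemmas = {
    "work": [0.5, -0.2, 0.1],
    "sea": [0.1, 0.1, -0.3],
}
scores = {lemma: importance_score(c, eigenvalues) for lemma, c in lemmas.items()}
# Lemmas sorted from the most to the least important.
ranking = sorted(scores, key=scores.get, reverse=True)
```

Because absolute values are used, a lemma with large negative coordinates counts as much as one with large positive coordinates.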
Any table can be sorted in ascending or descending
order by clicking on any column header.
To export any table, right-click on it while the data
are displayed.
Please note that the first time such a table is exported, the
eigenvalues are also exported. This allows the user to
evaluate the relative weight of each dimension, that is, the
percentage of variance explained by each of them.
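A minimal sketch of how the exported eigenvalues could be turned into percentages is given below. Note the assumption: each eigenvalue is divided by the sum of the exported eigenvalues only, so the percentages are relative to the extracted dimensions and not to the total variance of the data table. The values are made up.

```python
def explained_percentages(eigenvalues):
    """Share of each eigenvalue in the sum of the given eigenvalues, as a percentage."""
    total = sum(eigenvalues)
    if total == 0:
        return [0.0 for _ in eigenvalues]
    return [100.0 * ev / total for ev in eigenvalues]

# Made-up eigenvalues for three dimensions.
percentages = explained_percentages([5.0, 3.0, 2.0])
```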
By clicking the Associations button (see below), a
further table is displayed with the similarity measures (i.e.
cosine coefficients) of each word. Moreover, when any row of such a
table is clicked, a graph is displayed with the corresponding
data.
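The cosine coefficients in such a table can be illustrated as follows. The sketch assumes that each word is represented by its vector of coordinates on the extracted dimensions; the words, the two-dimensional vectors and the function names are made up for the example.

```python
import math

def cosine(u, v):
    """Cosine coefficient between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def associations(target, vectors):
    """Other words sorted by decreasing cosine similarity with the target word."""
    pairs = [(w, cosine(vectors[target], v)) for w, v in vectors.items() if w != target]
    return sorted(pairs, key=lambda p: p[1], reverse=True)

# Made-up coordinates on two dimensions.
vectors = {
    "work": [1.0, 0.0],
    "job": [0.8, 0.6],
    "sea": [0.0, 1.0],
}
result = associations("work", vectors)
```

The resulting list puts the words most similar to the selected one first, which is the same ordering one would expect when the table is sorted by the cosine column in descending order.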
The main chart shows the relationships between
the keyterms (i.e. lemmas) on the selected dimensions (see
below).
By default, the above chart includes the 100 most
important lemmas. However, the user can customize both the
number of lemmas and the chart characteristics.
