T-LAB PLUS 2019 - ON-LINE HELP

Singular Value Decomposition


The Singular Value Decomposition (SVD - see Wikipedia https://en.wikipedia.org/wiki/Singular-value_decomposition) is a technique for dimensionality reduction, which - in Text Mining - can be used for discovering the latent dimensions (or components) which determine semantic similarities between words (i.e. lexical units) or between documents (i.e. context units).

T-LAB allows us to perform a Singular Value Decomposition of three types of data tables. In the first case (see 'A' below), the data table is a co-occurrence matrix whose rows and columns are key-terms. In the second case (see 'B' below), the data table is an elementary contexts x key-terms matrix filled with presence/absence values (i.e. '1' and '0'). In the third case (see 'C' below), the data table is a documents x key-terms matrix filled with occurrence values.
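
The three table types can be illustrated on a toy corpus. The following is only a sketch and not T-LAB code: the corpus, the key-term list and all function names are hypothetical, and it assumes that two key-terms co-occur whenever they appear in the same elementary context.

```python
from collections import Counter

# Hypothetical toy corpus: two documents, each split into elementary contexts,
# each context already reduced to a list of lemmas.
corpus = {
    "doc1": [["market", "price", "growth"], ["price", "tax"]],
    "doc2": [["tax", "growth", "growth"]],
}
key_terms = ["growth", "market", "price", "tax"]


def presence_table(corpus, key_terms):
    """Case B: elementary contexts x key-terms, filled with 1/0."""
    return [
        [1 if term in context else 0 for term in key_terms]
        for contexts in corpus.values()
        for context in contexts
    ]


def occurrence_table(corpus, key_terms):
    """Case C: documents x key-terms, filled with occurrence counts."""
    rows = []
    for contexts in corpus.values():
        counts = Counter(lemma for context in contexts for lemma in context)
        rows.append([counts[term] for term in key_terms])
    return rows


def cooccurrence_matrix(presence):
    """Case A: key-terms x key-terms; cell (i, j) counts the contexts that
    contain both terms (assumed definition of co-occurrence)."""
    n = len(presence[0])
    return [
        [sum(row[i] * row[j] for row in presence) for j in range(n)]
        for i in range(n)
    ]


table_b = presence_table(corpus, key_terms)
table_c = occurrence_table(corpus, key_terms)
table_a = cooccurrence_matrix(table_b)
```

Under these assumptions the co-occurrence matrix is symmetric, and its diagonal holds the number of elementary contexts in which each key-term appears.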

The analysis procedure consists of the following steps:
1 - construction of the data table to be analysed (up to 300,000 rows x 5,000 columns);
2 - TF-IDF normalization and scaling of row vectors to unit length (Euclidean norm);
3 - extraction of the first 20 'latent dimensions' through the Lanczos algorithm.
N.B.: In the case of a co-occurrence matrix (see 'A' above), data normalization is performed through the cosine measure.
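
Steps 2 and 3 can be sketched for a small table as follows. This is an illustration under stated assumptions, not the T-LAB implementation: the IDF form log(N / df) is assumed, a dense SVD from NumPy stands in for the Lanczos algorithm (so it is suitable only for small tables), scaling the coordinates by the singular values is one common convention, and the table and function names are hypothetical.

```python
import numpy as np


def tfidf_unit_rows(table):
    """Weight a rows x key-terms table by TF-IDF, then scale each row
    to unit Euclidean length. The IDF form log(N / df) is an assumption."""
    x = np.asarray(table, dtype=float)
    n_rows = x.shape[0]
    df = np.count_nonzero(x, axis=0)          # rows containing each key-term
    idf = np.log(n_rows / np.maximum(df, 1))  # guard against empty columns
    weighted = x * idf
    norms = np.linalg.norm(weighted, axis=1, keepdims=True)
    norms[norms == 0] = 1.0                   # leave all-zero rows untouched
    return weighted / norms


def truncated_svd(matrix, k=20):
    """Return the first k singular triplets using a dense SVD
    (a stand-in for Lanczos, not the same algorithm)."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    k = min(k, len(s))
    return u[:, :k], s[:k], vt[:k, :]


# Hypothetical documents x key-terms table with occurrence values.
table = [
    [2, 1, 0, 0],
    [1, 0, 3, 0],
    [0, 0, 1, 2],
    [0, 1, 0, 1],
]
normalized = tfidf_unit_rows(table)
u, s, vt = truncated_svd(normalized, k=2)
row_coords = u * s     # context-unit coordinates on the 2 dimensions
col_coords = vt.T * s  # lexical-unit coordinates on the 2 dimensions
```

For tables close to the stated size limit, a sparse iterative solver would be needed instead of the dense decomposition used here.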

The analysis results are displayed in tables and charts.

In detail:

Two tables - the rows of which can be either lexical units or context units - have as many columns as the extracted dimensions (i.e. 20).

In the case of the LEMMAS (i.e. lexical units) table, a further column is displayed, in which the importance scores are reported (see below).

N.B.: The importance score of each lemma is computed by summing the absolute values of its first 20 coordinates (i.e. the eigenvectors), each one multiplied by its corresponding eigenvalue.
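
The computation described in this note can be sketched as follows. The lemmas, coordinates and eigenvalues are hypothetical, three dimensions are used instead of 20 to keep the example short, and the function name is an assumption.

```python
def importance_score(coordinates, eigenvalues):
    """Sum of |coordinate| * eigenvalue over the extracted dimensions."""
    if len(coordinates) != len(eigenvalues):
        raise ValueError("one eigenvalue is needed per coordinate")
    return sum(abs(c) * ev for c, ev in zip(coordinates, eigenvalues))


# Hypothetical values for three dimensions instead of 20.
eigenvalues = [4.0, 2.0, 1.0]
lemma_coordinates = {
    "growth": [0.5, -0.25, 0.0],
    "tax": [-0.25, 0.5, 1.0],
}
scores = {
    lemma: importance_score(coords, eigenvalues)
    for lemma, coords in lemma_coordinates.items()
}
```

Because absolute values are summed, a lemma scores highly when it lies far from the origin on heavily weighted dimensions, regardless of the sign of its coordinates.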

Any table can be sorted in ascending or descending order by clicking on any column header.
To export any table, right-click on it while the data are displayed.
Please note that the first time such a table is exported, the Eigenvalues are also exported. This allows the user to evaluate the relative weight of each dimension, that is, the percentage of variance explained by each one of them.
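
Once the Eigenvalues have been exported, the relative weight of each dimension can be computed as in the sketch below. The values and the function name are hypothetical, and the percentages are relative to the sum of the eigenvalues supplied; if only the extracted dimensions are available, this is an assumption and the result is not a share of the total variance of the table.

```python
def explained_percentages(eigenvalues):
    """Share of each eigenvalue in the sum of the supplied eigenvalues."""
    total = sum(eigenvalues)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    return [100.0 * ev / total for ev in eigenvalues]


# Hypothetical eigenvalues for three dimensions.
eigenvalues = [5.0, 3.0, 2.0]
percentages = explained_percentages(eigenvalues)
```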

By clicking the Associations button (see below), a further table is displayed with the similarity measures (i.e. cosine coefficients) of each word. Moreover, when any row of such a table is clicked, a graph is displayed with the corresponding data.
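
A cosine-based association table of this kind can be sketched as follows. The sketch assumes that the cosine coefficients are computed between the coordinate vectors of the lemmas on the extracted dimensions; the lemmas, the two-dimensional coordinates and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine coefficient between two coordinate vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # no direction to compare
    return dot / (norm_u * norm_v)


def associations(target, coordinates):
    """Rank all other lemmas by cosine with the target, highest first."""
    ref = coordinates[target]
    ranked = [
        (lemma, cosine(ref, vec))
        for lemma, vec in coordinates.items()
        if lemma != target
    ]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Hypothetical coordinates on two dimensions.
coordinates = {
    "growth": [3.0, 4.0],
    "market": [6.0, 8.0],
    "tax": [-4.0, 3.0],
    "price": [4.0, 3.0],
}
ranking = associations("growth", coordinates)
```

In this toy example 'market' points in the same direction as 'growth' and ranks first, while 'tax' is orthogonal to it and ranks last.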

The main chart shows the relationships between the key-terms (i.e. lemmas) on the selected dimensions (see below).

By default, the above chart includes the 100 most important lemmas. However, the user can customize both the number of lemmas and the chart characteristics.