T-LAB PLUS 2017 - ON-LINE HELP

Modeling of Emerging Themes


N.B.: The pictures shown in this section have been obtained by using a previous version of T-LAB. These pictures look slightly different in T-LAB Plus. Moreover: a) by right clicking on the tables which list the topics, additional options become available; b) a new button allows the user to display theme maps either through MDS (Multidimensional Scaling) or through Correspondence Analysis; c) there is a new button (TREE MAP PREVIEW) which allows the user to create dynamic charts in HTML format.

This T-LAB tool provides a simple way of discovering, examining and modeling the main themes or topics emerging from texts (henceforward 'theme' and 'topic' will be used synonymously). The themes can then be explored further with several tools, either keeping qualitative and quantitative approaches separate or combining them.

In fact, themes - which are described through their characteristic vocabulary and consist of co-occurrence patterns of key-terms - can be used as categories in further analyses or for automatically classifying the context units (i.e. documents or elementary contexts).

All the information provided in this section refers to the default option, i.e. to the bottom-up approach which analyses word co-occurrences through probabilistic modeling. When the top-down option is selected, T-LAB switches to the Dictionary-Based Classification tool and to the corresponding help section.


The only parameter (see above) that the user can set is the number of themes to be obtained, which is fixed in advance. Note that the higher this number is, the more consistent the co-occurrence patterns are; moreover, if necessary, some themes (e.g. those that are redundant or difficult to interpret) can be discarded later.

The analysis procedure consists of the following steps:

a - construction of a co-occurrence matrix (depending on the case, either a document by word or an elementary context by word matrix);
b - data analysis by a probabilistic model which uses Latent Dirichlet Allocation and Gibbs Sampling (see the related information on Wikipedia: http://en.wikipedia.org/wiki/Latent_Dirichlet_allocation; http://en.wikipedia.org/wiki/Gibbs_sampling);
c - description of themes by means of the probability of their characteristic words, either "specific" or "shared" by two or more themes.
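
As an illustration only, the three steps above can be sketched with a textbook collapsed Gibbs sampler for Latent Dirichlet Allocation. This is not T-LAB's implementation: the function name lda_gibbs, the hyperparameters alpha and beta, the number of iterations and the toy corpus are all assumptions chosen for the example. Each context unit is given as a list of word ids, which carries the same information as one row of the context unit by word matrix.

```python
import random


def lda_gibbs(docs, n_words, n_themes, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Textbook collapsed Gibbs sampling for LDA (illustrative sketch).

    docs: list of context units, each a list of word ids in range(n_words).
    Returns (theme_word_prob, doc_theme_prob) as nested lists.
    alpha and beta are assumed smoothing values, not T-LAB settings.
    """
    rng = random.Random(seed)
    doc_theme = [[0] * n_themes for _ in docs]             # tokens per theme in each unit
    theme_word = [[0] * n_words for _ in range(n_themes)]  # tokens of each word per theme
    theme_tot = [0] * n_themes                             # total tokens per theme
    assign = []                                            # theme of every single token

    # Start from a random assignment of tokens to themes.
    for d, doc in enumerate(docs):
        z = [rng.randrange(n_themes) for _ in doc]
        assign.append(z)
        for w, k in zip(doc, z):
            doc_theme[d][k] += 1
            theme_word[k][w] += 1
            theme_tot[k] += 1

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take the token out of the counts...
                k = assign[d][i]
                doc_theme[d][k] -= 1
                theme_word[k][w] -= 1
                theme_tot[k] -= 1
                # ...and resample its theme given all the other tokens.
                weights = [
                    (doc_theme[d][t] + alpha)
                    * (theme_word[t][w] + beta)
                    / (theme_tot[t] + n_words * beta)
                    for t in range(n_themes)
                ]
                k = rng.choices(range(n_themes), weights=weights)[0]
                assign[d][i] = k
                doc_theme[d][k] += 1
                theme_word[k][w] += 1
                theme_tot[k] += 1

    # Step "c": each theme is described by the probabilities of its words.
    theme_word_prob = [
        [(theme_word[t][w] + beta) / (theme_tot[t] + n_words * beta)
         for w in range(n_words)]
        for t in range(n_themes)
    ]
    # Each context unit is a mixture of themes.
    doc_theme_prob = [
        [(doc_theme[d][t] + alpha) / (len(doc) + n_themes * alpha)
         for t in range(n_themes)]
        for d, doc in enumerate(docs)
    ]
    return theme_word_prob, doc_theme_prob


# Toy corpus: word ids 0-1 and 2-3 tend to occur in different context units.
toy_docs = [[0, 1, 0, 1], [2, 3, 3, 2], [0, 0, 1], [3, 2, 2]]
word_probs, unit_mixtures = lda_gibbs(toy_docs, n_words=4, n_themes=2, n_iter=50)
```

In this sketch the number of themes (n_themes) is the only value that corresponds to a user-set parameter in the tool; everything else is fixed for the sake of the example.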

On completion of the analysis you can easily perform the following operations:

1 - explore, rename and remove the characteristics of each theme;

2 - rename or discard specific themes;

3 - assess the semantic coherence of each theme;

4 - test the model by a Naïve Bayes Classifier which assigns context units (i.e. documents and/or elementary contexts) to themes;

5 - apply the model and visualize the relationships between the different themes;

6 - export a dictionary of categories.


In detail:

1 - Explore, rename and remove the characteristics of each theme

In this chart (see above) "high probability" indicates a probability >= 0.75.

By clicking on each theme label (see "A" above), tables and charts can be visualized (see "B" above); moreover, by clicking on words in the table (see "C" above), their distribution within the various themes is displayed and a "remove" option is available.


The reading keys of the table are as follows:
IN THEME = tokens of each word within the selected theme;
TOT = total tokens of each word within the corpus (or the subset) analysed;
IN (%) = percentage values of each word within the selected theme;
(p) = probability value of each word over themes;
TYPE = specific when the word belongs to the selected theme only (i.e. p=1); shared in the other cases.
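
The relationship between IN THEME, TOT, (p) and TYPE can be illustrated with a small sketch. It assumes that (p) is the share of a word's tokens falling in the selected theme, i.e. IN THEME / TOT, which is consistent with p=1 for specific words but is not stated explicitly above. The function name theme_table and the toy counts are hypothetical.

```python
def theme_table(theme_word_counts, theme):
    """Build the rows of a theme table (illustrative sketch).

    theme_word_counts: dict mapping theme -> {word: tokens assigned to that theme}.
    Assumption: p = IN THEME / TOT.
    """
    rows = []
    for word, in_theme in theme_word_counts[theme].items():
        if in_theme == 0:
            continue
        # TOT: tokens of the word summed over all themes.
        tot = sum(counts.get(word, 0) for counts in theme_word_counts.values())
        rows.append({
            "word": word,
            "IN THEME": in_theme,
            "TOT": tot,
            "p": in_theme / tot,
            # A word is "specific" when all its tokens fall in this theme (p = 1).
            "TYPE": "specific" if in_theme == tot else "shared",
        })
    # Most characteristic words first.
    rows.sort(key=lambda row: row["p"], reverse=True)
    return rows


# Hypothetical token counts for two themes.
toy_counts = {
    "T1": {"tax": 8, "budget": 5, "people": 2},
    "T2": {"school": 6, "people": 6},
}
for row in theme_table(toy_counts, "T1"):
    print(row)
```

With these toy counts, "tax" and "budget" are specific to T1 (p = 1), while "people" is shared (p = 0.25) because 6 of its 8 tokens belong to T2.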

By selecting the complete results option (see "B" above) an HTML file is created including all themes and their characteristic vocabulary; moreover, two XLS files can be saved.


When the "shared words" option is selected (see below) it is possible to explore the corresponding table and create a chart for each item selected.


2 - Rename or discard specific themes

In order to rename or discard specific themes, just select one of them (see "A" below) and click on the "rename/remove" button (see "B" below).

When the appropriate box appears, depending on your goals, you can change the label by choosing among the available words or by typing a new label in the appropriate field (see "C" below); otherwise you can discard the selected theme just by clicking on the corresponding button (see "D" below).



3 - Assess the semantic coherence of each theme

When clicking the Quality Indices button (see the picture above), T-LAB computes the average similarity between the top 10 words of each theme.
More specifically:
- the top 10 words are those with the highest probability values over themes;
- the average similarity is computed using the cosine index;
- the cosine index of each word pair is computed, as in the Word Associations tool, at the text segment (i.e. elementary context) level.
As a result, T-LAB creates an HTML table where the 'k' themes are listed according to their 'semantic coherence' (i.e. the first theme in the list is the one with the highest average similarity index).
N.B.: Because the above measures vary according to the selected words, the user is advised to repeat the procedure whenever any of the top 10 words of a theme is removed.
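
A minimal sketch of this coherence measure is given below. It assumes that the cosine index is computed on the presence or absence of words in elementary contexts, i.e. co-occurrences divided by the square root of the product of the two words' occurrences. The function names and the toy contexts are hypothetical, and T-LAB's exact computation may differ.

```python
from itertools import combinations
from math import sqrt


def cosine(word_a, word_b, contexts):
    """Cosine index of two words over elementary contexts.

    contexts: list of sets of words, one set per elementary context.
    Assumption: presence/absence counts, not token frequencies.
    """
    n_a = sum(1 for c in contexts if word_a in c)
    n_b = sum(1 for c in contexts if word_b in c)
    n_ab = sum(1 for c in contexts if word_a in c and word_b in c)
    if n_a == 0 or n_b == 0:
        return 0.0
    return n_ab / sqrt(n_a * n_b)


def theme_coherence(top_words, contexts):
    """Average cosine index over all pairs of a theme's top words."""
    pairs = list(combinations(top_words, 2))
    if not pairs:
        return 0.0
    return sum(cosine(a, b, contexts) for a, b in pairs) / len(pairs)


def rank_themes(themes, contexts):
    """List themes from the most to the least coherent.

    themes: dict mapping theme label -> list of its top words.
    """
    scores = {label: theme_coherence(words, contexts)
              for label, words in themes.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical elementary contexts and top words (10 per theme in the tool).
toy_contexts = [
    {"tax", "budget"},
    {"tax", "budget", "people"},
    {"school", "people"},
    {"tax"},
]
toy_themes = {"T1": ["tax", "budget"], "T2": ["tax", "school"]}
ranking = rank_themes(toy_themes, toy_contexts)
```

Here T1 is ranked first because "tax" and "budget" share two contexts, whereas "tax" and "school" never co-occur.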

4 - Test the Model

At the end of the analysis procedure (see points "a" and "b" above) each context unit (i.e. primary document or elementary context) is represented as a mixture of different topics; by contrast, the Naïve Bayes Classifier used in this step assigns each context unit to the single topic which is most characteristic of it.
For this reason, when the "Test the Model" option is selected, T-LAB creates an HTML file and two XLS files including the classification of context units (see below).
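
The difference between a mixture of topics and a single assignment can be sketched with a generic multinomial Naïve Bayes rule, which picks the theme that maximises the sum of the log-probabilities of the words in a context unit. The function name classify, the uniform prior, the floor probability for unseen words and the toy probabilities are assumptions made for this example, not a description of T-LAB's classifier.

```python
from math import log


def classify(context_words, theme_word_prob, floor=1e-6):
    """Assign a context unit to its most characteristic theme.

    context_words: list of words (tokens) in the context unit.
    theme_word_prob: dict mapping theme -> {word: P(word | theme)}.
    floor: assumed probability for a word that a theme does not contain.
    A uniform prior over themes is assumed.
    """
    known = set()
    for probs in theme_word_prob.values():
        known.update(probs)

    best_theme, best_score = None, float("-inf")
    for theme, probs in theme_word_prob.items():
        score = log(1.0 / len(theme_word_prob))  # uniform prior
        for word in context_words:
            if word not in known:
                continue  # words outside the model carry no information
            score += log(probs.get(word, floor))
        if score > best_score:
            best_theme, best_score = theme, score
    return best_theme


# Hypothetical theme descriptions: P(word | theme).
toy_model = {
    "economy": {"tax": 0.5, "budget": 0.4, "people": 0.1},
    "education": {"school": 0.6, "people": 0.4},
}
print(classify(["tax", "people"], toy_model))
print(classify(["school", "people", "people"], toy_model))
```

The first context unit is assigned to "economy" and the second to "education", even though the shared word "people" appears in both.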


5 - Apply the model


After the model has been applied and saved (see "A" below), the results of the analysis can be immediately visualised in an MDS map.


Moreover, given that on exiting the analysis (see "B" above) themes are recorded as clusters of context units (i.e. like the results of Thematic Analysis of Elementary Contexts and Thematic Document Classification), the new thematic variables just created (i.e. CONT_CLUST and/or DOC_CLUST) can be explored by using various T-LAB tools (see below).

For example, you can perform a Correspondence Analysis of themes (see below)


produce a network map (see below) by using the Sequence of Themes tool


obtain a Word Associations map by using the corresponding T-LAB tool (see below), and so on.

 

6 - Export a dictionary of categories

When this option is selected, a dictionary file with the .dictio extension is created, ready to be imported by any T-LAB tool for thematic analysis. In such a dictionary each theme (or category) is described by its characteristic words.