Modeling of Emerging Themes
N.B.: The pictures shown in this section have been obtained
by using a previous version of T-LAB. These pictures look slightly different
in T-LAB Plus. Moreover: a) by
right clicking on the tables which
list the topics, additional options become available; b) a
new button allows the user to display
theme maps either through MDS
(Multidimensional Scaling) or through Correspondence Analysis; c) there is a new button
(TREE MAP PREVIEW) which allows the
user to create dynamic charts in HTML format; d) when analysing a
corpus which includes variable attributes, it is now possible to
build and analyse tables which cross the themes and the attributes
of each variable.
This T-LAB tool provides a simple way of discovering,
examining and modeling the main themes or topics
(henceforward 'theme' and 'topic' will be used synonymously)
emerging from texts. Subsequently they
can be explored further with several tools, either by keeping
separate or by combining qualitative and quantitative approaches.
In fact, themes - which are described through their characteristic
vocabulary and consist of co-occurrence patterns of key-terms - can be used as categories in
further analyses or for automatically classifying the context units
(i.e. documents or elementary contexts).
All the information provided in this section refers to the default
option, i.e. to the bottom-up approach
which analyses word co-occurrences through probabilistic modeling;
in fact, when the top-down approach is selected, T-LAB
switches to the Dictionary-Based
Classification tool and to the corresponding help section.
The only parameter (see above) that the user can
set is the number of themes to be obtained, which is fixed in advance.
Note that the higher this number, the more consistent the
co-occurrence patterns; moreover, if necessary, some themes (e.g.
those that are redundant or difficult to interpret) can be discarded.
The analysis procedure
consists of the following steps:
a - construction of a co-occurrence matrix (depending on the case,
either a document by word or an elementary context by word matrix);
b - data analysis by a probabilistic model which uses Latent
Dirichlet Allocation and Gibbs Sampling (see the related
information on Wikipedia: http://en.wikipedia.org/wiki/Latent_Dirichlet_allocation);
c - description of themes by means of the probability of their
characteristic words, either "specific" or "shared" by two or more themes.
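Steps a to c can be illustrated with a minimal Python sketch. It is not T-LAB's implementation: the function names (build_matrix, gibbs_lda, describe), the hyperparameters alpha and beta, and the number of iterations are all assumptions made for illustration. It uses a standard collapsed Gibbs sampler for Latent Dirichlet Allocation.

```python
import random

def build_matrix(units):
    """Step a: context-unit-by-word count matrix plus its vocabulary."""
    vocab = sorted({w for unit in units for w in unit})
    index = {w: j for j, w in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in units]
    for i, unit in enumerate(units):
        for w in unit:
            matrix[i][index[w]] += 1
    return matrix, vocab

def gibbs_lda(matrix, k, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Step b: collapsed Gibbs sampling for LDA with k themes.

    alpha, beta and iters are illustrative defaults, not T-LAB settings.
    Returns theme-by-word counts and unit-by-theme counts.
    """
    rng = random.Random(seed)
    n_words = len(matrix[0])
    # Expand the counts into individual tokens: (unit index, word index).
    tokens = [(d, w) for d, row in enumerate(matrix)
              for w, c in enumerate(row) for _ in range(c)]
    z = [rng.randrange(k) for _ in tokens]  # random initial theme per token
    unit_theme = [[0] * k for _ in matrix]
    theme_word = [[0] * n_words for _ in range(k)]
    theme_total = [0] * k

    def update(d, w, t, step):
        unit_theme[d][t] += step
        theme_word[t][w] += step
        theme_total[t] += step

    for (d, w), t in zip(tokens, z):
        update(d, w, t, +1)
    for _ in range(iters):
        for i, (d, w) in enumerate(tokens):
            update(d, w, z[i], -1)  # take the token out of the counts
            weights = [(unit_theme[d][j] + alpha)
                       * (theme_word[j][w] + beta)
                       / (theme_total[j] + n_words * beta)
                       for j in range(k)]
            z[i] = rng.choices(range(k), weights=weights)[0]
            update(d, w, z[i], +1)  # put it back under the sampled theme
    return theme_word, unit_theme

def describe(theme_word, vocab):
    """Step c: probability of each word over themes, specific or shared."""
    result = {}
    for j, word in enumerate(vocab):
        counts = [row[j] for row in theme_word]
        total = sum(counts)
        used = sum(1 for c in counts if c > 0)
        result[word] = {"p": [c / total for c in counts],
                        "type": "specific" if used == 1 else "shared"}
    return result

units = [["tax", "budget", "tax"], ["goal", "match", "goal"],
         ["budget", "tax"], ["match", "goal"]]
matrix, vocab = build_matrix(units)
theme_word, unit_theme = gibbs_lda(matrix, k=2)
for word, info in describe(theme_word, vocab).items():
    print(word, info["type"], [round(p, 2) for p in info["p"]])
```

Because the sampler is stochastic, the themes obtained depend on the seed and on the number of iterations; the sketch only shows how the three steps fit together.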
On completion of the analysis you can easily perform the
following operations:
1 - explore, rename and remove the characteristics of
each theme;
2 - rename or discard specific themes;
3 - assess the semantic coherence of each theme;
4 - test the model by a Naïve Bayes Classifier
which assigns context units (i.e. documents and/or elementary
contexts) to themes;
5 - apply the model and visualize the relationships
between the different themes;
6 - export a dictionary of categories.
1 - Explore, rename and remove the
characteristics of each theme
In this chart (see above) "high probability" indicates a
By clicking on each theme label (see "A" above), tables
and charts can be visualized (see "B" above); moreover, by clicking
on words in the table (see "C" above), their distribution within
the various themes is displayed and a "remove" option is available.
The reading keys of the table are as follows:
IN THEME = tokens of each word within the selected theme;
TOT = total tokens of each word within the corpus (or the subset)
IN (%) = percentage values of each word within the selected theme;
(p) = probability value of each word over themes;
TYPE = specific when the word belongs
to the selected theme only (i.e. p=1); shared in the other cases.
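As a rough illustration of how these reading keys relate to one another, the following Python sketch derives them from a theme-by-word count table. The function name word_table and the formulas are assumptions: IN (%) is taken here as the word's share of all tokens in the selected theme, and (p) as IN THEME divided by TOT, which is consistent with p=1 for "specific" words but may not match T-LAB's exact computation.

```python
def word_table(theme_word, vocab, theme):
    """Build the table rows for one theme from theme-by-word token counts."""
    theme_total = sum(theme_word[theme])
    rows = []
    for j, word in enumerate(vocab):
        in_theme = theme_word[theme][j]
        if in_theme == 0:
            continue  # the word does not occur in the selected theme
        tot = sum(row[j] for row in theme_word)
        rows.append({
            "word": word,
            "IN THEME": in_theme,
            "TOT": tot,
            "IN (%)": 100 * in_theme / theme_total,  # assumed definition
            "p": in_theme / tot,                     # assumed definition
            "TYPE": "specific" if in_theme == tot else "shared",
        })
    rows.sort(key=lambda r: r["IN THEME"], reverse=True)
    return rows

# Hypothetical counts: 2 themes (rows) by 3 words (columns).
counts = [[3, 0, 1],
          [1, 2, 0]]
for row in word_table(counts, ["a", "b", "c"], theme=0):
    print(row)
```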
By selecting the complete
results option (see "B" above) an HTML file is created
including all themes and their characteristic vocabulary; moreover,
two XLS files can be saved.
When the "shared words" option is selected (see below) it is
possible to explore the corresponding table and create a chart for
each item selected.
2 - Rename or discard specific themes
In order to rename or discard specific themes, just select one of them (see
"A" below) and click on the "rename/remove" button (see "B" below).
When the appropriate box appears, depending on your goals, you can
change the label by choosing among the available words or by typing
a new label in the appropriate field (see "C" below); otherwise you
can discard the selected theme just by clicking on the
corresponding button (see "D" below).
3 - Assess the semantic coherence of each theme
When clicking the Quality
Indices button (see the picture above), T-LAB computes the average similarity between
the top 10 words of each theme.
- the top 10 words are those with the highest probability values;
- the average similarity is computed using the cosine index;
- the cosine index of each word pair, as in the Word Associations tool, is computed at the text
segment (i.e. elementary context) level.
As a result, T-LAB creates an HTML table where the 'k' themes are
listed according to their 'semantic coherence' (i.e. the first
theme in the list is the one with the highest average similarity).
N.B.: Because the above measures vary according to the selected
words, the user is advised to repeat the procedure each time any of
the top 10 words of each theme is removed.
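The coherence measure described in this step can be sketched in Python as follows. The function names are hypothetical, and the cosine index is assumed to be computed on presence/absence data, i.e. the number of segments containing both words divided by the square root of the product of the segments containing each word; T-LAB's exact computation may differ.

```python
from itertools import combinations
from math import sqrt

def cosine_index(word_a, word_b, segments):
    """Cosine between two words from their presence in text segments."""
    n_a = sum(1 for s in segments if word_a in s)
    n_b = sum(1 for s in segments if word_b in s)
    n_ab = sum(1 for s in segments if word_a in s and word_b in s)
    if n_a == 0 or n_b == 0:
        return 0.0
    return n_ab / sqrt(n_a * n_b)

def coherence(top_words, segments):
    """Average cosine index over all pairs of a theme's top words."""
    pairs = list(combinations(top_words, 2))
    if not pairs:
        return 0.0
    return sum(cosine_index(a, b, segments) for a, b in pairs) / len(pairs)

def rank_themes(themes, segments, top_n=10):
    """List themes from the most to the least coherent."""
    scored = [(label, coherence(words[:top_n], segments))
              for label, words in themes.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical data: each segment is the set of words it contains, and
# each theme lists its words in decreasing order of probability.
segments = [{"tax", "budget"}, {"tax", "budget"}, {"goal", "match"},
            {"goal"}, {"tax"}]
themes = {"economy": ["tax", "budget"], "sport": ["goal", "match"]}
for label, score in rank_themes(themes, segments):
    print(label, round(score, 3))
```

Since the score depends only on the current top words, removing one of them changes the pairs that enter the average, which is why the procedure should be repeated after each removal.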
4 - Test the model
At the end of the analysis procedure (see points "a" and "b"
above) each context unit (i.e. primary document or elementary
context) is represented as a mixture of different topics;
by contrast, the Naïve Bayes Classifier
used in this step assigns each context unit to the topic which is
the most characteristic of it.
For this reason, when the "Test the
Model" option is selected, T-LAB creates an HTML file and two
XLS files including the classification of context units (see below).
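The difference between a mixture of topics and a single assignment can be illustrated with a minimal multinomial Naïve Bayes sketch in Python. The function name classify, the smoothing constant beta and the way the priors are estimated are assumptions made for illustration, not a description of T-LAB's classifier.

```python
from math import log

def classify(unit, theme_word, vocab, beta=0.01):
    """Assign one context unit to the theme with the highest log posterior.

    theme_word holds token counts per theme (rows) and word (columns);
    beta is an illustrative smoothing constant that avoids log(0).
    """
    index = {w: j for j, w in enumerate(vocab)}
    grand_total = sum(sum(row) for row in theme_word)
    best_theme, best_score = None, float("-inf")
    for t, row in enumerate(theme_word):
        total = sum(row)
        # Smoothed prior: share of all tokens assigned to this theme.
        score = log((total + 1) / (grand_total + len(theme_word)))
        for w in unit:
            if w in index:  # words outside the vocabulary are ignored
                score += log((row[index[w]] + beta)
                             / (total + beta * len(vocab)))
        if score > best_score:
            best_theme, best_score = t, score
    return best_theme

# Hypothetical theme-by-word counts for two themes.
vocab = ["budget", "goal", "match", "tax"]
theme_word = [[3, 0, 0, 4],
              [0, 4, 3, 0]]
print(classify(["tax", "budget"], theme_word, vocab))       # theme 0
print(classify(["tax", "goal", "goal"], theme_word, vocab)) # theme 1
```

In the second call the unit mixes words from both themes, but the classifier still returns a single label, the one best supported by its words.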
5 - Apply the model
After having applied and
saved the model (see "A" below), the results of the analysis can be
immediately visualised by an MDS map.
Moreover, given that after exiting from the analysis (see
"B" above) themes are recorded as clusters of context units (i.e.
like the Thematic Analysis of Elementary
Contexts and Thematic Classification of
Documents results), the new thematic variables just created
(i.e. CONT_CLUST and/or DOC_CLUST) can be explored by using various
tools (see below).
For example, you can perform a Correspondence Analysis of themes (see
below), produce a network map (see below) by using the Sequence of Themes tool,
obtain a Word Associations map
by using the corresponding T-LAB
tool (see below), and so on.
6 - Export a dictionary of categories
When this option is selected, a dictionary file with the
.dictio extension is created which is
ready to be imported by any T-LAB tool for thematic analysis. In such a
dictionary each theme (or category) is described by its
characteristic words.