T-LAB Plus 2018
Listed below are 10 of the key improvements and new features:
1 - The Sequence and Network Analysis tool, which takes into account the positions of words relative to each other (i.e. transition probabilities), has been completely redesigned, and the user can now check the relationships between the 'nodes' (i.e. the key-terms) of the text network at different levels:
in the one-to-one connections;
in the cluster partitions, by checking how the key-words have been grouped at each partition (see the table below).
Moreover, it is possible to check the text segments that have the highest association scores with the clusters of the final partition.
For more information about the above functionalities, and for a better understanding of how they allow easy exploration of all levels of the network hierarchy, click here.
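The notion of transition probabilities can be illustrated with a minimal Python sketch, assuming the text has already been reduced to an ordered sequence of key-terms. The function name `transition_probabilities` and the toy token list are hypothetical; this is not T-LAB's implementation, only an estimate of P(next key-term | current key-term) from adjacent pairs.

```python
from collections import Counter, defaultdict

def transition_probabilities(tokens):
    """Estimate P(next | current) from adjacent key-term pairs (hypothetical helper)."""
    pair_counts = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        pair_counts[current][nxt] += 1
    probs = {}
    for current, followers in pair_counts.items():
        total = sum(followers.values())
        probs[current] = {nxt: n / total for nxt, n in followers.items()}
    return probs

# Toy sequence of key-terms (invented for illustration)
tokens = ["market", "price", "rise", "market", "price", "fall", "market", "share"]
probs = transition_probabilities(tokens)
print(probs["market"])
```

Each row of `probs` sums to 1, so it can be read as the outgoing edge weights of one node in a directed text network; the last token ("share") has no successor and therefore no row.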
2 - In the Comparison between Word Pairs sub-menu, a new radial diagram is now available which allows the user to quickly appreciate similarities and differences in word associations, either within the entire corpus or within a subset of it (see the image below).
3 - Most of the tools for co-occurrence analysis and thematic analysis can now work with key-term lists containing up to 5,000 items. So, when automatic lemmatization is applied, this limit corresponds to about 12,000 words (i.e. raw forms).
4 - When analysing files which include Twitter messages, it is now possible to use strings with hashtags (i.e. the '#' character) as key-terms.
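A minimal sketch of hashtag-aware tokenization in Python, assuming a hashtag is '#' followed by word characters. The regular expression and the name `tokenize_tweet` are hypothetical and do not describe how T-LAB tokenizes Twitter files; they only show how a '#' string can survive as a single key-term.

```python
import re

# Hypothetical pattern: '#' plus word characters is one token; otherwise plain word runs.
TOKEN_RE = re.compile(r"#\w+|\w+")

def tokenize_tweet(text):
    """Split a tweet into lower-cased tokens, keeping the '#' on hashtags."""
    return [tok.lower() for tok in TOKEN_RE.findall(text)]

print(tokenize_tweet("Loving the #MachineLearning talk at #PyCon2018!"))
# ['loving', 'the', '#machinelearning', 'talk', 'at', '#pycon2018']
```

Because the hashtag alternative is tried first, '#pycon2018' and 'pycon2018' remain distinct key-terms.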
5 - When using any comparative tool based on contingency table analysis, T-LAB now creates heatmap tables with up to 10,000 rows, which usually correspond to distinct key-terms (see the image below).
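The notes do not say which statistic the heatmap colours encode. As a hedged illustration only, the sketch below assumes Pearson standardized residuals, (O - E) / sqrt(E), computed for each cell of a key-term × subset contingency table; the helper name `pearson_residuals` and the toy counts are hypothetical.

```python
import math

def pearson_residuals(table):
    """Return (O - E) / sqrt(E) for every cell of a contingency table.

    Assumes no row or column total is zero (otherwise E would be 0).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    residuals = []
    for i, row in enumerate(table):
        res_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            res_row.append((observed - expected) / math.sqrt(expected))
        residuals.append(res_row)
    return residuals

# Rows = key-terms, columns = corpus subsets (invented counts)
table = [[30, 10],
         [10, 30]]
for row in pearson_residuals(table):
    print([round(v, 3) for v in row])
```

In a heatmap, positive residuals would mark key-terms over-represented in a subset and negative residuals under-represented ones.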
6 - When plotting the results of any Correspondence Analysis or Multidimensional Scaling, the scatter charts can show labels whose size is based on word frequency (see the images below).
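How label size might follow word frequency can be sketched as a linear mapping onto a font-size range. The linear rule, the 8 to 24 point range and the name `label_sizes` are assumptions for illustration; T-LAB's actual scaling is not described here.

```python
def label_sizes(frequencies, min_pt=8.0, max_pt=24.0):
    """Map word frequencies linearly onto a font-size range (assumed rule)."""
    lo, hi = min(frequencies.values()), max(frequencies.values())
    span = hi - lo
    sizes = {}
    for word, freq in frequencies.items():
        # If all frequencies are equal, fall back to the midpoint size.
        share = (freq - lo) / span if span else 0.5
        sizes[word] = min_pt + share * (max_pt - min_pt)
    return sizes

print(label_sizes({"health": 120, "care": 60, "policy": 20}))
```

A logarithmic mapping would be a reasonable alternative when a few words are far more frequent than the rest.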
7 - When performing a topic analysis or a thematic analysis of a corpus which includes variable attributes, it is now possible to build and analyse tables which cross the topics (or themes) and the attributes of each variable (see the image below).
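A minimal sketch of crossing topics with the values of one variable, assuming each document has been assigned to a single dominant topic. The variable name "PARTY", its values and the helper `cross_table` are hypothetical; T-LAB may assign documents to topics differently.

```python
from collections import Counter

def cross_table(assignments):
    """Count (topic, attribute value) pairs, one pair per document."""
    counts = Counter(assignments)
    topics = sorted({t for t, _ in counts})
    values = sorted({v for _, v in counts})
    # Counter returns 0 for pairs that never occur
    rows = {t: [counts[(t, v)] for v in values] for t in topics}
    return values, rows

# Hypothetical data: (dominant topic, value of the variable "PARTY") per document
docs = [("economy", "left"), ("economy", "right"), ("economy", "right"),
        ("health", "left"), ("health", "left"), ("security", "right")]
columns, rows = cross_table(docs)
print(columns)   # ['left', 'right']
for topic, counts in rows.items():
    print(topic, counts)
```

The resulting topics × values table is an ordinary contingency table and can be analysed with the same comparative techniques used for key-terms.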
8 - When using any thematic analysis tool, a new output is now available which allows the user to quickly evaluate the most typical words of each 'topic' or of each 'thematic cluster' (see the image below).
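One simple way to rank the "typical" words of a cluster is the ratio between a word's relative frequency inside the cluster and in the whole corpus. The measure T-LAB uses is not stated here, so this ratio, the `min_count` cut-off and the helper `typical_words` are assumptions for illustration.

```python
from collections import Counter

def typical_words(cluster_tokens, corpus_tokens, top_n=3, min_count=2):
    """Rank words by how much more frequent they are in a cluster than in the corpus."""
    in_cluster = Counter(cluster_tokens)
    in_corpus = Counter(corpus_tokens)
    n_cluster, n_corpus = len(cluster_tokens), len(corpus_tokens)
    scores = {}
    for word, count in in_cluster.items():
        if count < min_count:
            continue  # skip rare words, whose ratios are unstable
        scores[word] = (count / n_cluster) / (in_corpus[word] / n_corpus)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

# Toy data: one cluster plus the rest of the corpus (invented)
cluster = ["tax", "tax", "budget", "budget", "people", "tax"]
others = ["people", "people", "school", "school", "people", "budget"]
print(typical_words(cluster, cluster + others))
```

A ratio above 1 means the word is over-represented in the cluster; statistical tests such as chi-square are a common, more robust alternative.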
9 - A new tool, named Corpus Advanced Search, is now available which allows the user to extract all text segments (i.e. sentences or paragraphs) matching single or multiple selections of words, either within the entire corpus or within a subset of it.
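The kind of query described for Corpus Advanced Search can be sketched as a filter over sentences that keeps those containing all (or any) of the selected words. The naive sentence splitter, the `mode` parameter and the name `search_segments` are hypothetical and do not reflect the tool's real options.

```python
import re

def search_segments(text, words, mode="all"):
    """Return the sentences containing all (or any) of the given words."""
    # Naive sentence split on ., ! or ? followed by whitespace (an assumption).
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    targets = {w.lower() for w in words}
    match = all if mode == "all" else any
    hits = []
    for sentence in sentences:
        tokens = set(re.findall(r"\w+", sentence.lower()))
        if match(t in tokens for t in targets):
            hits.append(sentence)
    return hits

text = ("Inflation rose in March. Wages did not keep up with inflation. "
        "Unemployment fell slightly.")
print(search_segments(text, ["inflation", "wages"]))
print(search_segments(text, ["inflation", "unemployment"], mode="any"))
```

Restricting the search to a subset of the corpus would simply mean passing only that subset's text.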
10 - Some pre-processing steps, including the automatic lemmatization, have been improved.
T-LAB Plus 2017 was released on January 20th 2017.
The most important improvements concern: (A) the preprocessing steps (e.g. word segmentation, automatic lemmatization and stemming) for many languages; (B) the functionalities of some co-occurrence tools; (C) the performance of the Modeling of Emerging Themes tool.
A - Regarding the preprocessing steps, three new features have been implemented:
A.1 - Word segmentation (see https://en.wikipedia.org/wiki/Text_segmentation) for Chinese and Japanese texts, which automatically delimits single words with white spaces (see below).
N.B.: For the segmentation of the Chinese texts the 'Pan Gu Segment' library is used (http://pangusegment.codeplex.com/).
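To illustrate what dictionary-based word segmentation does, here is a sketch of forward maximum matching, a textbook technique. It is not the Pan Gu Segment algorithm; the toy dictionary, the `max_len` parameter and the name `forward_max_match` are assumptions for illustration.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy left-to-right segmentation using the longest dictionary match."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return " ".join(words)

# Toy dictionary (hypothetical); real segmenters use large lexicons and statistics.
dictionary = {"我们", "喜欢", "中文", "文本", "分析"}
print(forward_max_match("我们喜欢中文文本分析", dictionary))
```

The output delimits words with white spaces, which is what lets later steps treat Chinese or Japanese text like space-separated languages.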
A.2 - Dictionary-based lemmatization for nine (9) further languages;
A.3 - Stemming algorithms for fifteen (15) languages.
N.B.: The main difference between (a) lemmatization and (b) stemming lies in how the inflectional forms of each word are normalized. Specifically: (a) in the case of lemmatization (see https://en.wikipedia.org/wiki/Lemmatisation), normalization consists of grouping together the inflected forms of a word so they can be analysed as a single item, identified by the word's lemma, or dictionary form (e.g.: 'arguing' -> 'argue'); (b) in the case of stemming (see https://en.wikipedia.org/wiki/Stemming), which usually simply removes inflectional endings, the stem need not be identical to the morphological root of the word (e.g.: 'arguing' -> 'argu').
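The contrast between the two normalizations can be sketched in a few lines of Python. The tiny lemma dictionary and suffix list are hypothetical toys, a crude stand-in for real lexicons and for stemming algorithms such as Porter's; they are not what T-LAB uses.

```python
# Toy lemma dictionary (hypothetical; real systems use large lexicons).
LEMMAS = {"arguing": "argue", "argued": "argue", "argues": "argue", "better": "good"}

# Toy suffix list, longest first (a crude stand-in for real stemming algorithms).
SUFFIXES = ("ing", "ed", "es", "s")

def lemmatize(word):
    """Dictionary lookup: map an inflected form to its lemma, if known."""
    return LEMMAS.get(word, word)

def stem(word):
    """Strip the first matching suffix; the result need not be a real word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

for w in ["arguing", "argued", "better"]:
    print(w, "->", lemmatize(w), "|", stem(w))
# arguing -> argue | argu
# argued -> argue | argu
# better -> good | better
```

Lemmatization can map irregular forms ('better' -> 'good') but needs a dictionary; stemming needs no dictionary but produces stems such as 'argu'.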
Here is the list of the new languages for which automatic lemmatization or stemming is supported by T-LAB Plus 2017:
LEMMATIZATION: Catalan, Croatian, Polish, Romanian, Russian, Serbian, Slovak, Swedish, Ukrainian.
STEMMING: Arabic, Bengali, Bulgarian, Czech, Danish, Dutch, Finnish, Greek, Hindi, Hungarian, Indonesian, Marathi, Norwegian, Persian, Turkish.
When selecting languages in the setup form, the six languages(*) for which T-LAB already supported automatic lemmatization can be selected through the button on the left (see 'A' below), while the new ones can be selected through the button on the right (see 'B' below).
(*) English, French, German, Italian, Portuguese and Spanish.
In any case, by working without automatic lemmatization and/or by using customized dictionaries, the user can analyse texts in any language, provided that words are separated by spaces and/or punctuation.
B - The new functionalities of the co-occurrence tools are listed below.
B.1 - More options are available in the setup form of the Co-Word Analysis tool:
When the 'automatic selection of key terms' is selected, different colours are used for different groups of items in the MDS map (see below);
Moreover, by right-clicking the chart area, a new option allows plotting the strongest links (i.e. those with an association index > 0.15).
Finally, when 'Hierarchical clustering of key-terms' is selected, it is possible to create dendrograms including the elements of each thematic nucleus (see below);
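The "strongest links" option can be illustrated by computing an association index for every co-occurring pair and keeping those above 0.15. T-LAB offers several association indices and the text does not say which one the threshold refers to, so this sketch assumes the cosine coefficient on segment-level occurrences; the helper `strongest_links` and the toy segments are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def strongest_links(segments, threshold=0.15):
    """Return word pairs whose cosine association index exceeds the threshold."""
    occurrences = Counter()     # number of segments containing each word
    co_occurrences = Counter()  # number of segments containing both words
    for segment in segments:
        words = sorted(set(segment))
        occurrences.update(words)
        co_occurrences.update(combinations(words, 2))
    links = {}
    for (a, b), together in co_occurrences.items():
        index = together / math.sqrt(occurrences[a] * occurrences[b])
        if index > threshold:
            links[(a, b)] = round(index, 3)
    return links

# Toy segments (invented): two tight word pairs plus one weak cross link
segments = [["school", "teacher"]] * 6 + [["tax", "budget"]] * 6 + [["school", "tax"]]
print(strongest_links(segments))
```

Here the weak school/tax link has an index of 1/7 (about 0.143), just below the threshold, so only the two tight pairs would be plotted.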
B.2 - When using the Word Associations tool, a new option is available which automatically analyses any co-occurrence matrix with up to 3,000 rows and plots an MDS map with the most relevant key-words. This way the user can easily move from the analysis of 'one-to-one' relations to an 'all together' view (and vice versa), either within the entire corpus or within a part of it.
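How an MDS map can be derived from word associations is sketched below with classical (Torgerson) MDS in pure Python: square the dissimilarities, double-center them, and take the top two eigenvectors by power iteration. T-LAB's MDS variant is not specified here; the dissimilarities (e.g. 1 minus an association index), the helper `classical_mds` and its parameters are assumptions, and the code suits toy-sized matrices only.

```python
import math
import random

def classical_mds(dist, dims=2, iters=500, seed=0):
    """Classical MDS via double centering and power iteration with deflation.

    Assumes the dissimilarities are (close to) Euclidean, so the centered
    matrix B has no large negative eigenvalues.
    """
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # B = -1/2 * J D^2 J (D is symmetric, so column means equal row means)
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        v = [rng.random() for _ in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # nothing left to extract
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for vector v
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigval = sum(x * y for x, y in zip(v, bv)) / sum(x * x for x in v)
        scale = math.sqrt(max(eigval, 0.0))
        for i in range(n):
            coords[i][k] = v[i] * scale
        # Deflation: subtract the found component before the next pass
        b = [[b[i][j] - eigval * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords

words = ["school", "teacher", "tax", "budget"]
# Hypothetical dissimilarities, e.g. 1 - association index
dist = [[0.0, 0.2, 0.9, 0.9],
        [0.2, 0.0, 0.9, 0.9],
        [0.9, 0.9, 0.0, 0.2],
        [0.9, 0.9, 0.2, 0.0]]
for word, (x, y) in zip(words, classical_mds(dist)):
    print(f"{word:8s} {x:+.3f} {y:+.3f}")
```

For matrices with thousands of rows, a linear-algebra library with a proper eigensolver would replace the hand-written power iteration.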
C - The performance of the Modeling of Emerging Themes tool, which uses a topic model algorithm, has been improved: it now allows the user to analyse a collection of up to 30,000 documents, provided that the total number of word occurrences (i.e. tokens) does not exceed 3,000,000.
T-LAB Plus 2016 was released on April 22nd 2016.
Listed below are some of the key improvements and new features:
1 - Now eleven different file formats - including PDF documents - can be processed either as a single file or as a collection of documents.
2 - Two user profiles are now available: beginner and expert. When the first (i.e. beginner) is selected, the user is allowed to perform any analysis without being asked to choose between the advanced options.
3 - Whenever analysing word co-occurrences and/or exploring clustering solutions, a new tool named Graph Maker allows the user to easily create and export several new dynamic charts and graphs, some of which are built with the D3 library.
4 - Every time a tool for exploring similarities and differences between corpus subsets or between thematic clusters is used, a new button is available which allows the user to get a preview by means of a dynamic tree map.
5 - An additional algorithm for the thematic analysis of text segments and documents is now available, which complements the bisecting K-means algorithm implemented in T-LAB more than ten years ago.
This way, users with an expert profile can compare different solutions for the same clustering problem, e.g. compare the quality of the clusters obtained by two different algorithms applied to the same data tables.
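For readers unfamiliar with bisecting K-means, here is a generic textbook sketch in pure Python: start with one cluster and repeatedly split the cluster with the largest within-cluster sum of squared errors (SSE) using 2-means. It is not T-LAB's implementation; choosing the largest-SSE cluster to split and using total SSE as the quality score for comparing solutions are assumptions, and the points are invented.

```python
import random

def mean(points):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(coord) / len(points) for coord in zip(*points)]

def sse(points):
    """Within-cluster sum of squared distances to the cluster mean."""
    center = mean(points)
    return sum(sum((x - c) ** 2 for x, c in zip(p, center)) for p in points)

def two_means(points, rng, iters=20):
    """Split points into two groups with plain Lloyd iterations (k = 2)."""
    centers = rng.sample(points, 2)
    groups = ([], [])
    for _ in range(iters):
        groups = ([], [])
        for p in points:
            d = [sum((x - c) ** 2 for x, c in zip(p, ctr)) for ctr in centers]
            groups[0 if d[0] <= d[1] else 1].append(p)
        if not groups[0] or not groups[1]:
            break  # degenerate split; the caller decides what to do
        centers = [mean(g) for g in groups]
    return groups

def bisecting_kmeans(points, k, seed=0):
    """Repeatedly bisect the cluster with the largest SSE until k clusters exist."""
    rng = random.Random(seed)
    clusters = [list(points)]
    while len(clusters) < k:
        worst = max(range(len(clusters)), key=lambda i: sse(clusters[i]))
        if len(clusters[worst]) < 2 or sse(clusters[worst]) == 0:
            break  # nothing left worth splitting
        target = clusters.pop(worst)
        left, right = two_means(target, rng)
        if not left or not right:
            clusters.append(target)
            break
        clusters.extend([left, right])
    return clusters

def total_sse(clusters):
    """A simple quality score for comparing solutions: lower is more compact."""
    return sum(sse(c) for c in clusters)

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]
clusters = bisecting_kmeans(points, k=3)
for c in clusters:
    print(sorted(c))
print("total SSE:", round(total_sse(clusters), 3))
```

Running a second algorithm on the same points and comparing its `total_sse` (or another validity index) is one simple way to compare two clustering solutions.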