

T-LAB Plus 2017 was released on January 20th 2017.

The most important improvements concern: (A) the preprocessing steps (e.g. word segmentation, automatic lemmatization and stemming) for many languages; (B) the functionalities of some co-occurrence tools; (C) the performance of the Modeling of Emerging Themes tool.

A - Regarding the preprocessing steps, three new features have been implemented:

A.1 - Word segmentation (see https://en.wikipedia.org/wiki/Text_segmentation) for Chinese and Japanese texts, which automatically delimits single words with white spaces (see below).

N.B.: For the segmentation of Chinese texts, the 'Pan Gu Segment' library is used (http://pangusegment.codeplex.com/).
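To illustrate what delimiting words by white spaces means in practice, here is a minimal sketch of one classic dictionary-based approach, greedy forward maximum matching. It is only an illustration: it is not the algorithm used by the 'Pan Gu Segment' library or by T-LAB, and the function name `segment`, the `max_len` parameter and the toy lexicon are all hypothetical.

```python
def segment(text, lexicon, max_len=4):
    """Greedy forward maximum matching: at each position, take the longest
    lexicon entry; fall back to a single character when nothing matches."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then progressively shorter ones.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    # Delimit the recognised words with white spaces.
    return " ".join(words)


# Hypothetical toy lexicon, for illustration only.
lexicon = {"我们", "喜欢", "文本", "分析"}
segmented = segment("我们喜欢文本分析", lexicon)
print(segmented)
```

Once a text has been segmented this way, it can be handled like any text whose words are separated by spaces.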

A.2 - Dictionary-based lemmatization for nine (9) further languages;

A.3 - Stemming algorithms for fifteen (15) languages.

N.B.: The main difference between (a) lemmatization and (b) stemming lies in how the inflected forms of each word are normalized: (a) lemmatization (see https://en.wikipedia.org/wiki/Lemmatisation) groups together the inflected forms of a word so that they can be analysed as a single item, identified by the word's lemma, or dictionary form (e.g. 'arguing' -> 'argue'); (b) stemming (see https://en.wikipedia.org/wiki/Stemming) usually just removes inflectional endings, so the stem need not be identical to the morphological root of the word (e.g. 'arguing' -> 'argu').
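The contrast can be sketched in a few lines of code. This is a toy illustration, not T-LAB's implementation: the `LEMMAS` dictionary and the `SUFFIXES` list are hypothetical and far smaller than what a real lemmatizer or stemmer would use.

```python
# Hypothetical mini-dictionary mapping inflected forms to their lemma.
LEMMAS = {"arguing": "argue", "argued": "argue", "argues": "argue"}

# Toy suffix list; real stemming algorithms apply many more rules.
SUFFIXES = ("ing", "ed", "es", "s")


def lemmatize(word):
    """Dictionary lookup: return the lemma, or the word itself if unknown."""
    return LEMMAS.get(word, word)


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


for w in ("arguing", "argued", "argues"):
    print(w, "-> lemma:", lemmatize(w), "| stem:", stem(w))
```

The lemmatizer depends on a dictionary and returns an actual word, whereas the stemmer needs only rules and may return a string ('argu') that is not a word at all.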

Here is the list of the new languages for which automatic lemmatization or stemming is supported by T-LAB Plus 2017.

LEMMATIZATION: Catalan, Croatian, Polish, Romanian, Russian, Serbian, Slovak, Swedish, Ukrainian.

STEMMING: Arabic, Bengali, Bulgarian, Czech, Danish, Dutch, Finnish, Greek, Hindi, Hungarian, Indonesian, Marathi, Norwegian, Persian, Turkish.

In the setup form, the six languages (*) for which T-LAB already supported automatic lemmatization can be selected through the button on the left (see 'A' below), while the new ones can be selected through the button on the right (see 'B' below).

(*) English, French, German, Italian, Portuguese and Spanish.

In any case, without automatic lemmatization and/or by using customized dictionaries, the user can analyse texts in any language, provided that words are separated by spaces and/or punctuation.

B - The new functionalities of the co-occurrence tools are listed below.

B.1 - More options are available in the setup form of the Co-Word Analysis tool.

When the 'automatic selection of key terms' is selected, different colours are used for different groups of items in the MDS map (see below);

Moreover, by right-clicking the chart area, a new option allows plotting the strongest links (i.e. those with the association index >0.15).

Finally, when the 'Hierarchical clustering of key-terms' is selected, it is possible to create dendrograms including the elements of each thematic nucleus (see below).
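As a rough illustration of the 'strongest links' option described above, the sketch below keeps only the word pairs whose association index exceeds 0.15. The text does not say which association index is applied here, so the cosine coefficient used below is an assumption, and all function names and counts are hypothetical.

```python
from math import sqrt


def cosine_index(cooc, occ_a, occ_b):
    """Cosine coefficient: co-occurrences over the geometric mean of occurrences.
    Assumption: the source does not state which index the option uses."""
    return cooc / sqrt(occ_a * occ_b)


def strongest_links(occurrences, cooccurrences, threshold=0.15):
    """Return (word_a, word_b, index) for pairs above the threshold, strongest first."""
    links = []
    for (a, b), cooc in cooccurrences.items():
        index = cosine_index(cooc, occurrences[a], occurrences[b])
        if index > threshold:
            links.append((a, b, round(index, 3)))
    return sorted(links, key=lambda link: link[2], reverse=True)


# Hypothetical counts, for illustration only.
occurrences = {"text": 100, "analysis": 64, "map": 25}
cooccurrences = {("text", "analysis"): 40, ("text", "map"): 5, ("analysis", "map"): 8}

links = strongest_links(occurrences, cooccurrences)
print(links)
```

With these counts the 'text'-'map' pair has an index of 0.1 and is dropped, while the other two pairs (0.5 and 0.2) would be plotted.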

B.2 - When using the Word Associations tool, a new option is available which automatically analyses any co-occurrence matrix with up to 3,000 rows and plots an MDS map with the most relevant key-words. This way the user can easily move from the analysis of 'one-to-one' relations to an 'all together' view (and vice versa), either within the entire corpus or within a part of it.

C - The performance of the Modeling of Emerging Themes tool, which uses a topic model algorithm, has been improved: it can now analyse a collection of up to 30,000 documents, provided that the total number of word occurrences (i.e. tokens) doesn't exceed 3,000,000.
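Before importing a large collection, it can be handy to check it against these two limits. The following sketch does so under a simplifying assumption: tokens are counted by splitting on white space, which may differ from how T-LAB counts word occurrences. The constant and function names are hypothetical.

```python
MAX_DOCUMENTS = 30_000    # limit on the number of documents stated above
MAX_TOKENS = 3_000_000    # limit on the total number of word occurrences


def fits_limits(documents):
    """Return (ok, n_documents, n_tokens) for a list of document strings.
    Assumption: a token is any white-space-delimited string."""
    n_tokens = sum(len(doc.split()) for doc in documents)
    ok = len(documents) <= MAX_DOCUMENTS and n_tokens <= MAX_TOKENS
    return ok, len(documents), n_tokens


# Hypothetical two-document corpus.
corpus = ["a short document", "another slightly longer document"]
print(fits_limits(corpus))
```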



T-LAB Plus 2016 was released on April 22nd 2016.

Listed below are some of the key improvements and new features:

1 - Now eleven different file formats - including PDF documents - can be processed either as a single file or as a collection of documents.

N.B.:
- The image-only PDF files must be processed using OCR software first;
- HTML files can be imported by using the Corpus Builder module only.

2 - Two user profiles are now available: beginner and expert. When the first (i.e. beginner) is selected, the user is allowed to perform any analysis without being asked to choose between the advanced options.

3 - Whenever analysing word co-occurrences and/or exploring clustering solutions, a new tool named Graph Maker allows the user to easily create and export several new dynamic charts and graphs, some of which are built with the D3 library.

4 - Whenever a tool for exploring similarities and differences between corpus subsets or between thematic clusters is used, a new button allows the user to see a preview by means of a dynamic tree map.

5 - An additional algorithm for the thematic analysis (i.e. unsupervised clustering) of text segments and documents is now available, which complements the bisecting K-means algorithm implemented in T-LAB more than ten years ago.
The new algorithm uses the PDDP (i.e. Principal Direction Divisive Partitioning) method proposed by Daniel Boley (1998) to initialize a k-means clustering procedure.

This way the user of T-LAB with an expert profile is able to compare different solutions for the same clustering problem, e.g. compare the quality of clusters obtained by two different algorithms applied to the same data tables.
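The general idea of a PDDP-style initialization can be sketched as follows: centre the data, find its principal direction, split the items according to the sign of their projection on that direction, and repeat until k groups exist, whose centroids then seed k-means. This is a minimal pure-Python sketch and not T-LAB's code: the power iteration, the rule of always splitting the group with the largest scatter, and every function name are assumptions made for illustration.

```python
def mean(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def principal_direction(centered, iterations=100):
    """Leading eigenvector of X^T X by power iteration (X = centered rows)."""
    dim = len(centered[0])
    v = [1.0 / (j + 1) for j in range(dim)]  # arbitrary deterministic start
    for _ in range(iterations):
        proj = [dot(row, v) for row in centered]  # X v
        w = [sum(p * row[j] for p, row in zip(proj, centered)) for j in range(dim)]  # X^T (X v)
        norm = dot(w, w) ** 0.5
        if norm == 0:  # no variance left along v (e.g. all points identical)
            break
        v = [x / norm for x in w]
    return v


def scatter(rows):
    """Sum of squared distances from the centroid."""
    c = mean(rows)
    return sum(sum((x - m) ** 2 for x, m in zip(row, c)) for row in rows)


def pddp_split(rows):
    """Split rows by the sign of their projection on the principal direction."""
    c = mean(rows)
    centered = [[x - m for x, m in zip(row, c)] for row in rows]
    v = principal_direction(centered)
    left = [r for r, cr in zip(rows, centered) if dot(cr, v) <= 0]
    right = [r for r, cr in zip(rows, centered) if dot(cr, v) > 0]
    return left, right


def pddp_centroids(rows, k):
    """Split the highest-scatter group until k groups exist; return their centroids."""
    clusters = [list(rows)]
    while len(clusters) < k:
        clusters.sort(key=scatter)
        target = clusters.pop()  # group with the largest scatter
        left, right = pddp_split(target)
        if not left or not right:  # cannot be split any further
            clusters.append(target)
            break
        clusters += [left, right]
    return [mean(c) for c in clusters]


# Hypothetical 2-D data with two obvious groups.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
seeds = pddp_centroids(points, 2)
print(seeds)
```

Because the seeds are derived deterministically from the data rather than drawn at random, repeated runs on the same table start k-means from the same centres, which makes comparisons between clustering solutions easier to reproduce.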
