T-LAB 10.2 - ON-LINE HELP - Cluster Analysis

Cluster Analysis

Cluster analysis is a set of statistical techniques that aim to detect groups of objects with two complementary features:

A - High internal (within cluster) homogeneity;

B - High external (between cluster) heterogeneity.

In statistical terms, feature "A" corresponds to low within-cluster variance and feature "B" to high between-cluster variance.
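The relation between the two kinds of variance can be illustrated with a minimal sketch. It is not T-LAB code: the function names and the toy data are assumptions made for this example. It computes the within-cluster and between-cluster sums of squares for a given partition and shows that, for the same objects, a partition with lower within-cluster dispersion has higher between-cluster dispersion, since the two always add up to the total.

```python
def mean(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def variance_decomposition(points, labels):
    """Return (within, between, total) sums of squares for a partition."""
    grand = mean(points)
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    within = 0.0
    between = 0.0
    for members in clusters.values():
        c = mean(members)
        # Dispersion of the members around their own centroid (feature "A").
        within += sum(sq_dist(p, c) for p in members)
        # Dispersion of the centroid around the grand mean (feature "B").
        between += len(members) * sq_dist(c, grand)
    total = sum(sq_dist(p, grand) for p in points)
    return within, between, total

# Toy data (hypothetical): two visibly separate groups of three objects.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
good = [0, 0, 0, 1, 1, 1]   # partition that follows the two groups
bad = [0, 1, 0, 1, 0, 1]    # partition that mixes them

for name, labels in (("good", good), ("bad", bad)):
    w, b, t = variance_decomposition(points, labels)
    print(name, "within:", round(w, 3), "between:", round(b, 3), "total:", round(t, 3))
```

Because the total is fixed by the data, minimizing the within-cluster term and maximizing the between-cluster term are two views of the same objective.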

In general, there are two kinds of Cluster Analysis techniques:

Hierarchical methods, whose algorithms reconstruct the whole hierarchy of the objects under analysis (the so-called "tree"), in either ascending (agglomerative) or descending (divisive) order;

Partitioning methods, where the user defines beforehand the number of clusters into which the set of objects under analysis is divided.
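As an illustration of the hierarchical (ascending) approach, the sketch below builds the whole tree for a few one-dimensional objects and then cuts it at a chosen number of clusters. It is a minimal example under stated assumptions: single linkage is used only because it is short to write (this section does not say which linkage T-LAB applies), and the names `agglomerate` and `cut` are hypothetical.

```python
def single_link(points, a, b):
    """Distance between two clusters: the smallest gap between any two members."""
    return min(abs(points[i] - points[j]) for i in a for j in b)

def agglomerate(points):
    """Ascending hierarchical clustering.

    Returns every level of the tree: levels[0] has one cluster per object,
    the last level has a single cluster containing all objects.
    Clusters are lists of indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]
    levels = [[list(c) for c in clusters]]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_link(points, clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = sorted(clusters[i] + clusters[j])
        # Replace the two closest clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        levels.append([list(c) for c in clusters])
    return levels

def cut(levels, k):
    """Read the partition with k clusters off the tree."""
    return levels[len(levels) - k]

# Toy data (hypothetical): six objects described by a single value.
points = [1.0, 1.2, 1.1, 5.0, 5.3, 9.0]
levels = agglomerate(points)
print(cut(levels, 3))
print(cut(levels, 2))
```

The tree is computed once and any number of clusters can be read from it afterwards, whereas a partitioning method has to be run again for each number of clusters the user wants to try.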

T-LAB uses both types of algorithms.

In particular:

· the Co-Word Analysis option uses a hierarchical method;
· the Cluster Analysis option allows the use of three different methods: two hierarchical and one partitioning;
· the Thematic Analysis of Elementary Contexts and Thematic Document Classification options use a bisecting K-means algorithm.
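The general idea of bisecting K-means can be sketched as follows: start with all objects in one cluster and repeatedly split one cluster in two with a plain K-means (K = 2) until the requested number of clusters is reached. This is not T-LAB's implementation. The choice of the cluster to split (the one with the largest within-cluster sum of squares), the deterministic seeding, the use of Euclidean distance and all function names are assumptions made for this example.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(pts):
    n = len(pts)
    return tuple(sum(p[d] for p in pts) / n for d in range(len(pts[0])))

def sse(pts):
    """Within-cluster sum of squares."""
    c = centroid(pts)
    return sum(sq_dist(p, c) for p in pts)

def two_means(pts, iterations=20):
    """Split pts (at least two distinct points) into two groups with K-means, K = 2."""
    c = centroid(pts)
    # Assumed seeding: the point farthest from the centroid, then the point
    # farthest from that one. Deterministic, so the example is reproducible.
    seed_a = max(pts, key=lambda p: sq_dist(p, c))
    seed_b = max(pts, key=lambda p: sq_dist(p, seed_a))
    centers = [seed_a, seed_b]
    groups = None
    for _ in range(iterations):
        new_groups = [[], []]
        for p in pts:
            nearest = 0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1
            new_groups[nearest].append(p)
        if not new_groups[0] or not new_groups[1]:
            break  # defensive: keep the last valid split
        groups = new_groups
        new_centers = [centroid(g) for g in groups]
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return groups

def bisecting_kmeans(points, k):
    """Divide points (hashable tuples) into at most k clusters by repeated bisection."""
    clusters = [list(points)]
    while len(clusters) < k:
        # Only clusters with at least two distinct points can be split.
        splittable = [c for c in clusters if len(set(c)) > 1]
        if not splittable:
            break
        target = max(splittable, key=sse)  # assumed selection rule
        clusters = [c for c in clusters if c is not target]
        clusters.extend(two_means(target))
    return clusters

# Toy data (hypothetical): three small groups of objects in two dimensions.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
          (20.0, 0.0), (20.0, 1.0), (21.0, 0.0)]
for cluster in bisecting_kmeans(points, 3):
    print(cluster)
```

Each bisection only has to separate two groups, which is a simpler task than locating all K centroids at once, and the sequence of splits also yields a descending hierarchy of the clusters.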

Some of the publications cited in the Bibliography provide further information on the general aspects of the various methods (Bolasco S., 1999; Lebart L., Morineau A., Piron M., 1995), on the specific aspects of the HDBSCAN method (Campello R. J. G. B., Moulavi D., Zimek A., Sander J., 2015) and on the bisecting K-means method (Steinbach M., Karypis G., Kumar V., 2000; Savaresi S. M., Boley D. L., 2001).