T-LAB 10.2 - ON-LINE HELP

Corpus Vocabulary


This T-LAB tool lets us check the Vocabulary of the corpus and of its subsets (see option '1' below).
It also provides some measures of lexical richness.

The Vocabulary table is a list including all distinct words (i.e. word types), the frequency of their occurrences (i.e. word tokens), their corresponding lemma (or label) and some categories used by T-LAB (see Glossary/Lemmatization).
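
As a rough illustration of the type/token distinction, the following Python sketch builds a minimal vocabulary table (word types and their frequencies) from raw text. It is not T-LAB's implementation: the function name `vocabulary_table` and the naive whitespace tokenization are assumptions made for the example, and lemmas and T-LAB categories are not reproduced.

```python
from collections import Counter


def vocabulary_table(text):
    """Return (word type, frequency) rows, most frequent first.

    Hypothetical helper for illustration only: tokens are obtained by
    lowercasing and splitting on whitespace, which is an assumption.
    """
    tokens = text.lower().split()
    counts = Counter(tokens)
    # Sort by descending frequency, then alphabetically for stable output.
    return sorted(counts.items(), key=lambda row: (-row[1], row[0]))


rows = vocabulary_table("the cat saw the dog and the bird")
for word_type, frequency in rows:
    print(word_type, frequency)
```

Here the sample sentence contains 8 word tokens but only 6 word types, because "the" occurs three times.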

The user can select (see option '2' below) the lexical units which belong to each category, view the corresponding table and save it as a .xls file (see option '3' below).

In addition, by right-clicking any item, you can check its concordances (Key-Word-in-Context) (see option '4' below).

T-LAB provides five measures of lexical richness:

Type/Token ratio (i.e. TTR), obtained by dividing the number of types by the number of tokens;
Root TTR (Guiraud, 1960), obtained by dividing the number of types by the square root of the number of tokens;
Corrected TTR (Carroll, 1964), obtained by dividing the number of types by the square root of twice the number of tokens;
Log TTR (Herdan, 1960), obtained by dividing the logarithm of the number of types by the logarithm of the number of tokens;
Hapax/Types ratio.

N.B.:
- Hapax (i.e. Hapax Legomena) are words which occur only once in a corpus;
- When analysing a corpus subset, stop words (e.g. articles and prepositions) are excluded from all measures of lexical richness.
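
The five measures can be sketched in Python as follows. This is a minimal illustration of the formulas as described on this page, not T-LAB's own code: the function name `lexical_richness`, the dictionary keys, and the optional `stop_words` parameter are assumptions, and the input is taken to be an already tokenized list of words.

```python
import math
from collections import Counter


def lexical_richness(tokens, stop_words=None):
    """Compute the five lexical richness measures for a list of tokens.

    `stop_words` is a hypothetical parameter: when given, those words are
    dropped before counting, mimicking the exclusion of stop words.
    """
    if stop_words:
        tokens = [t for t in tokens if t not in stop_words]
    n_tokens = len(tokens)
    if n_tokens < 2:
        # Log TTR divides by log(n_tokens), which is zero for one token.
        raise ValueError("at least two tokens are required")
    counts = Counter(tokens)
    n_types = len(counts)
    # Hapax legomena: word types occurring exactly once.
    n_hapax = sum(1 for c in counts.values() if c == 1)
    return {
        "ttr": n_types / n_tokens,
        "root_ttr": n_types / math.sqrt(n_tokens),
        "corrected_ttr": n_types / math.sqrt(2 * n_tokens),
        "log_ttr": math.log(n_types) / math.log(n_tokens),
        "hapax_types": n_hapax / n_types,
    }


sample = "the cat saw the dog and the bird".split()
measures = lexical_richness(sample)
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```

For the sample sentence (8 tokens, 6 types, 5 hapax) the TTR is 0.75 and the Corrected TTR is 1.5. Passing `stop_words={"the", "and"}` leaves 4 tokens that are all distinct, so the TTR rises to 1.0.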