T-LAB PLUS 2019 - ON-LINE HELP

Formal Criteria

When the corpus consists of a single text and the user does not use variables, no further operations are required: you can proceed directly to the import phase.

When, on the other hand, the corpus consists of several text documents and/or uses categorical variables, it must be prepared with the Corpus Builder tool (see above), which automatically applies the following criteria:

Each text or subset of it (the "parts" defined by variables and/or IDnumber) is preceded by a coding line.
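Because every part is introduced by its own coding line, a prepared corpus can be read back as a sequence of (coding line, text) pairs. The sketch below is illustrative only and is not T-LAB code: it assumes, following the format described next, that any line beginning with four asterisks and a blank is a coding line. The function name `split_corpus` and the sample categories YOUNG, MAL and STUD are invented for the example.

```python
from __future__ import annotations


def split_corpus(text: str) -> list[tuple[str, str]]:
    """Hypothetical helper (not part of T-LAB): split a prepared corpus
    into (coding_line, text) pairs.

    Assumption: any line starting with '**** ' is a coding line, and every
    line up to the next coding line belongs to the text it introduces.
    Lines before the first coding line are ignored.
    """
    parts: list[tuple[str, str]] = []
    coding: str | None = None
    body: list[str] = []
    for line in text.splitlines():
        if line.startswith("**** "):
            if coding is not None:  # close the previous part
                parts.append((coding, "\n".join(body).strip()))
            coding, body = line.rstrip(), []
        elif coding is not None:
            body.append(line)
    if coding is not None:  # close the last part
        parts.append((coding, "\n".join(body).strip()))
    return parts


# Illustrative corpus; the second set of categories is invented.
corpus = """**** *AGE_ADUL *SEX_FEM *OCC_PROF
Text of the first document.
**** *AGE_YOUNG *SEX_MAL *OCC_STUD
Text of the second document."""

for coding_line, text in split_corpus(corpus):
    print(coding_line, "->", text)
```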

Each coding line has this format:

- It begins with a string of four asterisks (****) followed by a blank space. T-LAB reads this string as "here begins a user-defined text or a context unit".

- It continues with strings, each made up of a single asterisk followed by a label, that define cases (IDnumber), variables and their categories.

- It ends with a line break (the Return key).
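The three-part format above is regular enough to generate and parse mechanically. The following is a minimal sketch, not T-LAB's own implementation (the Corpus Builder does this for you): `make_coding_line` and `parse_coding_line` are hypothetical names, and the parser assumes that variable labels contain no underscore, so the first "_" in each label separates the variable from its category.

```python
from __future__ import annotations


def make_coding_line(variables: dict[str, str], id_number: str | None = None) -> str:
    """Hypothetical helper: build '**** ' followed by '*LABEL_CATEGORY' strings.

    The result has no trailing newline; add one when writing it to the
    corpus file, since every coding line must end with a line break.
    """
    labels = [] if id_number is None else [f"*IDnumber_{id_number}"]
    labels += [f"*{var}_{cat}" for var, cat in variables.items()]
    return "**** " + " ".join(labels)


def parse_coding_line(line: str) -> tuple[str | None, dict[str, str]]:
    """Hypothetical helper: split a coding line into its IDnumber (if any)
    and a {variable: category} mapping.

    Assumption: variable labels contain no underscore.
    """
    if not line.startswith("**** "):
        raise ValueError("a coding line must begin with '**** '")
    id_number: str | None = None
    variables: dict[str, str] = {}
    for token in line[5:].split():
        if not token.startswith("*"):
            raise ValueError(f"label without leading '*': {token!r}")
        var, _, cat = token[1:].partition("_")  # split at the first '_'
        if var == "IDnumber":
            id_number = cat
        else:
            variables[var] = cat
    return id_number, variables


line = make_coding_line({"AGE": "ADUL", "SEX": "FEM", "OCC": "PROF"}, "0001")
print(line)
print(parse_coding_line(line))
```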

Here are some examples.

The following line introduces a text (or a corpus subset) coded with three variables, AGE, SEX and OCC (occupation), and their categories (ADUL, FEM, PROF):

**** *AGE_ADUL *SEX_FEM *OCC_PROF

The following line introduces a text (or a corpus subset) coded with the same variables plus the IDnumber label:

**** *IDnumber_0001 *AGE_ADUL *SEX_FEM *OCC_PROF

The following line introduces a text (or a corpus subset) coded with two variables, YEAR and NEWSP.


Every coding line must follow these T-LAB rules:

1. Labels (IDnumber, variables and variable categories) cannot contain blank spaces;
2. Each label, for both variables and variable categories, must be between 2 and 15 characters long;
3. Each variable label must be linked to its category by an underscore ("_");
4. A blank space must be inserted between two different variables, that is, before the next asterisk;
5. Every variable must be assigned one of its categories in each corpus subset;
6. A maximum of 50 variables can be used, each with up to 150 categories that can be compared;
7. The maximum number of IDnumbers is 99,999 for short texts (max. 2,000 characters each, e.g. responses to open-ended questions, Twitter messages, etc.) and 30,000 in all other cases.
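As a rough pre-check before import, the seven rules can be tested programmatically. This is a hedged sketch under stated assumptions, not T-LAB's validator: `check_coding_line` and `check_corpus` are hypothetical names, rules 1-4 are checked per line and rules 5-7 across all coding lines, and the `short_texts` flag is an assumption standing in for the "short texts" case of rule 7. The sample categories AGE_YOUNG and SEX_MAL are invented.

```python
from __future__ import annotations

MIN_LABEL_LEN, MAX_LABEL_LEN = 2, 15            # rule 2
MAX_VARIABLES, MAX_CATEGORIES = 50, 150         # rule 6
MAX_IDS_SHORT, MAX_IDS_OTHER = 99_999, 30_000   # rule 7


def check_coding_line(line: str) -> list[str]:
    """Hypothetical helper: return violations of rules 1-4 in one coding line."""
    line = line.rstrip("\r\n")
    if not line.startswith("**** "):
        return ["must begin with '**** '"]
    problems = []
    for token in line[5:].split(" "):
        # No leading '*' means a label held a blank (rule 1);
        # a second '*' means the blank before the next variable is missing (rule 4).
        if not token.startswith("*") or token.count("*") != 1:
            problems.append(f"malformed token {token!r} (rules 1/4)")
            continue
        var, sep, cat = token[1:].partition("_")
        if not sep:
            problems.append(f"{token!r}: no '_' between variable and category (rule 3)")
        elif var != "IDnumber":
            for label in (var, cat):
                if not MIN_LABEL_LEN <= len(label) <= MAX_LABEL_LEN:
                    problems.append(f"{label!r}: must be 2-15 characters (rule 2)")
    return problems


def check_corpus(coding_lines: list[str], short_texts: bool = False) -> list[str]:
    """Hypothetical helper: rules 1-4 per line, rules 5-7 across the corpus."""
    problems: list[str] = []
    schemas: list[set[str]] = []
    categories: dict[str, set[str]] = {}
    ids: set[str] = set()
    for n, line in enumerate(coding_lines, 1):
        problems += [f"line {n}: {p}" for p in check_coding_line(line)]
        names: set[str] = set()
        for token in line.rstrip("\r\n")[5:].split():
            var, _, cat = token.lstrip("*").partition("_")
            if var == "IDnumber":
                ids.add(cat)
            else:
                names.add(var)
                categories.setdefault(var, set()).add(cat)
        schemas.append(names)
    all_vars = set().union(*schemas)
    for n, names in enumerate(schemas, 1):
        if all_vars - names:  # rule 5: every subset needs every variable
            problems.append(f"line {n}: missing {sorted(all_vars - names)} (rule 5)")
    if len(all_vars) > MAX_VARIABLES:
        problems.append(f"{len(all_vars)} variables, max {MAX_VARIABLES} (rule 6)")
    for var, cats in categories.items():
        if len(cats) > MAX_CATEGORIES:
            problems.append(f"{var}: {len(cats)} categories, max {MAX_CATEGORIES} (rule 6)")
    limit = MAX_IDS_SHORT if short_texts else MAX_IDS_OTHER
    if len(ids) > limit:
        problems.append(f"{len(ids)} IDnumbers, max {limit} (rule 7)")
    return problems


lines = ["**** *IDnumber_0001 *AGE_ADUL *SEX_FEM *OCC_PROF",
         "**** *IDnumber_0002 *AGE_ADUL*SEX_FEM"]
for problem in check_corpus(lines):
    print(problem)
```

The second sample line deliberately omits the blank between two variables and the OCC variable, so it should be reported under rules 4 and 5.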