T-LAB PLUS 2022 - ON-LINE HELP

Formal Criteria

When the corpus consists of a single text and no variables are used, no further preparation is required: the file can be imported directly.

When, on the other hand, the corpus is made up of several text documents and/or categorical variables are used, the corpus must be prepared with the Corpus Builder tool (see "Prepare a Corpus"), which automatically applies the following criteria:

Each text or subset of it (the "parts" defined by variables and/or IDnumber) is preceded by a coding line.

Each coding line has this format:

- It begins with a string of four asterisks (****) followed by a blank space. T-LAB reads this string as: "here begins a user-defined text or a context unit".

- It continues with one or more strings, each made up of a single asterisk and a label, which define cases (IDnumber), variables and their respective categories.

- It ends with a line break (Return key).
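
The three parts of a coding line can be sketched in Python as follows. This is only an illustration of the format described above: `build_coding_line` is a hypothetical helper, not part of T-LAB, and passing the IDnumber as a ready-made string is an assumption made to keep the sketch short.

```python
def build_coding_line(variables, id_number=None):
    """Assemble one coding line from a mapping {variable: category}.

    Hypothetical helper, not a T-LAB function. `id_number` is passed as a
    string (e.g. "0001") so that no padding scheme has to be assumed.
    """
    parts = ["****"]  # opening string of four asterisks
    if id_number is not None:
        parts.append(f"*IDnumber_{id_number}")
    # One "*VARIABLE_CATEGORY" string per variable, in insertion order.
    parts.extend(f"*{var}_{cat}" for var, cat in variables.items())
    # Strings are separated by single blanks; the line ends with a line break.
    return " ".join(parts) + "\n"


line = build_coding_line({"AGE": "ADUL", "SEX": "FEM", "OCC": "PROF"}, "0001")
print(line, end="")
```

The text that the coding line introduces would then follow on the next line of the corpus file.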

Here are some examples.

The following line introduces a text (or a corpus subset) codified with three variables, AGE, SEX and OCC (occupation), and their categories (ADUL, FEM, PROF):

**** *AGE_ADUL *SEX_FEM *OCC_PROF

The following line introduces a text (or a corpus subset) codified with the same variables plus an IDnumber label:

**** *IDnumber_0001 *AGE_ADUL *SEX_FEM *OCC_PROF

The following line introduces a text (or a corpus subset) codified with two variables: YEAR, NEWSP.


In each coding line these T-LAB rules are observed:

1. No label (IDnumber, variable or variable category) may contain blank spaces;
2. Each variable label and each category label must be between 2 and 25 characters long;
3. Each variable label must be joined to its category by an underscore ("_");
4. Consecutive variables must be separated by a blank space, placed before the next asterisk;
5. Every variable, with its respective category, must be assigned to every corpus subset;
6. A maximum of 50 variables can be used, each with a maximum of 150 categories that can be compared;
7. The maximum number of IDnumbers is 99,999 for short texts (max. 2,000 characters each, e.g. responses to open-ended questions, Twitter messages, etc.) and 30,000 in all other cases.
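
For corpora prepared by hand rather than with Corpus Builder, the per-line rules can be checked with a short script. The sketch below is a minimal illustration, not T-LAB's own validation: `check_coding_line` is a hypothetical name, it covers only rules 1 to 4 and the variable limit of rule 6, and it assumes that the length limits do not apply to the value following `IDnumber_`.

```python
MAX_VARIABLES = 50  # rule 6: at most 50 variables


def check_coding_line(line):
    """Return a list of rule violations for one coding line (empty if none).

    Hypothetical checker; rules 5 and 7 and the category limit of rule 6
    concern the whole corpus and are not covered here.
    """
    if not line.startswith("**** "):
        return ["line must start with '****' followed by a blank space"]
    errors = []
    variables = 0
    for token in line[5:].split():
        if not token.startswith("*"):
            # Usually the second half of a label that contained a blank (rule 1).
            errors.append(f"{token!r}: missing asterisk or label split by a blank")
            continue
        if "*" in token[1:]:
            errors.append(f"{token!r}: blank space missing before an asterisk")  # rule 4
            continue
        name, sep, category = token[1:].partition("_")
        if not sep:
            errors.append(f"{token!r}: variable and category must be joined by '_'")  # rule 3
            continue
        if name == "IDnumber":
            continue  # assumption: length limits not applied to the ID value
        variables += 1
        for label in (name, category):
            if not 2 <= len(label) <= 25:
                errors.append(f"{label!r}: label must be 2 to 25 characters long")  # rule 2
    if variables > MAX_VARIABLES:
        errors.append(f"{variables} variables used, maximum is {MAX_VARIABLES}")
    return errors


print(check_coding_line("**** *IDnumber_0001 *AGE_ADUL *SEX_FEM *OCC_PROF"))
print(check_coding_line("**** *AGE_ADUL*SEX_FEM"))
```

The first call returns an empty list; the second reports the missing blank space between the two variables.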