www.tlab.it
Formal Criteria
In the case of a corpus made up of a single text, and when the user doesn't use variables, no further operations are required: it is possible to continue with the import phase.
When, on the other hand, the corpus is made up of several text documents and/or categorical variables are used, the corpus must be prepared with the Corpus Builder tool (see above), which automatically respects the following criteria:
Each text or subset of it (the "parts" defined by
variables and/or IDnumber) is preceded
by a coding line.
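Because every text or subset opens with a coding line, a prepared corpus file can be segmented by looking for lines that start with "**** ". The Python sketch below is not T-LAB code: the function name `split_corpus` is hypothetical, and it assumes that each unit runs from its coding line up to the next one and that any lines before the first coding line can be ignored.

```python
def split_corpus(corpus_text: str) -> list[tuple[str, str]]:
    """Split a corpus string into (coding_line, text) pairs.

    Assumption (not T-LAB's own parser): a coding line is any line that
    starts with "**** ", and its unit lasts until the next coding line.
    """
    units: list[tuple[str, str]] = []
    coding_line = None
    body: list[str] = []
    for line in corpus_text.splitlines():
        if line.startswith("**** "):
            # a new unit begins: close the previous one, if any
            if coding_line is not None:
                units.append((coding_line, "\n".join(body).strip()))
            coding_line = line.rstrip()
            body = []
        elif coding_line is not None:
            body.append(line)
        # lines before the first coding line are ignored in this sketch
    if coding_line is not None:
        units.append((coding_line, "\n".join(body).strip()))
    return units


corpus = """**** *AGE_ADUL *SEX_FEM *OCC_PROF
First text, written by an adult woman.
**** *YEAR_98 *NEWSP_TIMES
Second text, a newspaper article.
"""
for header, text in split_corpus(corpus):
    print(header, "->", text)
```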
Each coding line has the following format:
- It begins with a string of four asterisks (****) followed by a blank space. T-LAB reads this string as: "here begins a user-defined text or a context unit".
- It continues with strings, each made up of a single asterisk followed by a label, that define cases (IDnumber), variables and their categories.
- It ends with a line break (i.e. by pressing the return key).
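As an illustration of this format, the hypothetical helper `build_coding_line` below assembles a coding line from an optional IDnumber and a mapping of variables to categories. It is a sketch, not the Corpus Builder's own code; writing the IDnumber first mirrors the examples in this section, and the trailing "\n" stands in for the return key.

```python
from typing import Optional


def build_coding_line(variables: dict[str, str], id_number: Optional[str] = None) -> str:
    """Build "**** *LABEL_CATEGORY ..." from an optional IDnumber and variables.

    Hypothetical helper: the IDnumber (if given) is written first, then one
    "*VARIABLE_CATEGORY" token per variable, in insertion order.
    """
    tokens = ["****"]
    if id_number is not None:
        tokens.append(f"*IDnumber_{id_number}")
    for variable, category in variables.items():
        tokens.append(f"*{variable}_{category}")
    # single blank spaces separate the tokens; "\n" ends the line
    return " ".join(tokens) + "\n"


print(build_coding_line({"AGE": "ADUL", "SEX": "FEM", "OCC": "PROF"}, id_number="0001"), end="")
# → **** *IDnumber_0001 *AGE_ADUL *SEX_FEM *OCC_PROF
```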
Here are some examples.
The following line introduces a text (or a corpus subset) coded with three variables (AGE, SEX and OCC, i.e. occupation) and their categories (ADUL, FEM, PROF):
**** *AGE_ADUL *SEX_FEM *OCC_PROF
The following line introduces a text (or a corpus subset) coded with the same variables and an IDnumber label:
**** *IDnumber_0001 *AGE_ADUL *SEX_FEM *OCC_PROF
The following line introduces a text (or a corpus subset) coded with two variables, YEAR and NEWSP:
**** *YEAR_98 *NEWSP_TIMES
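Going the other way, a coding line such as the ones above can be read back into its labels. In the sketch below, `parse_coding_line` is a hypothetical name, and it is assumed that the first underscore in each token separates the label from its category; the IDnumber is returned like any other label.

```python
def parse_coding_line(line: str) -> dict[str, str]:
    """Map each label of a coding line to its category.

    Assumption: the label is the text between "*" and the first "_",
    and the category is everything after that underscore.
    """
    if not line.startswith("**** "):
        raise ValueError("a coding line must begin with '**** '")
    labels: dict[str, str] = {}
    for token in line[5:].split():  # split() also drops the final line break
        name, _, category = token.lstrip("*").partition("_")
        labels[name] = category
    return labels


print(parse_coding_line("**** *IDnumber_0001 *AGE_ADUL *SEX_FEM *OCC_PROF"))
# → {'IDnumber': '0001', 'AGE': 'ADUL', 'SEX': 'FEM', 'OCC': 'PROF'}
```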
In each coding line, the following T-LAB rules must be observed:
1. Labels (IDnumber, variables and variable categories) cannot contain blank spaces;
2. Each label, for both variables and variable categories, must be between 2 and 25 characters long;
3. Each variable label must be linked to its category by an underscore ("_");
4. A blank space must be inserted between two different variables, that is, before the next asterisk;
5. Each variable, with its category, must be assigned to every corpus subset;
6. A maximum of 50 variables can be used, each with a maximum of 150 categories that can be compared;
7. The maximum number of IDnumbers is 99,999 for short texts (max. 2,000 characters each, e.g. responses to open-ended questions, Twitter messages, etc.) and 30,000 in all other cases.
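The Corpus Builder applies these rules automatically; for corpora coded by hand, a rough pre-check can still be useful. The Python sketch below is a hypothetical checker, not part of T-LAB: the names `check_line` and `check_corpus` are our own, rule 1 is detected only indirectly (a token without a leading asterisk), the IDnumber is excluded from the variable counts and from the length check of rule 2, and rule 7 is read as "the 99,999 limit applies only when every text has at most 2,000 characters".

```python
from collections import defaultdict

MIN_LABEL, MAX_LABEL = 2, 25            # rule 2
MAX_VARIABLES, MAX_CATEGORIES = 50, 150  # rule 6
SHORT_TEXT_CHARS = 2_000                 # rule 7 (assumed reading)
MAX_IDS_SHORT, MAX_IDS_OTHER = 99_999, 30_000


def check_line(line: str) -> list[str]:
    """Return the problems found in one coding line (rules 1-4)."""
    if not line.startswith("**** "):
        return ["coding line must begin with '**** '"]
    problems = []
    for token in line[5:].split():
        if not token.startswith("*"):
            problems.append(f"rule 1: {token!r} lacks a leading '*' (blank space inside a label?)")
            continue
        if "*" in token[1:]:
            problems.append(f"rule 4: {token!r} needs a blank space before the next '*'")
            continue
        name, sep, category = token[1:].partition("_")
        if not sep:
            problems.append(f"rule 3: {token!r} has no underscore between variable and category")
            continue
        if name == "IDnumber":
            continue  # rule 2 covers variables and categories only
        for label in (name, category):
            if not MIN_LABEL <= len(label) <= MAX_LABEL:
                problems.append(f"rule 2: {label!r} must be {MIN_LABEL}-{MAX_LABEL} characters long")
    return problems


def check_corpus(units: list[tuple[str, str]]) -> list[str]:
    """Check rules 1-7 over (coding_line, text) pairs."""
    problems: list[str] = []
    categories = defaultdict(set)  # variable -> categories seen
    variable_sets = set()          # one frozenset of variables per subset
    ids = set()
    for line, _text in units:
        problems += check_line(line)
        found = {}
        for token in line[5:].split():
            name, _, category = token.lstrip("*").partition("_")
            found[name] = category
        if "IDnumber" in found:
            ids.add(found.pop("IDnumber"))
        variable_sets.add(frozenset(found))
        for name, category in found.items():
            categories[name].add(category)
    if len(variable_sets) > 1:
        problems.append("rule 5: not every subset is coded with the same variables")
    if len(categories) > MAX_VARIABLES:
        problems.append(f"rule 6: more than {MAX_VARIABLES} variables")
    for name, cats in categories.items():
        if len(cats) > MAX_CATEGORIES:
            problems.append(f"rule 6: {name!r} has more than {MAX_CATEGORIES} categories")
    short = all(len(text) <= SHORT_TEXT_CHARS for _line, text in units)
    max_ids = MAX_IDS_SHORT if short else MAX_IDS_OTHER
    if len(ids) > max_ids:
        problems.append(f"rule 7: more than {max_ids} IDnumbers")
    return problems


units = [
    ("**** *IDnumber_0001 *AGE_ADUL *SEX_FEM", "First answer."),
    ("**** *IDnumber_0002 *AGE_ADUL*SEX_MAL", "Second answer."),
]
for problem in check_corpus(units):
    print(problem)
# rule 4: '*AGE_ADUL*SEX_MAL' needs a blank space before the next '*'
# rule 5: not every subset is coded with the same variables
```

Note that a single missing blank space (rule 4) also makes the second subset appear to lack the SEX variable, so one typing error can surface as two reported problems.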