The following example was produced with an old version of T-LAB (5.3).
All charts have been updated with T-LAB 8.0.
On February 10, 2007, I watched on TV the speech in which Senator Barack Obama announced, in Springfield, his candidacy for President of the United States.
Because I was impressed by his ability to communicate his emotions and ideas, I decided to spend some time trying to better understand his political vision.
On March 6, I downloaded all 77 speeches of B. Obama available from the website http://obama.senate.gov/speech/ and imported them into T-LAB.
Then, in order to make a method of text analysis public, I decided to keep a record of my steps and to publish a short report of my exploratory route.
Some short remarks about the preliminary treatments:
- an automatic lemmatization has been applied (e.g. the headword "hope" stands for "hope", "hopes", "hoping", "hoped");
- the most important multiword expressions have been detected and transformed (e.g. "Civil Rights Movement" -> "Civil_Rights_Movement");
- the elementary contexts for co-occurrence analysis have been set as text fragments of comparable length including one or more sentences;
- a wordlist of 1446 items (i.e. lemmas) with occurrence values greater than or equal to 8 has been used.
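As a rough illustration of these preliminary treatments, here is a minimal Python sketch. It does not reproduce T-LAB's own procedures: the tiny `LEMMAS` and `MULTIWORDS` dictionaries, the tokenizer and the function names are all hypothetical stand-ins chosen only to show the three steps (multiword replacement, lemmatization, frequency threshold).

```python
import re
from collections import Counter

# Hypothetical toy resources; a real lemmatizer uses a full dictionary.
LEMMAS = {"hopes": "hope", "hoping": "hope", "hoped": "hope"}
MULTIWORDS = {"civil rights movement": "Civil_Rights_Movement"}

def preprocess(text):
    """Replace multiword expressions, tokenize, then map words to headwords."""
    for phrase, token in MULTIWORDS.items():
        text = re.sub(re.escape(phrase), token, text, flags=re.IGNORECASE)
    tokens = re.findall(r"[A-Za-z_]+", text)
    result = []
    for tok in tokens:
        if "_" in tok:                      # keep multiword tokens unchanged
            result.append(tok)
        else:
            low = tok.lower()
            result.append(LEMMAS.get(low, low))
    return result

def build_wordlist(documents, min_occurrences=8):
    """Keep only the lemmas whose total occurrences reach the threshold."""
    counts = Counter(tok for doc in documents for tok in preprocess(doc))
    return {word: n for word, n in counts.items() if n >= min_occurrences}

# Invented sentences, with a lower threshold to suit the tiny sample.
docs = ["Hope and hopes: the Civil Rights Movement hoped.", "Hoping for hope."]
wordlist = build_wordlist(docs, min_occurrences=3)
```

With the threshold of 8 used in this report, `build_wordlist(speeches)` would play the role of the 1446-item wordlist, assuming `speeches` holds the 77 texts.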
First, in order to get an initial representation of the data structure, I performed a Correspondence Analysis (*) of a word x speech contingency table, that is, a matrix of 1446 rows and 77 columns whose cells contain frequency values.
(*) This T-LAB tool is available in the sub-menu "Comparative Analysis".
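For readers who want to see what such an analysis computes, the following is a minimal sketch of a simple correspondence analysis based on the singular value decomposition of the standardized residuals. It assumes NumPy is available and that no row or column of the table is empty; the function name and the toy frequencies are invented for illustration, and T-LAB's own implementation may differ in its details.

```python
import numpy as np

def correspondence_analysis(table, n_axes=2):
    """Simple correspondence analysis of a contingency table via SVD."""
    N = np.asarray(table, dtype=float)
    P = N / N.sum()                          # correspondence matrix
    r = P.sum(axis=1)                        # row masses (words)
    c = P.sum(axis=0)                        # column masses (speeches)
    expected = np.outer(r, c)                # proportions under independence
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, sing, Vt = np.linalg.svd(S, full_matrices=False)
    inertia = sing ** 2                      # inertia carried by each axis
    # Principal coordinates of rows (words) and columns (speeches).
    row_coords = (U * sing) / np.sqrt(r)[:, None]
    col_coords = (Vt.T * sing) / np.sqrt(c)[:, None]
    return row_coords[:, :n_axes], col_coords[:, :n_axes], inertia

# Toy table: 4 words (rows) x 3 speeches (columns), invented frequencies.
toy_words = ["education", "school", "war", "troops"]
toy_table = [[10, 2, 1],
             [8, 3, 2],
             [1, 9, 7],
             [2, 8, 10]]
word_coords, speech_coords, inertia = correspondence_analysis(toy_table)
```

The first two columns of `word_coords` and `speech_coords` are the coordinates one would plot on the first two factorial axes; the sum of `inertia` equals the chi-square statistic of the table divided by its grand total.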
The following charts show the obtained results mapped on the first two factorial axes.
Figure 1 - The most significant keywords
Figure 2 - The 77 speeches
In summary, the main topics of the analysed speeches turn out to be organized in the following way:
Figure 3 - A 4-way scheme
In detail, a precise description of these main topics is provided by the following T-LAB tables which include the first 30 words for each pole, sorted by their decreasing test-value.
Figure 4 - Word Test Values
The first (horizontal) axis opposes "internal" and "external" political topics (see "education" vs "war"), whereas the second (vertical) axis - through the reference to climate change - seems to create a bridge between the first two polarities.
These simplified results depend on the method of analysis in two respects:
a) the choice of analysis units, that is the construction of a table having as many columns as there are different speeches;
b) the choice of a specific statistical tool, that is the correspondence analysis.
From a methodological point of view, to analyse the same table (to be precise, its transpose) I could have used a typical document clustering algorithm (see the T-LAB tool Thematic Document Classification), but - as I had the chance to verify - the results would be very similar. Starting from 77 speeches/documents, the reduction process yields only 4 clusters, which is not enough to produce an articulated thematic map.
By contrast, by changing the analysis units, that is, by asking T-LAB to build and analyse a table with as many rows as there are elementary contexts (in this case, more than three thousand), I obtained a map of 12 thematic clusters.
The logic of this T-LAB tool (Thematic Analysis of Elementary Contexts) is explained in the user's manual and in the on-line help, both available from the T-LAB web site (www.tlab.it). Briefly, it combines two kinds of analysis: the first uses a clustering algorithm for discovering thematic groups of elementary contexts sharing similar word co-occurrence patterns; the second builds a contingency table words x clusters and maps its structure by means of the correspondence analysis.
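The two-step logic can be sketched as follows. This is only an illustration under stated assumptions: the clustering step here is a plain spherical k-means (cosine similarity) with a deterministic farthest-first seeding, which is not necessarily the algorithm T-LAB uses; NumPy is assumed to be available, every context is assumed to contain at least one listed word, and the function names and toy data are invented.

```python
import numpy as np

def cluster_contexts(X, k, n_iter=20):
    """Group contexts (rows of a context x word count matrix) by cosine similarity."""
    X = np.asarray(X, dtype=float)
    unit = X / np.linalg.norm(X, axis=1, keepdims=True)   # rows scaled to unit length
    chosen = [0]                                           # farthest-first seeding
    while len(chosen) < k:
        sim = unit @ unit[chosen].T
        chosen.append(int(sim.max(axis=1).argmin()))
    centroids = unit[chosen].copy()
    for _ in range(n_iter):
        labels = (unit @ centroids.T).argmax(axis=1)       # nearest centroid
        for j in range(k):
            members = unit[labels == j]
            if len(members):
                total = members.sum(axis=0)
                centroids[j] = total / np.linalg.norm(total)
    return (unit @ centroids.T).argmax(axis=1)

def words_by_clusters(X, labels, k):
    """Contingency table: occurrences of each word within each cluster."""
    X = np.asarray(X, dtype=float)
    return X.T @ np.eye(k)[labels]

# Toy data: 5 elementary contexts x 6 words, invented counts.
words = ["health", "care", "doctor", "troops", "iraq", "war"]
contexts = [[2, 1, 1, 0, 0, 0],
            [1, 2, 0, 0, 0, 0],
            [0, 0, 0, 1, 2, 1],
            [0, 0, 0, 2, 1, 1],
            [1, 1, 1, 0, 0, 0]]
labels = cluster_contexts(contexts, k=2)
table = words_by_clusters(contexts, labels, k=2)
```

The resulting `table` (words x clusters) is the kind of matrix that the second step would then submit to correspondence analysis.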
The following charts show the relationships of the twelve clusters within the first bi-dimensional space (Figure 5) and their relative weight (Figure 6).
Figure 5 - Scatter chart with 12 thematic clusters
Figure 6 - Histogram of 12 thematic clusters
In some ways, the structures of the bi-dimensional spaces obtained by means of the two different tools are very similar (see Fig. 1 and Fig. 5). But, in the latter case (Fig. 5), each label stands for a cluster of elementary contexts sharing similar word patterns and, because T-LAB provides a table of characteristics for each cluster, we can take a closer look at each of them.
For example, the following three tables report the characteristics of as many thematic clusters.
Figure 7 - Thematic cluster HEALTH-CARE
Figure 8 - Thematic cluster IRAQ WAR
Figure 9 - Thematic cluster OIL & ENERGY
We can also summarize the "content" of the thematic clusters by using a list (see below) which includes the two most significant elementary contexts of each of them, i.e. the elementary contexts to which - within each cluster - T-LAB assigned the highest scores.
In some ways, the following table seems to propose a summary of a political manifesto.
a parent takes parental leave, we shouldn't act like caring for a newborn baby is a three-month break - we should let them keep their salary. When parents are working and their children need care, we should make sure that care is affordable, and we should make sure our kids can go to school earlier and longer so they have a safe place to learn while their parents are at work
down our troops in Iraq will allow us to redeploy additional troops to Northern Iraq and elsewhere in the region as an over-the-horizon force. This force could help prevent the conflict in Iraq from becoming a wider war, consolidate gains in Northern Iraq, reassure allies in the Gulf, allow our troops to strike directly at al Qaeda wherever it may exist
economic dominance has depended on individual initiative and belief in the free market; but it has also depended on our sense of mutual regard for each other, the idea that everybody has a stake in the country, that we're all in it together and everybody's got a shot at opportunity. And so if we're serious about this opportunity
the Lochner case, and in a whole series of cases prior to Lochner being overturned, the Supreme Court consistently overturned basic measures like minimum wage laws, child labor safety laws, and rights to organize, deeming those laws as somehow violating a constitutional right to private
that raise student achievement would be given bonuses. For schools that don't improve, the districts would close them and replace them with new, smaller schools that can replicate some of the successful reforms taking
imagine that they would've seen the marchers and heard the speeches, but they also probably saw the dogs and the fire hoses, or the footage of innocent people being beaten within an inch of their lives; or heard the news the day those four little girls died when someone threw a bomb into their church.
thank the managers of this bill, Senators McConnell and Leahy, and their staffs for working with me on this important issue. I know that Senator McConnell has a longstanding interest in Southeast Asia, and Senator Leahy has always been a champion of international health issues, making the avian flu something I know they both care deeply about.
people who didn't know me were skeptical of my decision. I remember having a conversation with an older man I had met before I arrived in Chicago. I told him about my plans, and he looked at me and said, "Let me tell something. You look like a nice clean-cut young man, and you've got a nice voice.
and every one of these challenges call for an America that's more purposeful, more grown-up than the America that we have today. An America that reflects the lessons that have helped so many of its people mature in their own lives. An America that's about not just each of us, but all of us. An America that takes great risks in the face of greater odds.
and more, Americans are competing for these jobs with highly educated workers from India, China, and all over the world. If we want America to win in this new global economy, we have to start sending more kids to college, not less.
by bringing our health care system on-line, we could start improving the quality of care and cutting the cost of it. We could save thousands of lives and save families billions of dollars. Just imagine if every doctor and nurse could sit by a patient's bedside with a laptop and pull up their entire medical history - information from every past doctor they've seen - ..
OIL & ENERGY
, I joined a few other Senators in introducing a bill that would increase America's renewable fuel standard and increase ethanol production along with it. A bill like this that's already passed the Senate twice would've provided us with 500,000 barrels a day of refined ethanol for use in gasoline and would save us $4 billion every year in imported oil and
At this point, the main topics of Obama's speeches are duly mapped, both those concerning the development of human and technological resources and those concerning security and the defence of people's rights. Looking at Figures 5 and 6, we can notice that the majority of the topics, even if expressed in a universal language, concern domestic policy and are addressed to common people.
Because the thematic partition into twelve clusters can be saved, by using other T-LAB tools we can investigate further relationships between and within them.
A first kind of relationship, concerning the discourse transitions from a theme (i.e. thematic cluster) to the others, can be explored by using a tool which performs a Markovian analysis of the Sequences of Themes.
Some of its typical outputs are as follows:
Figure 10 - Adjacency Matrix of Thematic Clusters
Figure 11 - Predecessors and Successors of CHALLENGE theme
N.B. In this table the "PROB" values indicate the probability that each theme comes before (predecessor) or after (successor) the selected item within the discourse sequence.
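The idea behind such predecessor and successor probabilities can be sketched with a first-order count of transitions between consecutive themes. This is an assumption-laden illustration, not T-LAB's actual procedure: the function names are hypothetical, the toy sequence is invented, and choices such as how repeated themes or document boundaries are handled are left aside.

```python
from collections import Counter, defaultdict

def transition_counts(sequence):
    """Count how often each theme is immediately followed by each other theme."""
    counts = defaultdict(Counter)
    for current, following in zip(sequence, sequence[1:]):
        counts[current][following] += 1
    return counts

def successor_probs(sequence, theme):
    """Probability of each theme coming right after the selected theme."""
    row = transition_counts(sequence)[theme]
    total = sum(row.values())
    return {t: n / total for t, n in row.items()} if total else {}

def predecessor_probs(sequence, theme):
    """Probability of each theme coming right before the selected theme."""
    col = Counter(prev for prev, nxt in zip(sequence, sequence[1:]) if nxt == theme)
    total = sum(col.values())
    return {t: n / total for t, n in col.items()} if total else {}

# Invented sequence of cluster labels, one per consecutive elementary context.
themes = ["CHALLENGE", "IRAQ WAR", "CHALLENGE", "OUR COUNTRY", "CHALLENGE", "IRAQ WAR"]
after_challenge = successor_probs(themes, "CHALLENGE")
before_challenge = predecessor_probs(themes, "CHALLENGE")
```

In this toy sequence CHALLENGE is followed twice by IRAQ WAR and once by OUR COUNTRY, and preceded once by each of them; the full set of counts returned by `transition_counts` corresponds to an adjacency matrix of themes.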
By manipulating these kinds of outputs and using other software, it is possible to produce graphs like the following:
Figure 12 - Network of the main interconnections between all thematic clusters
Figure 13 - Interconnection between CHALLENGE and the other thematic clusters.
A second kind of relationship, concerning the word co-occurrences within each cluster, can be explored by using the Word Associations tool.
For example, we can compare the different contextual meanings of Obama's keyword "hope", within the entire corpus and within some thematic clusters, also by extracting some sentences with significant word co-occurrences.
N.B. The enclosed tables report the association measures, i.e. the cosine coefficients.
Here are the word associations of Hope within the whole corpus
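As an illustration of this kind of measure, the sketch below computes a cosine coefficient between two words from their presence or absence in each elementary context: the number of contexts containing both words, divided by the square root of the product of the numbers of contexts containing each. The use of binary presence, the function name and the toy contexts are assumptions made for the example.

```python
import math

def cosine_association(contexts, word_a, word_b):
    """Cosine coefficient between two words over a list of contexts (sets of lemmas)."""
    n_a = sum(1 for c in contexts if word_a in c)
    n_b = sum(1 for c in contexts if word_b in c)
    n_ab = sum(1 for c in contexts if word_a in c and word_b in c)
    if n_a == 0 or n_b == 0:
        return 0.0                       # a word that never occurs has no associations
    return n_ab / math.sqrt(n_a * n_b)

# Invented elementary contexts, each reduced to its set of lemmas.
toy_contexts = [{"hope", "future"},
                {"hope", "america"},
                {"hope", "future", "change"},
                {"war", "iraq"}]
hope_future = cosine_association(toy_contexts, "hope", "future")
hope_war = cosine_association(toy_contexts, "hope", "war")
```

Restricting `toy_contexts` to the contexts of a single thematic cluster gives the within-cluster associations, which is how the same keyword can show different neighbours in different themes.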
Here are the word associations of Hope within the CHALLENGE thematic cluster
Here are the word associations of Hope within the OIL & ENERGY thematic cluster
Here are the word associations of Hope within the IRAQ WAR thematic cluster
Here are the word associations of Hope within the OUR COUNTRY thematic cluster
Other exploratory routes would be possible, using either other T-LAB tools or other software; but, because I'm not a researcher in political science, I leave the job of a more accurate analysis and of the data interpretation to more competent people.
However, on the basis of what I understood, as a world citizen I "hope" that the American people will have the courage to build a bridge towards the future "dreamed" by Senator B. Obama.