Short
Sample by Franco Lancia (franco.lancia@tlab.it)
March 14, 2007
On February 10, 2007, I watched on TV the speech in Springfield in which Senator
Barack Obama announced his candidacy for President of the United States.
Because I was impressed by his ability to communicate his emotions and ideas,
I decided to spend some time trying to better understand his political vision.
On March 6, I downloaded all 77 speeches of B. Obama available from the website
http://obama.senate.gov/speech/ and imported them into T-LAB.
Then,
in order to make a method of text analysis public, I decided to keep a record
of my steps and to publish a short report of my exploratory route.
Just a few short remarks about the preliminary treatments:
- automatic lemmatization was applied (e.g. the headword "hope" stands for
"hope", "hopes", "hoping", "hoped");
- the most important multiword expressions were detected and transformed
(e.g. "Civil Rights Movement" -> "Civil_Rights_Movement");
- the elementary contexts for co-occurrence analysis were set as text
fragments of comparable length including one or more sentences;
- a wordlist of 1446 items (i.e. lemmas) with occurrence values of 8 or more
was used.
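The preliminary treatments listed above can be illustrated with a minimal
Python sketch. It is not T-LAB's implementation: the lemma map, the multiword
map and the function names are hypothetical toy stand-ins for the dictionaries
that T-LAB uses internally, and only the occurrence threshold (8) comes from
this report.

```python
import re
from collections import Counter

# Hypothetical toy resources (assumptions, not T-LAB's actual dictionaries).
LEMMAS = {"hopes": "hope", "hoping": "hope", "hoped": "hope"}
MULTIWORDS = {"civil rights movement": "Civil_Rights_Movement"}


def preprocess(text):
    """Lowercase, join multiword expressions, tokenize, then lemmatize."""
    text = text.lower()
    for phrase, token in MULTIWORDS.items():
        text = text.replace(phrase, token)
    tokens = re.findall(r"[A-Za-z_']+", text)
    return [LEMMAS.get(tok, tok) for tok in tokens]


def build_wordlist(documents, min_occurrences=8):
    """Keep only the lemmas whose total occurrences reach the threshold."""
    counts = Counter(tok for doc in documents for tok in preprocess(doc))
    return {word: n for word, n in counts.items() if n >= min_occurrences}
```

With a real corpus, `build_wordlist(speeches)` would return the lemmas that
occur at least 8 times, which is the kind of list used for the analyses below.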
First, in order to get an initial representation of the data structure, I performed
a Correspondence Analysis (*) of a word x speech contingency table, that is, of
a matrix consisting of 1446 rows and 77 columns whose cells contain frequency values.
(*) This
T-LAB tool is available in the sub-menu "Comparative Analysis".
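As a rough illustration of what a correspondence analysis computes on such a
table, here is a minimal pure-Python sketch. It builds the matrix of
standardized residuals that the method decomposes, the total inertia
(chi-square divided by the grand total), and the inertia of the first
factorial axis. The power iteration and all function names are choices made
for this example; nothing here is taken from T-LAB's code, and the table must
have no empty rows or columns.

```python
import math


def standardized_residuals(table):
    """S[i][j] = (p_ij - r_i * c_j) / sqrt(r_i * c_j), with r, c the masses."""
    total = sum(sum(row) for row in table)
    row_mass = [sum(row) / total for row in table]
    col_mass = [sum(col) / total for col in zip(*table)]
    return [
        [(cell / total - r * c) / math.sqrt(r * c)
         for cell, c in zip(row, col_mass)]
        for row, r in zip(table, row_mass)
    ]


def total_inertia(table):
    """Sum of squared residuals, i.e. chi-square divided by the grand total."""
    return sum(v * v for row in standardized_residuals(table) for v in row)


def first_principal_inertia(table, iterations=200):
    """Largest eigenvalue of S^T S (first factorial axis), by power iteration."""
    s = standardized_residuals(table)
    n_cols = len(s[0])
    v = [float(j + 1) for j in range(n_cols)]  # arbitrary start vector
    eigenvalue = 0.0
    for _ in range(iterations):
        sv = [sum(row[j] * v[j] for j in range(n_cols)) for row in s]
        w = [sum(s[i][j] * sv[i] for i in range(len(s)))
             for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0  # rows and columns are independent
        v = [x / norm for x in w]
        eigenvalue = norm  # v has unit length after the first pass
    return eigenvalue
```

The ratio `first_principal_inertia(t) / total_inertia(t)` gives the share of
inertia explained by the first axis. For real work a library routine based on
the singular value decomposition would be the usual choice.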
The following
charts show the obtained results mapped on the first two factorial axes.
Figure 1 - The most significant keywords
Figure 2 - The 77 speeches
In summary, the main topics of the analysed speeches turn out to be organized
in the following way:
Figure 3 - A 4-way scheme
In detail, a precise description of these main topics is provided by the following
T-LAB tables, which include the first 30 words for each pole, sorted by decreasing
test value.
Figure 4 - Word Test Values
Some comments:
The first (horizontal) axis opposes "internal" and "external" political topics
(see "education" vs "war"), whereas the second (vertical) axis, through the
reference to climate change, seems to create a bridge between the first two
polarities.
These simplified results depend on the method of analysis in two respects:
a) the choice of analysis units, that is, the construction of a table having
as many columns as there are different speeches;
b) the choice of a specific statistical tool, that is, correspondence analysis.
From a methodological point of view, to analyse the same table (to be precise,
its transpose) I could have used a typical document clustering algorithm (see
the T-LAB tool Thematic Document Classification) but, as I had the chance to
verify, the results would be very similar. Starting from 77 speeches/documents,
because of the reduction process, only 4 clusters are obtained, which is not
enough to produce an articulated thematic map.
By contrast, by changing the analysis units, that is, by asking T-LAB
to build and analyse a table with as many rows as there are elementary contexts
(in this case, more than three thousand), I obtained a map of 12 thematic clusters.
The logic of this T-LAB tool (Thematic Analysis of Elementary Contexts) is
explained in the user's manual and in the on-line help, both available from the
T-LAB web site (www.tlab.it). Briefly, it combines two kinds of analysis: the
first uses a clustering algorithm to discover thematic groups of elementary
contexts sharing similar word co-occurrence patterns; the second builds a
words x clusters contingency table and maps its structure by means of
correspondence analysis.
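The second step can be sketched in a few lines of Python. The sketch assumes
that the clustering has already assigned one label to each elementary context
(the clustering algorithm itself is not described in this report and is not
reproduced here); the function name and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def words_by_clusters(contexts, labels, wordlist):
    """Build a words x clusters contingency table.

    contexts: list of token lists, one per elementary context
    labels:   cluster label of each context (assumed already computed)
    wordlist: the lemmas retained for the analysis
    Returns (rows, clusters), where rows[word] lists the word's occurrences
    in each cluster, in the order given by clusters.
    """
    counts = defaultdict(Counter)
    for tokens, label in zip(contexts, labels):
        for tok in tokens:
            if tok in wordlist:
                counts[tok][label] += 1
    clusters = sorted(set(labels))
    rows = {w: [counts[w][c] for c in clusters] for w in sorted(wordlist)}
    return rows, clusters


# Toy usage with made-up contexts and labels.
rows, clusters = words_by_clusters(
    [["hope", "future"], ["war", "iraq"], ["hope", "war"]],
    ["A", "B", "B"],
    {"hope", "war"},
)
```

The resulting table has the same shape as a word x speech table, with clusters
in place of speeches, so it can be analysed with correspondence analysis in
the same way.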
The following
charts show the relationships of the twelve clusters within the first bi-dimensional
space (Figure 5) and their relative weight (Figure 6).
Figure 5 - Scatter chart with 12 thematic clusters
Figure 6 - Histogram of 12 thematic clusters
In some ways, the structures of the bi-dimensional spaces obtained by means of
the two different tools are very similar (see Fig. 1 and Fig. 5). But in the
latter case (Fig. 5), each label stands for a cluster of elementary contexts
sharing similar word patterns, and T-LAB makes tables with the characteristics
of each cluster available, so we can examine each of them in detail.
For example, the following three tables report the characteristics of three
thematic clusters.
Figure 7 - Thematic cluster HEALTH-CARE
Figure 8 - Thematic cluster IRAQ WAR
Figure 9 - Thematic cluster OIL & ENERGY
We can also summarize the "content" of the thematic clusters by using a
list (see below) which includes the two most significant elementary contexts
of each of them, i.e. the elementary contexts to which, within each cluster,
T-LAB assigned the highest scores.
In some ways, the following table reads like a summary of a political
manifesto.
FAMILY
When
a parent takes parental leave, we shouldn't act like caring for a newborn
baby is a three-month break - we should let them keep their salary . When
parents are working and their children need care, we should make sure
that care is affordable, and we should make sure our kids can go to school
earlier and longer so they have a safe place to learn while their parents
are at work
....
The amendment is simple: it says that the children of low-income working
parents affected by Hurricane Katrina will no longer be denied the child
credit . You work, your kids get a benefit . If you don't work, no benefit
. And if you want the full benefit , you have to earn at least $ 10,000,
which is just about the income of a full time job at minimum wage .
IRAQ WAR
Drawing
down our troops in Iraq will allow us to redeploy additional troops to
Northern Iraq and elsewhere in the region as an over-the-horizon force
. This force could help prevent the conflict in Iraq from becoming a wider
war, consolidate gains in Northern Iraq, reassure allies in the Gulf ,
allow our troops to strike directly at al Qaeda wherever it may exist
...
....
this redeployment remains our best leverage to pressure the Iraqi government
to achieve the political settlement between its warring factions that
can slow the bloodshed and promote stability . My plan also allows for
a limited number of U.S. troops to remain and prevent Iraq from becoming
a haven for international terrorism and reduce the risk of all-out chaos.
OUR COUNTRY
Our
economic dominance has depended on individual initiative and belief in
the free market; but it has also depended on our sense of mutual regard
for each other, the idea that everybody has a stake in the country, that
we're all in it together and everybody 's got a shot at opportunity And
so if we're serious about this opportunity
....
Yes , our greatness as a nation has depended on individual initiative,
on a belief in the free market . But it has also depended on our sense
of mutual regard for each other , the idea that everybody has a stake
in the country , that we're all in it together and everybody ' s got a
shot at opportunity . Robert Kennedy reminded us of this . He reminds
us still .
LAW
In
the Lochner case , and in a whole series of cases prior to Lochner being
overturned , the Supreme Court consistently overturned basic measures
like minimum wage laws, child labor safety laws , and rights to organize,
deeming those laws as somehow violating a constitutional right to private
property .
....
Let me just give you a couple examples . In a case reviewing California's
parental notification law, Justice Brown criticized the California Supreme
Court decision overturning that law, saying that the court should have
remained tentative, recognizing the primacy of legislative prerogatives.
SCHOOL
Schools
that raise student achievement would be given bonuses . For schools that
don't improve, the districts would close them and replace them with new,
smaller schools that can replicate some of the successful reforms taking
place elsewhere.
....
To hold schools and teachers accountable for the results of all
these reforms, Innovation Districts would be asked to support schools
that succeed and shut down those that don't . To find out what works and
what doesn't , we'd provide them with powerful data and technology, and
also give them the option of partnering with local universities to help
them improve performance..
PEOPLE
I
imagine that they would 've seen the marchers and heard the speeches,
but they also probably saw the dogs and the fire hoses, or the footage
of innocent people being beaten within an inch of their lives; or heard
the news the day those four little girls died when someone threw a bomb
into their church .
....
And in that movement, she saw women who were willing to walk instead
of ride the bus after a day of doing somebody else's laundry and looking
after somebody else's children because they walked for freedom . And she
saw young people of every race and every creed take a bus down to Mississippi
and Alabama to register voters because they believed .
WEAPONS
I
thank the managers of this bill , Senators McConnell and Leahy, and their
staffs for working with me on this important issue. I know that Senator
McConnell has a longstanding interest in Southeast Asia, and Senator Leahy
has always been a champion of international health issues, making the
avian flu something I know they both care deeply about.
....
So last November , we introduced an amendment to the tax reconciliation
bill expressing the Sense of the Senate that FEMA should immediately rebid
these contracts. Our colleagues agreed and passed this amendment by unanimous
consent . After our amendment passed, both Senator Coburn and I met with
Director Paulison , and again he assured us that these contracts would
be rebid .
YOUNG STORIES
Even
people who didn't know me were skeptical of my decision. I remember having
a conversation with an older man I had met before I arrived in Chicago.
I told him about my plans, and he looked at me and said , "Let me
tell something . You look like a nice clean-cut young man, and you've
got a nice voice.
....
And yet , somehow , we're still hearing stories like the one I
heard from a veteran named Bill Allen , who told me that on a trip to
Chicago, he actually saw homeless veterans fighting over access to the
dumpsters . That 's what I thought about . And finally, I thought about
a young man named Seamus Ahern, who I met during the campaign at a V.
F .W .
CHALLENGE
Each
and every one of these challenges call for an America that is more purposeful,
more grown-up than the America that we have today . An America that reflects
the lessons that have helped so many of its people mature in their own
lives. An America that 's about not just each of us, but all of us. An
America that takes great risks in the face of greater odds .
....
That 's as true today as it was then - the real job of organizing
working America politics and policy, vision and mission, heart and soul
- belongs to each of you . And if you have the courage to succeed, labor
will rise again. America will rise again. And hope will rise again. Thank
you and God Bless you.
TOMORROW
More
and more , Americans are competing for these jobs with highly educated
workers from India, China, and all over the world. If we want America
to win in this new global economy, we have to start sending more kids
to college, not less.
....
instant messaging with friends across the world - a quiet revolution
has been breaking down barriers and connecting the world 's economies.
Now, businesses not only have the ability to move jobs wherever there's
a factory, but wherever there's an internet connection.
HEALTH CARE
But
by bringing our health care system on-line, we could start improving the
quality of care and cutting the cost of it . We could save thousands of
lives and save families billions of dollars. Just imagine if every doctor
and nurse could sit by a patient 's bedside with a laptop and pull up
their entire medical history - information from every past doctor they've
seen - ..
....
From the smallest mom and pop stores to major corporations like
GM , businesses who can't afford these rising costs are cutting back on
insurance, workers, or both. States with bigger Medicaid bills and smaller
budgets are being forced to choose whether they want their citizens to
be unhealthy or uneducated. And over half of all family bankruptcies today
are
OIL & ENERGY
Recently
, I joined a few other Senators in introducing a bill that would increase
America's renewable fuel standard and increase ethanol production along
with it. A bill like this that 's already passed the Senate twice would
've provided us with 500,000 barrels a day of refined ethanol for use
in gasoline and would save us $ 4 billion every year in imported oil and
gasoline costs
....
The President 's energy proposal would reduce our oil imports by
4 .5 million barrels per day by 2025 . Not only can we do better than
that , we must do better than that if we hope to make a real dent in our
oil dependency . With technology we have on the shelves right now and
fuels we can grow right here in America , by 2025 we can reduce our oil
imports by over 7.5 .
At this point, the main topics of Obama's speeches are duly mapped, both those
concerning the development of human and technological resources and those
concerning security and the defence of people's rights. Looking at Figures 5
and 6, we can notice that the majority of the topics, even if expressed in a
universal language, concern domestic policy and are addressed to ordinary
people.
Because
the thematic partition into twelve clusters can be saved, by using other T-LAB
tools we can investigate further relationships between and within them.
A first
kind of relationship, concerning the discourse transitions from a theme (i.e.
thematic cluster) to the others, can be explored by using a tool which performs
a Markovian analysis of the Sequences
of Themes.
Some
of its typical outputs are as follows:
Figure 10 - Adjacency Matrix of Thematic Clusters
Figure 11 - Predecessors and Successors of CHALLENGE theme
N.B. In this table the "PROB" values indicate the probability that each theme
comes before (predecessor) or after (successor) the selected item within
the discourse sequence.
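To make the idea of predecessor and successor probabilities concrete, here is
a minimal Python sketch of a first-order (Markovian) count of theme
transitions. It assumes that the probabilities are simple relative frequencies
of adjacent pairs within one sequence of cluster labels; the report does not
state how T-LAB computes its "PROB" values, so this is an illustration and not
a reconstruction. The function names and the toy sequence are hypothetical.

```python
from collections import Counter, defaultdict


def transition_counts(sequence):
    """Count how often each theme is immediately followed by each other theme."""
    counts = defaultdict(Counter)
    for current, following in zip(sequence, sequence[1:]):
        counts[current][following] += 1
    return counts


def successors(sequence, theme):
    """Relative frequency of each theme coming right after the given theme."""
    row = transition_counts(sequence)[theme]
    total = sum(row.values())
    return {t: n / total for t, n in row.items()} if total else {}


def predecessors(sequence, theme):
    """Relative frequency of each theme coming right before the given theme."""
    column = Counter(
        prev for prev, nxt in zip(sequence, sequence[1:]) if nxt == theme
    )
    total = sum(column.values())
    return {t: n / total for t, n in column.items()} if total else {}


# Toy sequence of thematic clusters, one label per elementary context.
seq = ["CHALLENGE", "TOMORROW", "CHALLENGE", "SCHOOL", "CHALLENGE", "TOMORROW"]
```

With a real corpus, each speech would normally be treated as its own sequence,
so that the last context of one speech is not counted as the predecessor of
the first context of the next.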
By manipulating
these kinds of outputs and using other software, it is possible to produce graphs
like the following:
Figure 12 - Network of the main interconnections between all thematic clusters
Figure 13 - Interconnection between CHALLENGE and the other thematic clusters.
A second
kind of relationship, concerning the word co-occurrences within each cluster,
can be explored by using the Word
Associations tool.
For example, we can compare the different contextual meanings of Obama's keyword
"hope" within the entire corpus and within some thematic clusters, also by
extracting some sentences with significant word co-occurrences.
N.B. The enclosed tables report the association measures, i.e. the cosine
coefficients.
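As an illustration, a cosine coefficient between two words can be computed
from their occurrences and co-occurrences in the elementary contexts. The
sketch below assumes presence/absence data, i.e. each context is represented
as a set of lemmas, which gives cos(a, b) = n_ab / sqrt(n_a * n_b). The
function name and the toy contexts are hypothetical, and the exact settings
used by T-LAB are not stated in this report.

```python
import math


def cosine_association(contexts, word_a, word_b):
    """Cosine coefficient between two words over a list of contexts.

    contexts: list of sets of lemmas (one set per elementary context)
    Returns n_ab / sqrt(n_a * n_b), or 0.0 if either word never occurs.
    """
    n_a = sum(1 for c in contexts if word_a in c)
    n_b = sum(1 for c in contexts if word_b in c)
    n_ab = sum(1 for c in contexts if word_a in c and word_b in c)
    if n_a == 0 or n_b == 0:
        return 0.0
    return n_ab / math.sqrt(n_a * n_b)


# Toy contexts (made up for the example).
toy_contexts = [
    {"hope", "future"},
    {"hope", "courage"},
    {"future", "job"},
    {"hope", "future", "job"},
]
```

Restricting `contexts` to the elementary contexts of a single thematic cluster
gives the within-cluster associations, which is how the same keyword can show
different neighbours in different clusters.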
Here are the word associations of Hope within the whole corpus:
Here are the word associations of Hope within the CHALLENGE thematic cluster:
Here are the word associations of Hope within the OIL & ENERGY thematic cluster:
Here are the word associations of Hope within the IRAQ WAR thematic cluster:
Here are the word associations of Hope within the OUR COUNTRY thematic cluster:
Other exploratory routes are possible, using either other T-LAB tools or
other software; but, because I'm not a researcher in political science, I leave
the job of a more accurate analysis and of the data interpretation to more
competent people.
However, on the basis of what I understood, as a world citizen I "hope"
that the American people will have the courage to build a bridge towards the future
"dreamed" by Senator B. Obama.