PART IV: ANALYZING AND INTERPRETING QUANTITATIVE DATA
Chapter 11: Describing Quantitative Data
JUST BECAUSE A STUDY HAS NUMBERS DOESN'T MEAN IT IS RIGHT; ALWAYS READ WITH A CRITICAL EYE.
"There are three kinds of lies: lies, damn lies, and statistics."
"Stats" are numbers people use to describe things.
IMPORTANT: Realize that statistics are simply a SUMMARY of a distribution of scores.
We need to understand how statistics are produced and what they mean in order to determine if they are trustworthy.
We attempt to account for the variation. This is our goal as quantitative social scientists.
The conclusions are only as good as the data.
This chapter provides several procedures and criteria to apply when drawing conclusions from a set of data. There are two specific types of quantitative data analysis: 1) descriptive (which summarizes the information in the data), and 2) inferential (which estimates the characteristics of a population from data gathered on a sample).
Inferential data analysis also tests for significant DIFFERENCES between groups and/or significant RELATIONSHIPS between variables.
Statistics is the field of science concerned with "the theories and techniques that have been developed to manipulate data." Analyses are performed to understand the "status" or characteristics of data...
I. MAKING SENSE OF NUMBERS: STATISTICAL DATA ANALYSIS
Data analysis is the process of examining what data mean to researchers. Statistical data analysis refers to the process of examining what quantitative data mean to researchers. Statistics refers to any numerical indicator of a set of data.
To be a competent and critical consumer of information, one needs to understand the ways in which quantitative data are analyzed: DESCRIPTION and INFERENCE.
Descriptive Statistical Data Analysis: used to construct simple descriptions about the characteristics of a set of quantitative data and to summarize the information in the data.
Inferential Data Analysis: 2 PURPOSES
- Estimates the characteristics of a population from data gathered on a sample.
- Tests for significant DIFFERENCES between groups and significant RELATIONSHIPS between variables.
II. DESCRIBING DATA THROUGH SUMMARY STATISTICS
Summary statistics provide an efficient way to describe an entire set of quantitative data.
A. Measures of CENTRAL TENDENCY - describe the central point of a distribution (representative values)
- Condense the report of a distribution of scores into a single numerical indicator called a summary statistic. A measure of central tendency describes the one score that best represents the entire distribution, the most CHARACTERISTIC SCORE. The most characteristic score is the one that describes the center point of a distribution of scores.
- MODE (Mo)
  - the score that occurs most frequently
  - APPROPRIATE FOR NOMINAL DATA (to describe which category occurs MOST)
- MEDIAN (Md or Mdn)
  - divides the distribution exactly in half (at the 50th percentile)
  - literally, the MIDDLE CASE
  - APPROPRIATE FOR ORDINAL DATA
  - not swayed by extreme scores (OUTLIERS)
- MEAN (x bar; sample mean = M; population mean = mu)
  - the arithmetic average
  - the AVERAGE SCORE IN A DISTRIBUTION
  - APPROPRIATE FOR INTERVAL/RATIO DATA
  - the MOST SENSITIVE and MOST USEFUL measure of central tendency
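A minimal Python sketch of the three measures, using the standard library's statistics module. The scores are made up for illustration and do not come from the chapter. The second list swaps the highest score for an extreme score to show that the mean moves while the median does not.

```python
import statistics

# Hypothetical scores (illustration only, not data from the chapter).
scores = [2, 3, 3, 3, 4, 5, 8]

mode = statistics.mode(scores)      # most frequent score -> 3
median = statistics.median(scores)  # middle case of the sorted scores -> 3
mean = statistics.mean(scores)      # arithmetic average: 28 / 7 -> 4

# Replace the highest score with an extreme score (outlier).
scores_with_outlier = [2, 3, 3, 3, 4, 5, 50]
median_with_outlier = statistics.median(scores_with_outlier)  # still 3
mean_with_outlier = statistics.mean(scores_with_outlier)      # 70 / 7 -> 10

print("mode:", mode, "median:", median, "mean:", mean)
print("with outlier -> median:", median_with_outlier, "mean:", mean_with_outlier)
```

One extreme score moves the mean from 4 to 10 while the median stays at 3, which is why the median is described as not swayed by outliers.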
B. Measures of DISPERSION - describe how scores differ; how scores are SPREAD (measures of VARIABILITY)
Do not exist for nominal data because categories are used rather than meaningful numbers.
- RANGE (or span) - number that reports the distance between the highest and lowest scores in a distribution; describes how widely data are spread out across a distribution. Sensitive to extreme scores; compensate by calculating the interquartile range (the distance between the 25th and 75th percentile points), which represents the range of scores for the middle half of a distribution. Generally used in combination with other measures of dispersion.
- VARIANCE (sample variance = s squared; population variance = sigma squared, that is, the standard deviation squared) - number that represents the mathematical index of the average distance of the scores on an interval or ratio scale from the mean, in squared units. A high variance means most scores are far away from the mean; a low variance indicates most scores cluster tightly about the mean. Expressed in SQUARED DEVIATIONS ABOUT THE MEAN, not in the original units of measurement.
  - The amount by which one score differs from the mean is called its deviation score (deviate).
  - Square each deviation score; the sum of all the squared deviation scores in a sample is called the SUM OF SQUARES.
- STANDARD DEVIATION (SD; sample SD = s; population SD = sigma) - number that represents a summary statistic of how much scores vary from the mean, expressed in the original units of measurement. It is calculated by taking the square root of the variance. AVERAGE AMOUNT OF DISPERSION WITHIN A DISTRIBUTION.
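The measures of dispersion above can be sketched in Python as follows. The scores are hypothetical. One point goes beyond these notes and is an assumption of the sketch: the statistics module's variance and stdev functions divide the sum of squares by n - 1 (the usual sample formulas), while the population versions divide by n. Interquartile range values also depend on the percentile method; the module default is used here.

```python
import statistics

# Hypothetical interval-level scores (illustration only).
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# RANGE: distance between the highest and lowest scores.
score_range = max(scores) - min(scores)  # 9 - 2 = 7

# INTERQUARTILE RANGE: distance between the 25th and 75th percentile points.
q1, _, q3 = statistics.quantiles(scores, n=4)
interquartile_range = q3 - q1

# Deviation scores, then the SUM OF SQUARES (sum of squared deviations).
mean = statistics.mean(scores)                    # 40 / 8 = 5
deviations = [x - mean for x in scores]
sum_of_squares = sum(d ** 2 for d in deviations)  # 32

# VARIANCE: average squared deviation, in squared units.
population_variance = sum_of_squares / len(scores)  # 32 / 8 = 4.0
sample_variance = statistics.variance(scores)       # 32 / 7 (divides by n - 1)

# STANDARD DEVIATION: square root of the variance, in original units.
population_sd = population_variance ** 0.5  # 2.0
sample_sd = statistics.stdev(scores)        # square root of 32 / 7

print(score_range, interquartile_range, sum_of_squares)
print(population_variance, population_sd, sample_variance, sample_sd)
```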
C. DESCRIBING DATA IN STANDARD SCORES (standard normal deviates)
- A standard score is the number that represents how many standard deviation units a particular score is above or below the mean. The most popular standard score used by researchers is the z score.
- The z score is calculated by dividing the deviation score by the standard deviation: z = (X - X bar) / SD.
- Each z score indicates how many standard deviations that score is from the mean of the distribution.
- Raw or unstandardized scores (z scores) versus transformed standard scores (Z scores)
- Standard scores allow comparisons between different people's scores on different types of measurements.
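A short Python sketch of the z score calculation and of why it supports comparison across measures. The function name z_score and all of the numbers are hypothetical, chosen only for illustration.

```python
def z_score(score, mean, sd):
    """Return how many standard deviations `score` lies above (+) or below (-) the mean."""
    return (score - mean) / sd


# Hypothetical example: one person's raw scores on two different measures.
# Both raw scores are 10 points above their means, but the spreads differ.
exam_z = z_score(80, mean=70, sd=5)   # (80 - 70) / 5  = 2.0
quiz_z = z_score(60, mean=50, sd=20)  # (60 - 50) / 20 = 0.5

print("exam z:", exam_z, "quiz z:", quiz_z)
```

Although both raw scores sit 10 points above their means, the exam score is 2 standard deviations above the mean and the quiz score only half a standard deviation, so the exam performance is the more unusual one relative to its distribution.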
III. DESCRIBING DATA THROUGH VISUAL DISPLAYS
A. SIX COMMON TYPES
SHOWING DIFFERENCES BETWEEN GROUPS (Categories)
Especially useful for visually showing differences between nominal independent variable (IV) groups with regard to a dependent variable (DV).
- Frequency Tables
- Pie Charts
- Bar Charts
- Line Graphs
SHOWING RELATIONSHIPS BETWEEN AN ORDERED INDEPENDENT VARIABLE AND OTHER VARIABLES
Used when an independent variable is measured using an interval or ratio scale.
- Frequency Histograms
- Frequency Polygons
B. FREQUENCY Distributions
- Counting and reporting HOW OFTEN different categories or points on a measurement scale occur.
- A list of the frequency of responses for each category or measurement point is called a frequency table.
- Why are frequency counts of categories useful?
  - Inform researchers about common communication practices of people and institutions
  - Assess predictions derived from theory
  - Show changes over time
- Frequency Tables - report the total number of times particular values on a measurement scale occur in a data set.
- Pie Charts - illustrate the frequency counts of categories.
- Bar Charts - visually illustrate frequency counts for nominal and ordinal variables.
- Line Graphs - use a single point to represent the frequency count on a dependent variable for each of the groups.
- Interval and ratio level variable frequencies are illustrated using:
  - Frequency Histograms - distributions of frequencies in which the blocks touch
  - Frequency Polygons - similar to line graphs, except that a line connects the points representing the frequency count for each point on the measurement scale rather than for each category
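As a rough illustration of a frequency table, the Python sketch below counts how often each category occurs in a small set of nominal responses and prints the counts with a text-only bar for each. The responses are invented for the example; collections.Counter is part of the standard library.

```python
from collections import Counter

# Hypothetical nominal data: each respondent's preferred channel.
responses = ["email", "phone", "email", "text", "email", "text", "phone", "email"]

# Frequency distribution: how often each category occurs.
freq = Counter(responses)
total = sum(freq.values())

# Frequency table with percentages and a simple text bar per category.
print(f"{'category':<10}{'count':>5}{'percent':>9}")
for category, count in freq.most_common():
    print(f"{category:<10}{count:>5}{count / total:>9.1%}  {'#' * count}")
```

The same counts could feed a pie chart or bar chart; for interval or ratio data the categories would be replaced by points (or ranges of points) on the measurement scale, as in a histogram.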
IV. CONCLUSION
- Statistics are simply ways researchers attempt to describe the data they have acquired.
- Once important characteristics have been determined, it is possible to go beyond description to infer conclusions about the data (relative to theory, research questions, and research hypotheses).
- We will deal with HOW researchers draw conclusions from the data when we discuss INFERENTIAL STATISTICS!
V. IMPORTANT DEFINITIONS
Parameter - a characteristic of a population or a universe.
Statistic - the measurement of a sample with respect to a variable.
Nonparametric Statistics - procedures used only to DESCRIBE the characteristics of a SAMPLE, without being able to generalize back to its population.
Parametric Statistics - procedures used to ESTIMATE the parameters (characteristics) of a population based on the characteristics (statistics) of a sample.
Data Analysis - the methods researchers use to infer meaning from data, to determine what conclusions are justified.
Standard Scores - provide a common unit of measurement indicating how far away any particular score is from the mean. Used to compare numbers (scores) from different distributions.
VI. STUDY QUESTIONS:
What is the method researchers use to infer meaning from data and to determine what conclusions are justified?
What is the difference between Descriptive Data Analysis and Inferential Data Analysis?
What is a frequency distribution?
What are the five methods for graphically representing a frequency distribution? When should each be used?
What is a summary statistic?
What are the three measures of central tendency?
What are the three measures of dispersion?
What are standard scores and why are they important to communication research?
What are the two purposes of inferential statistics?