PART IV:  ANALYZING AND INTERPRETING QUANTITATIVE DATA

Chapter 11:  Describing Quantitative Data

JUST BECAUSE A STUDY HAS NUMBERS
DOESN'T MEAN IT IS RIGHT; ALWAYS READ WITH A CRITICAL EYE.

"There are three kinds of lies:  lies, damn lies, and statistics."

"Stats" are numbers people use to describe things.

IMPORTANT:  Realize that statistics are simply a SUMMARY of a distribution of scores.

We need to understand how statistics are produced and what they mean in order to determine if they are trustworthy.

We attempt to account for the variation.
This is our goal as quantitative social scientists.
The conclusions are only as good as the data.

This chapter provides several procedures and criteria to apply when drawing conclusions from a set of data.  There are two specific types of quantitative data analysis: 1) descriptive (which summarizes the information in the data), and 2) inferential (which estimates the characteristics of a population from data gathered on a sample).  Inferential data analysis also tests for significant DIFFERENCES between groups and/or significant RELATIONSHIPS between variables.

Statistics is the field of science concerned with "the theories and techniques that have been developed to manipulate data." Analyses are performed to understand the "status" or characteristics of data. . .

I.     MAKING SENSE OF NUMBERS:  STATISTICAL DATA ANALYSIS
 

Data analysis is the process of examining what data mean to researchers.  Statistical data analysis refers to the process of examining what quantitative data mean to researchers.  Statistics refers to any numerical indicator of a set of data.

To be a competent and critical consumer of information, one needs to understand the ways in which quantitative data are analyzed:   DESCRIPTION and INFERENCE

     Descriptive Statistical Data Analysis:  used to construct simple descriptions about the characteristics of a set of 
     quantitative data and to summarize the information in the data.

     Inferential Data Analysis

           2 PURPOSES

  • Estimates the characteristics of a population from data gathered on a sample.
  • Tests for significant DIFFERENCES between groups and significant RELATIONSHIPS between variables.

II.     DESCRIBING DATA THROUGH SUMMARY STATISTICS

              Summary statistics provide an efficient way to describe an entire set of quantitative data.

         A.  Measures of CENTRAL TENDENCY - describe the central point of a distribution (representative values)

    • A measure of central tendency condenses a distribution of scores into a single numerical indicator called a summary statistic.  It describes the one score that best represents the entire distribution, the most CHARACTERISTIC SCORE:  the score at the center point of the distribution.
    • MODE (Mo) - 
      • number that occurs most frequently 
      • APPROPRIATE FOR NOMINAL DATA (to describe which category occurs MOST)
    • MEDIAN (Md or Mdn) - 
      • divides the distribution exactly in half (at the 50th percentile)
      • literally, the MIDDLE CASE
      • APPROPRIATE FOR ORDINAL DATA
      • Not swayed by Extreme Scores (OUTLIERS)
    • MEAN (x bar; sample mean=M; population mean = mu)
      • the arithmetic average.
      • AVERAGE SCORE IN A DISTRIBUTION.
      • APPROPRIATE FOR INTERVAL/RATIO DATA
      • The MOST SENSITIVE and MOST USEFUL measure of Central Tendency
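
The three measures above can be sketched in a few lines of Python.  This is only an illustration, not a prescribed procedure:  the scores and category labels are invented, and the variable names are hypothetical.  It uses Python's standard statistics module.

```python
from statistics import mean, median, multimode

# Hypothetical interval-level scores, invented for illustration.
# The last value (39) is a deliberate outlier.
scores = [2, 3, 3, 4, 5, 5, 5, 6, 39]

mode_values = multimode(scores)   # most frequent value(s), returned as a list
median_value = median(scores)     # the middle case of the sorted scores
mean_value = mean(scores)         # the arithmetic average

# Dropping the outlier shows which measures it sways.
trimmed = scores[:-1]
median_trimmed = median(trimmed)
mean_trimmed = mean(trimmed)

# The mode also works for nominal categories, where a mean is meaningless.
majors = ["comm", "psych", "comm", "soc"]   # hypothetical categories
modal_major = multimode(majors)

print("mode:", mode_values, "median:", median_value, "mean:", mean_value)
print("without outlier -> median:", median_trimmed, "mean:", mean_trimmed)
print("modal category:", modal_major)
```

With these made-up numbers, the single outlier pulls the mean from 4.125 up to 8, while the median moves only from 4.5 to 5.  That is what "not swayed by extreme scores" means in practice.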
         B.    Measures of DISPERSION - describe how scores differ; how scores are SPREAD (measures of VARIABILITY)

                Measures of dispersion do not exist for nominal data because categories are used rather than meaningful numbers.

    • RANGE (or span) - the distance between the highest and lowest scores in a distribution; describes how widely data are spread out across a distribution.  The range is sensitive to extreme scores.  To compensate, calculate the INTERQUARTILE RANGE (the distance between the 25th and 75th percentile points), which represents the range of scores for the middle half of a distribution.  The range is generally used in combination with other measures of dispersion.
    • VARIANCE (sample variance = s squared; population variance = sigma squared, i.e., the standard deviation squared) - the average squared distance of the scores on an interval or ratio scale from the mean.  A high variance means most scores are far away from the mean; a low variance indicates most scores cluster tightly about the mean.  Expressed in SQUARED DEVIATIONS ABOUT THE MEAN, not in the original units of measurement.
      • The amount that one score differs from the mean is called its deviation score (deviate).
      • Squaring each deviation score and adding up the squared deviations for a sample gives the SUM OF SQUARES; dividing the sum of squares by the number of scores (N for a population, n - 1 for a sample estimate) gives the variance.
    • STANDARD DEVIATION (SD; sample SD = s; population SD = sigma) - a summary statistic of how much scores vary from the mean, expressed in the original units of measurement.  Calculated by taking the square root of the variance.  AVERAGE AMOUNT OF DISPERSION WITHIN A DISTRIBUTION.
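
The dispersion measures above can be worked through step by step.  The sketch below is illustrative only:  the scores are invented, the variable names are hypothetical, and the quartiles come from Python's statistics.quantiles with its default method (other textbooks and packages use slightly different quartile conventions, so interquartile ranges may not match exactly).

```python
from statistics import mean, pvariance, quantiles, variance

# Hypothetical interval-level scores, invented for illustration.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# RANGE: distance between the highest and lowest scores.
score_range = max(scores) - min(scores)

# INTERQUARTILE RANGE: distance between the 25th and 75th percentile points.
q1, _, q3 = quantiles(scores, n=4)
iqr = q3 - q1

# Deviation scores and the SUM OF SQUARES.
m = mean(scores)
deviations = [x - m for x in scores]
sum_of_squares = sum(d ** 2 for d in deviations)

# VARIANCE: divide by N for a population, by n - 1 for a sample estimate.
population_variance = sum_of_squares / len(scores)
sample_variance = sum_of_squares / (len(scores) - 1)

# STANDARD DEVIATION: square root of the variance (original units again).
population_sd = population_variance ** 0.5
sample_sd = sample_variance ** 0.5

# The library functions should agree with the hand calculation.
assert abs(pvariance(scores) - population_variance) < 1e-9
assert abs(variance(scores) - sample_variance) < 1e-9

print("range:", score_range, "IQR:", iqr)
print("sum of squares:", sum_of_squares)
print("population variance:", population_variance, "SD:", population_sd)
print("sample variance:", sample_variance, "SD:", sample_sd)
```

For these scores the mean is 5, the sum of squares is 32, the population variance is 4 (in squared units), and the population standard deviation is 2 (in the original units).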
          C.  DESCRIBING DATA IN STANDARD SCORES (standard normal deviates)
    • A standard score is the number that represents how many standard deviation units a particular score is above or below the mean.  The most popular standard score used by researchers is the z score.
    • The z score is calculated by dividing the deviation score by the standard deviation:  z = (X - X bar) / SD.
    • Each z score indicates how many standard deviations that score is from the mean of the distribution.
    • Raw (unstandardized) scores versus standard scores (z scores) versus transformed standard scores (Z scores)
    • Standard scores allow comparisons between different people's scores on different types of measurements.
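
A minimal sketch of the z score calculation, showing how it lets scores from different measurements be compared.  Both sets of exam scores are invented, the function name z_score is hypothetical, and each list is assumed to be the whole distribution (so the population standard deviation is used).

```python
from statistics import mean, pstdev

def z_score(x, scores):
    """Return how many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean(scores)) / pstdev(scores)

# Two hypothetical exams measured on different scales.
exam_a = [2, 4, 4, 4, 5, 5, 7, 9]            # mean 5,  SD 2
exam_b = [20, 40, 40, 40, 50, 50, 70, 90]    # mean 50, SD 20

# One person scored 9 on exam A; another scored 70 on exam B.
z_a = z_score(9, exam_a)    # (9 - 5) / 2
z_b = z_score(70, exam_b)   # (70 - 50) / 20

print("z on exam A:", z_a)
print("z on exam B:", z_b)
```

The raw score of 70 looks larger than 9, but in standard units the 9 on exam A (2 standard deviations above the mean) is the stronger performance relative to its own distribution than the 70 on exam B (1 standard deviation above the mean).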
    III.  DESCRIBING DATA THROUGH VISUAL DISPLAYS 

         A. SIX COMMON TYPES
     
              SHOWING DIFFERENCES BETWEEN GROUPS (Categories)
              Especially useful for visually showing differences between nominal IV groups with regard to a DV.

    1. Frequency Tables
    2. Pie Charts
    3. Bar Charts
    4. Line Graphs
              SHOWING RELATIONSHIPS BETWEEN AN ORDERED INDEPENDENT VARIABLE AND OTHER VARIABLES
              Used when an independent variable is measured using an interval or ratio scale.
    1. Frequency Histograms
    2. Frequency Polygons


         B. FREQUENCY Distributions

    • Counting and reporting HOW OFTEN different categories or points on a measurement scale occur.
    • A list of the frequency of responses for each category or measurement point is called a frequency table.
    • Why are Frequency counts of categories useful?
      • Inform researchers about common communication practices of people and institutions.
      • Assess predictions derived from theory
      • Show changes over time
        • Frequency Tables - report the total number of times particular values on a measurement scale occur in a data set.
        • Pie Charts - illustrate the frequency counts of categories.
        • Bar Charts - visually illustrate frequency counts for nominal and ordinal variables.
        • Line Graphs - use a single point to represent the frequency count on a dependent variable for each of the groups.
      • Interval and Ratio level variable frequencies are illustrated using...
        • Frequency Histograms - distribution of frequencies where blocks touch

        • Frequency Polygons - similar to line graphs except a line connects the points representing the frequency count for each point on the measurement scale rather than each category.
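
As a rough illustration of a frequency table, the sketch below counts how often each category occurs and prints a simple text version of a bar chart.  The responses are invented and the variable names are hypothetical; it uses collections.Counter from Python's standard library.

```python
from collections import Counter

# Hypothetical nominal data: each respondent's main news source.
responses = ["TV", "online", "TV", "radio", "online",
             "TV", "print", "online", "TV", "radio"]

# Count HOW OFTEN each category occurs.
freq = Counter(responses)
total = sum(freq.values())

# Frequency table rows: (category, count, proportion), most frequent first.
table = [(category, count, count / total)
         for category, count in freq.most_common()]

# Print the table, with a row of '#' marks as a crude bar chart.
for category, count, proportion in table:
    print(f"{category:<8}{count:>3}{proportion:>6.0%}  {'#' * count}")
```

The same counting step underlies pie charts, bar charts, histograms, and polygons; the displays differ only in how the counts are drawn.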
           
    IV.     CONCLUSION
    • Statistics are simply ways researchers attempt to describe the data they have acquired.
    • Once important characteristics have been determined, it is possible to go beyond description to infer conclusions about the data (relative to theory, research questions, and research hypotheses).
    • We will deal with HOW researchers draw conclusions from the data when we discuss INFERENTIAL STATISTICS!
    V.      IMPORTANT DEFINITIONS

    Parameters - a characteristic of a population or a universe.

    Statistic - the measurement of a sample with respect to a variable.

    Nonparametric Statistics - statistics used only to describe the characteristics of a sample, without being able to generalize back to its population.

    Parametric Statistics - statistics used to estimate the characteristics of a population based on the characteristics of a sample.

    Data Analysis - the methods researchers use to infer meaning from data, to determine what conclusions are justified.

    Standard Scores - provide a common unit of measurement indicating how far away any particular score is from the mean.  Used to compare numbers (scores) from different distributions.

    Parametric Statistics - procedures which ESTIMATE the parameters of a population based on the characteristics (statistics) of a sample. 

    Nonparametric Statistics - procedures that are used only to DESCRIBE a SAMPLE.
     

    VI.   STUDY QUESTIONS:

    What is the method researchers use to infer meaning from data and to determine what conclusions are justified?

    What is the difference between Descriptive Data Analysis and Inferential Data Analysis?

    What is a frequency distribution?

    What are the five methods for graphically representing a frequency distribution?  When should each be used?

    What is a summary statistic?

    What are the three measures of central tendency?

    What are the three measures of dispersion?

    What are standard scores and why are they important to communication research?

    What are the two purposes of  inferential statistics?