*Thyne, Clayton
*Intro to Stats, Chapter 2 examples
clear
*-set mem- is needed only in Stata 11 and earlier; Stata 12 and later ignore it
set mem 80m
cd [your directory]
* NOTE: the above command changes the working directory. Replace [your directory] with the path
* to the folder you have set up on your computer. For example, my command is...
* cd F:\teaching\Fall_2008\PS572-401-baby_stats\do_file_directories\chapter2
* Setting the directory matters because Stata saves files there. "box_ttrade.gph" in PART 4, for example,
* will be saved in the directory you specify.

*PART 1: examining frequency using tables and graphs
*replicates Wonnacott & Wonnacott (1990: 27)
use http://www.uky.edu/~clthyn2/intro_stats/Ch2_example1.dta
list
tab days
hist days, frequency width(1) xtick(1(1)13) title(TABLE 2-1 Absenteeism in the Shipping Department) xtitle(X = Number of days absent) ytitle(Frequency)
hist days, norm frequency width(1) xtick(1(1)13) title(TABLE 2-1 Absenteeism in the Shipping Department) xtitle(X = Number of days absent) ytitle(Frequency)
hist days, addlabel norm frequency width(1) xtick(1(1)13) title(TABLE 2-1 Absenteeism in the Shipping Department) xtitle(X = Number of days absent) ytitle(Frequency)
*draw the histogram first so the density curve is plotted on top of the bars
twoway (hist days) (kdensity days)
stem days

*PART 2: group into centiles
centile days, centile(25 50 75)

*PART 3: examine with a boxplot
graph box days, medtype(line)

*PART 4: examine IR data
clear
use "http://www.uky.edu/~clthyn2/intro_stats/data_set_1.dta"
*treat negative trade values as missing
replace ttrade=. if ttrade<0
graph box ttrade, saving(box_ttrade, replace)
gen lttrade=log10(ttrade)
graph box lttrade, saving(box_lttrade, replace)
graph combine box_ttrade.gph box_lttrade.gph
hist ttrade, frequency saving(ttrade.gph, replace)
hist lttrade, frequency saving(lttrade.gph, replace)
graph combine ttrade.gph lttrade.gph
graph box lttrade, over(region)

*PART 5: mean vs. median
sum ttrade, detail
**compare 50th percentile to mean
sum lttrade, detail
**compare 50th percentile to mean
*we can also look at these together with...
*-table- needs a grouping variable; exists=1 puts every observation in a single group
gen exists=1
table exists, c(n ttrade max ttrade min ttrade mean ttrade median ttrade)

*PART 6: finding descriptive statistics
clear
use "http://www.uky.edu/~clthyn2/intro_stats/Ch2_example2.dta"
sum
*a more complex example with percentiles; -return list- shows the results -sum, detail- stores in r()
clear
use "http://www.uky.edu/~clthyn2/intro_stats/data_set_1.dta"
sum cap, detail
return list

*PART 7: examining standard deviation
clear
use "http://www.uky.edu/~clthyn2/intro_stats/data_set_1.dta"
gen lttrade=log10(ttrade)
hist lttrade, frequency width(.1) xtick(1.5(.1)6)
hist lttrade, frequency width(.1) xtick(1.5(.1)6) norm
sum lttrade
*draw reference lines at values read off the -sum- output above (3.88 is the mean)
hist lttrade, xline(2.18 3.88 5.96) frequency width(.1) xtick(1.5(.1)6) norm
*mean plus and minus one standard deviation (sd = .874)
di 3.88+.874
di 3.88-.874
hist lttrade, xline(2.18 3.88 5.96 4.754 3.006) frequency width(.1) xtick(1.5(.1)6) norm

*PART 8: using relative frequencies
*replicates Table 2-8 in WW page 53
clear
use "http://www.uky.edu/~clthyn2/intro_stats/Ch2_example3.dta"
sum
*-expand- replaces each row with "frequency" copies, so each row becomes one observation per person
expand frequency
sort height frequency
sum height
**compare to values on page 53

*PART 9: assessing normality
clear
use "http://www.uky.edu/~clthyn2/intro_stats/data_set_1.dta"
sum cap, detail
**Is "cap" normal?
**for a normal distribution, mean = median = mode
**skewness should be 0; as a rule of thumb, a value between -.8 and +.8 is considered close enough to normal
**kurtosis should be 3.0 for a normal distribution
tab cap

*PART 9 (continued): using figures to look for normality
quantile cap
hist cap
graph box cap

*PART 9 (continued): clearly not normal, so what's the best transformation?
*-gladder- and -qladder- show histograms and quantile-normal plots of cap under the ladder of powers
gladder cap
qladder cap
**now, transform the variable, and check for normality
gen lcap=log10(cap)
sum lcap, detail
quantile lcap
hist lcap, normal
graph box lcap
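
*ADDENDUM to PART 7 (an optional sketch, not part of the original exercise):
*PART 7 types the -xline()- values by hand from the -sum lttrade- output. Because -summarize-
*also stores its results in r() (see -return list- in PART 6), the same lines can be drawn
*without retyping numbers. This sketch ASSUMES the three original reference lines
*(2.18 3.88 5.96) are the minimum, mean, and maximum of lttrade. The local macro names
*(lt_min, lt_lo, and so on) are hypothetical choices for illustration.
clear
use "http://www.uky.edu/~clthyn2/intro_stats/data_set_1.dta"
gen lttrade=log10(ttrade)
quietly sum lttrade
*copy the stored results into locals before another command can overwrite r()
local lt_min  = r(min)
local lt_mean = r(mean)
local lt_max  = r(max)
local lt_lo   = r(mean) - r(sd)
local lt_hi   = r(mean) + r(sd)
*these should be close to the hand-computed 3.006 and 4.754 in PART 7
di "mean - 1 sd = " %6.3f `lt_lo'
di "mean + 1 sd = " %6.3f `lt_hi'
hist lttrade, xline(`lt_min' `lt_mean' `lt_max' `lt_lo' `lt_hi') frequency width(.1) xtick(1.5(.1)6) norm
*Storing r() in locals right after -sum- keeps the numbers exact and avoids
*copying rounded values from the output window.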
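
*ADDENDUM to PART 8 (an optional sketch, not part of the original exercise):
*PART 8 uses -expand- to turn each row of the frequency table into one observation per
*person. Stata can produce the same summary statistics without changing the data by
*treating "frequency" as a frequency weight (fweight). This sketch ASSUMES "frequency"
*holds positive whole-number counts, which fweights require; rows with a count of 0
*would be handled differently by the two approaches. The local macro names (fw_N,
*fw_mean, fw_sd) are hypothetical choices for illustration.
clear
use "http://www.uky.edu/~clthyn2/intro_stats/Ch2_example3.dta"
*each row counts as "frequency" identical observations
sum height [fweight=frequency]
*keep the weighted results so they can be compared with -sum height- after -expand-
local fw_N    = r(N)
local fw_mean = r(mean)
local fw_sd   = r(sd)
*Running -expand frequency- and then -sum height- should give the same N, mean, and
*standard deviation, up to floating-point rounding.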