*Thyne, Clayton
*Intro to Stats, Chapter 10.5 examples (stuff WW didn't cover)

clear
set mem 80m

****Part 1: crosstabs
*Begin by generating the variables used as examples in the notes...
use "http://www.uky.edu/~clthyn2/intro_stats/IR_example.dta"

*generate nominal vars (colony & region)
gen colony=1 if colbrit==1 & colbrit~=.
replace colony=2 if colfra==1 & colfra~=.
replace colony=3 if colony==.
label var colony "1=UK; 2=Fra; 3=none"
label var region "1=eur; 2=M.E.; 3=Afr; 4=Asi; 5=amer"

*generate ordinal vars (polity_ord & wealth_ord)
*Stata treats missing as larger than any number, so the top category excludes it explicitly
gen polity_ord=1 if polity2l<=-6
replace polity_ord=2 if polity2l>-6 & polity2l<6
replace polity_ord=3 if polity2l>=6 & polity2l<.
label var polity_ord "1=auth; 2=anoc; 3=dem"
gen wealth_ord=1 if lgdppcl<=3.09
replace wealth_ord=2 if lgdppcl>3.09 & lgdppcl<3.88
replace wealth_ord=3 if lgdppcl>=3.88 & lgdppcl<.
label var wealth_ord "1=poor; 2=mod; 3=rich"

*interval/continuous vars include polity2l & lgdppcl

*drop vars we don't need
keep ccode year colony region polity_ord wealth_ord polity2l lgdppcl
save "IR_example_new_vars.dta", replace

*So, crosstabs are just tab followed by any 2 nominal/ordinal vars...
tab colony region
tab colony polity_ord

****Part 2: examining nominal/nominal associations
*Q: Does knowing a country's region help explain whether or not it was a former colony?
*First, we need to know if there's a significant relationship
tab colony region, chi2
*Yes, the relationship is significant. Now let's see how strong the association is using lambda...

*E1 = overall dist. - highest freq of DV
tab colony
di 129-64
*so E1 = 65

*E2 = overall dist. - (sum of highest freq of each IV category or column)
tab colony region
di 129-(23+10+15+11+18)
*so E2 = 52

*Lambda = (E1-E2)/E1
di (65-52)/65
*So lambda = .2, so we have a 20% reduction in error knowing the IV

*We can also just use the stata command "lambda"
*You'll probably need to install this. Use "net search lambda" and install from the first link.
lambda colony region
*the lambda_a value is what we'll use. I'm not sure what the others are

****To replicate the lambda class example...
*First, generate data used in the class example
clear
set obs 2000
*count is just the observation number (1 to 2000)
gen count=_n
gen candidate=1 if count<=1200
replace candidate=2 if count>1200
label var candidate "1=clin; 2=giul"
gen gender=0
replace gender=1 if count<=900
replace gender=2 if count>900 & count<=1200
replace gender=1 if count>1200 & count<=1300
replace gender=2 if count>1300
label var gender "1=men; 2=women"

*Second, calculate lambda
*E1 = overall dist. - highest freq of DV
tab candidate
di 2000-1200
*so E1 = 800

*E2 = overall dist. - (sum of highest freq of each IV category or column)
tab candidate gender
di 2000 - (900 + 700)
*so E2 = 400

*Lambda = (E1-E2)/E1
di (800-400)/800
*So lambda = .5, so we have a 50% reduction in error knowing the IV

*you can also do this using the "lambda" command. You can download this from
*http://www.inomics.net/cgi/repec?handle=RePEc:boc:bocode:S435701
*use the lambda_a value for what we've done so far
lambda candidate gender

****Part 3: Cramer's V example
clear
use "http://www.uky.edu/~clthyn2/intro_stats/IR_example_new_vars.dta"
tab colony region, V
*compare to lambda
lambda colony region

****Part 4: Lambda and Cramer's V with ordinal IVs
lambda colony polity_ord
tab colony polity_ord, V

****Part 5: ttest, oneway anova, 2+ way anova
**ttest
*we can only have 2 groups, so first we recode region to Europe=1, all others=2
gen region2=region
replace region2=2 if region~=1 & region<.
*now we can run a t test or a very simple anova; we should get substantively identical results...
ttest lgdppcl, by(region2)
oneway lgdppcl region2
*unlike the t-test, we can extend oneway to deal with a 2+ category IV...
oneway lgdppcl region
*add "scheffe" to the end to see which regions are driving the differences...
oneway lgdppcl region, scheffe
*using 2+ way anova, we can add in more categorical IVs...
anova lgdppcl region
anova lgdppcl region colony

****Part 6: gamma test for ordinal variables
*this replicates the example we did in class; we add chi2 to make sure it's significant
tab polity_ord wealth_ord, gamma chi2

****Part 7: tau-b test for ordinal variables
*this replicates the example for tau-b that we did in class
ktau polity_ord wealth_ord, stats(taub)
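
****Extra: lambda computed straight from the crosstab (a sketch, not from the class notes)
*The hand calculations in Part 2 retype the cell counts from the tab output. As a rough
*sketch, the same E1/E2 logic can be read off the frequency matrix that tab saves with
*the matcell() option.
*Assumptions: the IR data loaded in Part 3 are still in memory, colony is the DV (rows),
*and region is the IV (columns). The matrix name F and the Mata names N, E1 and E2 are
*made up for this illustration.
tab colony region, matcell(F)
mata:
// copy the saved cell frequencies into Mata
F = st_matrix("F")
// total number of cases in the table
N = sum(F)
// E1 = total - highest frequency of the DV (largest row total)
E1 = N - max(rowsum(F))
// E2 = total - sum of the highest frequency within each IV column
E2 = N - sum(colmax(F))
// lambda = (E1-E2)/E1
(E1 - E2) / E1
end
*If the data match the class example, the displayed value should line up with the
*lambda_a value reported by "lambda colony region".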