*Thyne, Clayton
*Intro to Stats, Chapter 10.5 examples (stuff WW didn't cover)

	clear
*set mem is only needed in Stata 11 or earlier; Stata 12+ manages memory itself and ignores it
	set mem 80m

****Part 1: crosstabs
*Begin by generating the variables used as examples in the notes...
	use "http://www.uky.edu/~clthyn2/intro_stats/IR_example.dta"
*generate nominal vars (colony & region)
	gen colony=1 if colbrit==1
	replace colony=2 if colfra==1
	replace colony=3 if colony==.
	label var colony "1=UK; 2=Fra; 3=none"
	label var region "1=eur; 2=M.E.; 3=Afr; 4=Asi; 5=amer"
*generate ordinal vars (polity_ord & wealth_ord)
	gen polity_ord=1 if polity2l<=-6
	replace polity_ord=2 if polity2l>-6 & polity2l<6
	*Stata treats missing as larger than any number, so exclude it explicitly in the top category
	replace polity_ord=3 if polity2l>=6 & polity2l<.
	label var polity_ord "1=auth; 2=anoc; 3=dem"
	gen wealth_ord=1 if lgdppcl<=3.09
	replace wealth_ord=2 if lgdppcl>3.09 & lgdppcl<3.88
	replace wealth_ord=3 if lgdppcl>=3.88 & lgdppcl<.
	label var wealth_ord "1=poor; 2=mod; 3=rich"
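*Quick check on the recodes (a sketch added for illustration, not from the class notes):
*each category should cover only its intended range of the original variable, and
*no observation with a missing original value should have been assigned a category
	tabstat polity2l, by(polity_ord) stat(min max n)
	tabstat lgdppcl, by(wealth_ord) stat(min max n)
	count if missing(polity2l) & !missing(polity_ord)
	count if missing(lgdppcl) & !missing(wealth_ord)
	*both counts should be 0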
*interval/continuous vars include polity2l & lgdppcl
*drop vars we don't need
	keep ccode year colony region polity_ord wealth_ord polity2l lgdppcl
	save "IR_example_new_vars.dta", replace

*A crosstab is just tab followed by any 2 nominal/ordinal vars...
	tab colony region
	tab colony polity_ord

****Part 2: examining nominal/nominal associations
*Q: Does knowing a country's region help explain whether or not it was a former colony?
*First, we need to know if there's a significant relationship
	tab colony region, chi2
	*Yes, the relationship is significant.  Now let's see how strong the association is using lambda...

*E1 = total N - highest freq (modal category) of the DV
	tab colony
	di 129-64
	*so E1 = 65

*E2 = total N - (sum of the highest DV freq within each IV category, i.e., each column)
	tab colony region1
	di 129-(23+10+15+11+18)
	*so E2=52

*Lambda = (E1-E2)/E1
	di (65-52)/65
	*So lambda = .2: knowing the IV reduces our prediction errors by 20%
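*The same calculation without typing frequencies by hand (a sketch, assuming Mata is
*available, i.e., Stata 9 or later; freq, E1 and E2 are just names chosen here)
	quietly tab colony region1, matcell(freq)
	mata: freq = st_matrix("freq")
	mata: E1 = sum(freq) - max(rowsum(freq))
	mata: E2 = sum(freq) - sum(colmax(freq))
	mata: (E1 - E2)/E1
	*rows of freq are the DV (colony) and columns are the IV (region), so this should
	*match the hand calculation as long as the same observations are used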

*We can also just use the stata command "lambda"
*You'll probably need to install this.  Use "net search lambda" and install from the first link.
	lambda colony region
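	*the lambda_a value is what we'll use.  The others are presumably the asymmetric lambda with
	*the DV and IV roles reversed and the symmetric lambda; check the help file to be sure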

****To replicate the lambda class example...

*First, generate data used in the class example
	clear
	set obs 2000
*count is just the observation number (1 to 2000)
	gen count=_n
	gen candidate=1 if count<=1200
	replace candidate=2 if count>1200
	label var candidate "1=clin; 2=giul"
	gen gender=0
	replace gender=1 if count<=900
	replace gender=2 if count>900 & count<=1200
	replace gender=1 if count>1200 & count<=1300
	replace gender=2 if count>1300
	label var gender "1=men; 2=women"

*Second, calculate lambda
*E1 = total N - highest freq (modal category) of the DV
	tab candidate
	di 2000-1200
	*so E1 = 800
*E2 = total N - (sum of the highest DV freq within each IV category, i.e., each column)
	tab candidate gender
	di 2000 - (900 + 700)
	*so E2 = 400
*Lambda = (E1-E2)/E1
	di (800-400)/800
*So lambda = .5: knowing the IV reduces our prediction errors by 50%
*You can also do this using the "lambda" command, which you can download from
*http://www.inomics.net/cgi/repec?handle=RePEc:boc:bocode:S435701
*use the lambda_a value for what we've done so far
	lambda candidate gender

****Part 3: Cramer's V example
	clear
	use "http://www.uky.edu/~clthyn2/intro_stats/IR_example_new_vars.dta"
	tab colony region, V
	*compare to lambda
	lambda colony region
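*Cramer's V can also be rebuilt from the chi2 statistic (a sketch using the results
*that tab stores): V = sqrt(chi2 / (N * min(rows-1, cols-1)))
	quietly tab colony region, chi2 V
	di sqrt(r(chi2)/(r(N)*min(r(r)-1, r(c)-1)))
	di r(CramersV)
	*the two numbers should agree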

****Part 4: Lambda and Cramer's V with ordinal IVs
	lambda colony polity_ord
	tab colony polity_ord, V

****Part 5: ttest, oneway anova, 2+ way anova
**ttest
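*ttest compares only 2 groups, so first we recode region to Europe=1, all others=2
	gen region2=region1
	*region1<. keeps observations with a missing region from being counted as non-Europe
	replace region2=2 if region1~=1 & region1<.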

*now we can run a t test or a very simple anova; we should get substantively identical results...
	ttest lgdppcl, by(region2)
	oneway lgdppcl region2
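*To see the link between the two (a sketch using stored results): with equal variances
*assumed, which is the ttest default, the oneway F statistic equals t squared
	quietly ttest lgdppcl, by(region2)
	di "t squared = " r(t)^2
	quietly oneway lgdppcl region2
	di "F = " r(F)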

*unlike the t-test, we can extend oneway to deal with a 2+ category IV...
	oneway lgdppcl region1
*add "scheffe" to the end to see which regions are driving the differences...
	oneway lgdppcl region1, scheffe

*using 2+ way anova, we can add in more categorical IVs...
	anova lgdppcl region1
	anova lgdppcl region1 colony

****Part 6: gamma test for ordinal variables

*this replicates the example we did in class; we add chi2 to make sure it's significant
	tab polity_ord wealth_ord, gamma chi2

****Part 7: tau-b test for ordinal variables

*this replicates the example for tau-b that we did in class
	ktau polity_ord wealth_ord, stats(taub)
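*tab can also report tau-b (a sketch added for comparison); it shows tau-b along with
*its asymptotic standard error (ASE)
	tab polity_ord wealth_ord, taub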