*Thyne, Clayton
*Intro to Stats, Chapter 4 examples

	clear
	set mem 80m
	cd [your directory]

*  NOTE: replace [your directory] in the command above with the path to your own working folder.  The path
*  depends on how you've set up the folders on your computer.  For example, my command is...
*  cd F:\teaching\Fall_2008\PS572-401-baby_stats\do_file_directories\chapter4
*  Setting the directory tells you where Stata saves things (such as the graphs below).

****Part 1:  finding mean and variance
*simulate 20 rolls of a fair die and plot the observed distribution
*(the yline marks the true probability of each face, 1/6)
	set obs 20
	gen roll20=1 + int(6*uniform())
	hist roll20,  gap(10) addlabels discrete xlabel(1(1)6) yline(.16666) title(probability distribution after 20 rolls)
	tab roll20
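
*A minimal sketch for comparison (an addition, not one of the chapter's own examples): the theoretical
*mean and variance of one fair die, next to the sample values from the 20 rolls above.
	di (1+2+3+4+5+6)/6
	di (1^2+2^2+3^2+4^2+5^2+6^2)/6 - 3.5^2
	sum roll20
	di r(Var)
*note, r(Var) is the sample variance (it divides by n-1), so with only 20 rolls it can sit
*well away from the theoretical value of 35/12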

****Part 2: showing the law of large numbers
*the sample mean of roll should move toward the expected value, 3.5, as the number of rolls grows
	clear
	set obs 10
	gen roll=1 + int(6*uniform())
	sum roll

	clear
	set obs 100
	gen roll=1 + int(6*uniform())
	sum roll

	clear
	set obs 1000
	gen roll=1 + int(6*uniform())
	sum roll

	clear
	set obs 10000
	gen roll=1 + int(6*uniform())
	sum roll

	clear
	set obs 100000
	gen roll=1 + int(6*uniform())
	sum roll

	clear
	set obs 1000000
	gen roll=1 + int(6*uniform())
	sum roll

	clear
	set obs 8388604
	gen roll=1 + int(6*uniform())
	sum roll
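
*A hedged sketch of the same idea in a loop (an addition; the sample sizes are just the first few
*used above).  It prints each sample mean and its distance from the expected value of 3.5.
	foreach n in 10 100 1000 10000 100000 {
		clear
		set obs `n'
		gen roll = 1 + int(6*uniform())
		quietly sum roll
		di "n = `n'   mean = " r(mean) "   distance from 3.5 = " abs(r(mean) - 3.5)
	}
*the distance should tend to shrink as n grows, though any single run can bounce around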

****Part 3: examining probabilities with the Z distribution
	clear
	set mem 80m
	drawnorm variable, n(1000000) means(0) sds(1)
	hist variable, xline(0 1 -1)
	sum

*What is the probability of being greater than 1?
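	sum if variable>1
	di r(N)/_N
	*r(N) is the number of observations above 1 reported by the previous command (158857 in the
	*original run); it will change with each new sample
	*compare the result to Table IV (WW page 672)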

*What is the probability of being less than 1?
	*r(N) still holds the count of observations above 1 from the -sum- above
	di 1-(r(N)/_N)

*What is the probability of being less than 1 and greater than -1?
*You should know this just by knowing the definition of SD...
	sum if variable>-1 & variable<1
	di r(N)/_N
	*r(N) is the number of observations between -1 and 1 (682668 in the original run)
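
*A short check (an addition, not in the original examples): Stata's normal() function returns the
*cumulative standard normal probability, so the simulated proportions above can be compared with
*the exact values.
	di 1 - normal(1)
	di normal(1)
	di normal(1) - normal(-1)
*these are roughly .1587, .8413 and .6827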

****Part 4: an example of probabilities with the Z distribution
	clear
*From the example on page 130, what is the probability of
*being at least 74 inches given a mean height of 69 inches
*and SD of 3 inches?
	drawnorm height, n(1000000) means(69) sds(3)
	hist height, xline(74)
*we want to know the area to the right of the line in the previous histogram
	sum
	sum if height>=74
	di r(N)/_N
*note, r(N) is the number of observations at or above 74 from the previous command (47721 in the
*original run); it will change depending on the sample Stata draws
*check this versus the answer on page 130
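
*A hedged sketch of the same answer without simulation (an addition): standardize 74 by hand,
*then use normal() for the area to the right of that z value.
	di (74-69)/3
	di 1 - normal((74-69)/3)
*this should be roughly .048, close to the simulated proportion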

*Now, see how standardizing works, using a new example (mean 72, SD 12, cutoff at 48)...
	clear
	set mem 200m

	drawnorm grad, n(10000) means(72) sds(12)
	cumul grad, gen(grad_cumul)
	scatter grad_cumul grad, sort xline(48) saving(grad_CDF.gph, replace) title(grad_CDF) 
	hist grad, norm xline(48) saving(grad_PDF.gph, replace) title(grad_PDF) 

	*now, standardize the cutoff: z = (x - mean)/SD, so 48 becomes -2
	di (48-72)/12

	clear
	drawnorm normal, n(10000) means(0) sds(1)
	cumul normal, gen(norm_cumul)
	scatter norm_cumul normal, sort xline(-2) saving(grad_CDF_stand.gph, replace) title(grad standardized CDF) 
	hist normal, norm xline(-2) saving(grad_PDF_stand.gph, replace) title(grad standardized PDF) 

	graph combine grad_CDF.gph grad_PDF.gph grad_CDF_stand.gph grad_PDF_stand.gph
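
*A minimal sketch (an addition; the variable name z is just for illustration): standardize the
*simulated values themselves, then confirm that the share below 48 on the original scale
*matches the share below -2 on the z scale.
	clear
	drawnorm grad, n(10000) means(72) sds(12)
	gen z = (grad - 72)/12
	count if grad < 48
	di r(N)/_N
	count if z < -2
	di r(N)/_N
*both proportions should be near the exact value below, roughly .023
	di normal(-2)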

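****Part 5: Showing central limit theorem
*each sample below is drawn from a standard normal population; as n grows, the histogram
*settles onto the normal curve drawn over it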
	clear
	set mem 80m
	drawnorm variable, n(10) means(0) sds(1)
	hist variable, xline(0 1 -1) title(n=10) norm saving(n_10.gph, replace)

	clear
	drawnorm variable, n(100) means(0) sds(1)
	hist variable, xline(0 1 -1) norm title(n=100) saving(n_100.gph, replace)

	clear
	drawnorm variable, n(1000) means(0) sds(1)
	hist variable, xline(0 1 -1) norm title(n=1000) saving(n_1000.gph, replace)

	clear
	drawnorm variable, n(100000) means(0) sds(1)
	hist variable, xline(0 1 -1) norm title(n=100000) saving(n_100000.gph, replace)

	graph combine n_10.gph n_100.gph n_1000.gph n_100000.gph
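
*A hedged sketch (an addition, not in the original examples): the central limit theorem is about
*sample MEANS, so this builds 10000 means of 30 die rolls each.  A single roll is flat (uniform),
*yet the means should pile up in a bell shape around 3.5.  The name mean30 and the choice of 30
*rolls are assumptions made for illustration.
	clear
	set obs 10000
	gen mean30 = 0
	forvalues i = 1/30 {
		quietly replace mean30 = mean30 + (1 + int(6*uniform()))/30
	}
	hist mean30, norm title(means of 30 die rolls)
	sum mean30
*the mean should be near 3.5 and the SD near sqrt((35/12)/30), about .31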