*Thyne, Clayton
*Intro to Stats, Chapter 4 examples

clear
set mem 80m
cd [your directory]
* NOTE: the above command asks you to change the directory. The directory will depend on how you've set up
* the folders on your computer. For example, my command is...
* cd F:\teaching\Fall_2008\PS572-401-baby_stats\do_file_directories\chapter4
* This is necessary so you know where stata is saving things.

****Part 1: finding mean and variance
*generate probability distribution of a single roll of a die
set obs 20
gen roll20=1 + int(6*uniform())
hist roll20, gap(10) addlabels discrete xlabel(1(1)6) yline(.16666) title(probability distribution after 20 rolls)
tab roll20

****Part 2: showing the law of large numbers
clear
set obs 10
gen roll=1 + int(6*uniform())
sum roll
clear
set obs 100
gen roll=1 + int(6*uniform())
sum roll
clear
set obs 1000
gen roll=1 + int(6*uniform())
sum roll
clear
set obs 10000
gen roll=1 + int(6*uniform())
sum roll
clear
set obs 100000
gen roll=1 + int(6*uniform())
sum roll
clear
set obs 1000000
gen roll=1 + int(6*uniform())
sum roll
clear
set obs 8388604
gen roll=1 + int(6*uniform())
sum roll

****Part 3: examining probabilities with the Z distribution
clear
set mem 80m
drawnorm variable, n(1000000) means(0) sds(1)
hist variable, xline(0 1 -1)
sum
*What is the probability of being greater than 1?
sum if var>1
di 158857/1000000
*compare the result to Table IV (WW page 672)
*What is the probability of being less than 1?
di 1-(158857/1000000)
*What is the probability of being less than 1 and greater than -1?
*You should know this just by knowing the definition of SD...
sum if var>-1 & var<1
di 682668/1000000

****Part 4: an example of probabilities with the Z distribution
clear
*From the example on page 130, what is the probability of
*being at least 74 inches given a mean height of 69 inches
*and SD of 3 inches?
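*A quick analytic cross-check (added for illustration, not taken from the text):
*normal() is Stata's standard normal CDF, so the area to the right of
*z=(74-69)/3 can be computed directly. It should land close to the
*proportion that the simulation below produces, roughly 0.048.
di 1 - normal((74-69)/3)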
drawnorm height, n(1000000) means(69) sds(3)
hist height, xline(74)
*we want to know the area to the right of the line in the previous histogram
sum
sum if height>=74
di 47721/1000000
*note, the numerator above will change depending on the sample stata draws
*check this versus the answer on page 130

*Now, see how standardizing the previous example works...
clear
set mem 200m
drawnorm grad, n(10000) means(72) sds(12)
cumul grad, gen(grad_cumul)
scatter grad_cumul grad, sort xline(48) saving(grad_CDF.gph, replace) title(grad_CDF)
hist grad, norm xline(48) saving(grad_PDF.gph, replace) title(grad_PDF)
*now, standardize the variable
di (48-72)/12
clear
drawnorm normal, n(10000) means(0) sds(1)
cumul normal, gen(norm_cumul)
scatter norm_cumul normal, sort xline(-2) saving(grad_CDF_stand.gph, replace) title(grad standardized CDF)
hist normal, norm xline(-2) saving(grad_PDF_stand.gph, replace) title(grad standardized PDF)
graph combine grad_CDF.gph grad_PDF.gph grad_CDF_stand.gph grad_PDF_stand.gph

****Part 5: Showing central limit theorem
clear
set mem 80m
drawnorm variable, n(10) means(0) sds(1)
hist variable, xline(0 1 -1) title(n=10) norm saving(n_10.gph, replace)
clear
drawnorm variable, n(100) means(0) sds(1)
hist variable, xline(0 1 -1) norm title(n=100) saving(n_100.gph, replace)
clear
drawnorm variable, n(1000) means(0) sds(1)
hist variable, xline(0 1 -1) norm title(n=1000) saving(n_1000.gph, replace)
clear
drawnorm variable, n(100000) means(0) sds(1)
hist variable, xline(0 1 -1) norm title(n=100_000) saving(n_100000.gph, replace)
graph combine n_10.gph n_100.gph n_1000.gph n_100000.gph
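
*A possible extension of Part 5 (a sketch added for illustration, not part of the original examples):
*the draws above already come from a normal distribution, so this applies the same idea
*to means of die rolls, which start from a flat distribution.
*Assumptions: 10000 samples of 30 rolls each; both numbers and the variable names
*roll_sum and mean_roll are arbitrary choices.
clear
set obs 10000
gen roll_sum=0
forvalues i=1/30 {
	quietly replace roll_sum=roll_sum + 1 + int(6*uniform())
}
gen mean_roll=roll_sum/30
*the histogram of the sample means should look roughly bell-shaped around 3.5
hist mean_roll, norm title(means of 30 die rolls)
*theory for a fair die: mean 3.5, SD of the sample mean = sqrt(35/12)/sqrt(30), about 0.31
sum mean_roll
di sqrt(35/12)/sqrt(30)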