/* for Stata */

/* given the size of the raw data file, better to keep data
   separate from the program.. so use INFILE rather than LINES

   so download the chdage.dat file, save it somewhere on the
   hard disk (remember the path!), then have the INFILE statement
   point to it... ie give the full path

   e.g. if you store the .dat file in sub-directory or folder
   called c:\681folder\ , path would be "c:\681folder\chdage.dat" */

* switch to working directory, where data stored
* change this to your path [and use backslash \ rather than : ]
* e.g. cd "c:\681stuff\alr_1\"

cd ":Macintosh HD:User:dad:courses:681:alr_1:"

clear
infile id age chd using chdage.dat

* create age categories (each age band recoded to its approximate midpoint)
gen age_mid = age
recode age_mid 20/29=25 30/34=32 35/39=37 40/44=42 45/49=47 50/54=52 55/59=57 60/69=65

save chdage, replace

* -------------------------------------------------------
* plot the Raw data
graph chd age

* -------------------------------------------------------
* Table 1.2 CHD vs Categorised ages;
tabulate age_mid chd, row

* -------------------------------------------------------
* make means (proportions) of chd by age_mid
* save into new file called prevalences (say)
collapse (mean) chd , by(age_mid)
save prevalences, replace

* -------------------------------------------------------
* plot the prevalences with age categorized
graph chd age_mid

* -------------------------------------------------------
* bring back the full data
clear

* fit Logistic regression by Generalized linear model
* supply a binomial denominator of 1 for each person
use chdage
glm chd age , family(binomial 1) link(logit)

* add fitted prevalence
predict fitted_p

* -------------------------------------------------------
* fit Logistic regression by special program for logistic
* and create fitted value
* it doesn't give betas, only odds ratios
* so type logit after estimation to get coefficients
logistic chd age
logit
predict fitted

* -------------------------------------------------------
* plot the prevalences
* fitted by logistic
graph fitted age

* save data and fitted values
save chdage, replace

* -------------------------------------------------------
* combine fitted points for smooth curve with observed prevalences
clear
use chdage
sort age_mid
save chdage, replace

clear
use prevalences
sort age_mid
merge age_mid using chdage
tabulate _merge

* ----------------------------
* Overlay the observed and fitted prevalences;
graph chd fitted age if age == age_mid
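
* -------------------------------------------------------
* A sketch only, not part of the original program:
* the graph and merge commands above use the older command syntax.
* Assuming a newer Stata (twoway graphs need version 8 or later,
* merge 1:m needs version 11 or later), the last merge and overlay
* might be written roughly as below.
* File and variable names are the ones created above, except
* obs_prev, which is a made-up name for the observed prevalence.
* This draws the observed prevalences as points at the age
* midpoints and the fitted values as a line over all ages, which
* is a slightly different picture from the graph command above.

clear
use prevalences

* rename the collapsed mean so it is not confused with the 0/1 chd
rename chd obs_prev

* one record per age_mid in prevalences, many per age_mid in chdage
merge 1:m age_mid using chdage
tabulate _merge

* observed prevalences as points, fitted prevalences as a line
twoway (scatter obs_prev age_mid) (line fitted age, sort)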