/* --- for Stata --- */

/*
Given the size of the raw data file, it is better to keep the data
separate from the program, so use infile rather than input.

Download the pros.dat file and save it somewhere on the hard disk
(remember the path!). E.g. if you store the .dat file in a sub-directory
or folder called c:\681folder\ , the path would be "c:\681folder\pros.dat"
*/

/*
RANT by JH re "dummy" variables (not particularly here, but in future)

JH's suggestion for yes/no data, or other binary (dichotomous) variables:
USE an INDICATOR variable DIRECTLY, where

   1 = the PRESENCE of the condition/state/trait
   0 = the ABSENCE  of the condition/state/trait

e.g. INSTEAD OF naming a variable 'SEX' and then trying to decide
(or later, remember) which code you used for male and which for female,
WHY NOT call the variable I_male

   0 = no,  i.e. female
   1 = yes, i.e. male ?

This admittedly asymmetric way to choose which state/trait to name does,
however, mean that you won't ever have to look it up later.

By the way, in better families we do not refer to these variables as
"DUMMY" variables.

[I did have a major mis-communication with someone else about my use of
'indicator': this person was using 'indicator' the way people do when
they speak of (say) economic 'indicators'.]
*/

* switch to working directory, where pros.dat file is stored
* change this to your path [and use backslash \ rather than : ]
* e.g.
* cd "c:\681stuff\alr_1\"
cd ":Macintosh HD:User:dad:courses:681:alr_1:"

clear
infile id capsule age race dpros dcaps psa vol gleason using pros.dat

* create PSA categories
gen psa_mid = psa
recode psa_mid 0/2.4=1.2 2.5/4.4=3.5
******** USER to recode the rest of the categories ******

save pros, replace

* -------------------------------------------------------
* plot the Raw data

* ******** rest of the commands below were cut/pasted from chdage example *****
* ******** USER must change them to match the psa variables *****

graph chd age

* -------------------------------------------------------
* Table 1.2 CHD vs Categorised ages
tabulate age_mid chd, row

* -------------------------------------------------------
* make means (proportions) of chd by age_mid
* save into new file called prevalences (say)
collapse (mean) chd, by(age_mid)
save prevalences, replace

* -------------------------------------------------------
* plot the prevalences with age categorized
graph chd age_mid

* -------------------------------------------------------
* bring back the full data
clear
use chdage

* fit Logistic regression by Generalized linear model
* supply a binomial denominator of 1 for each person
glm chd age, family(binomial 1) link(logit)

* add fitted prevalence
predict fitted_p

* -------------------------------------------------------
* fit Logistic regression by special program for logistic
* and create fitted value
* it doesn't give betas, only odds ratios,
* so type logit after estimation to get coefficients
logistic chd age
logit
predict fitted

* -------------------------------------------------------
* plot the prevalences fitted by logistic
graph fitted age

* save data and fitted values
save chdage, replace

* -------------------------------------------------------
* combine fitted points for smooth curve with observed prevalences
clear
use chdage
sort age_mid
save chdage, replace

clear
use prevalences
sort age_mid
merge age_mid using chdage
tabulate _merge

* -------------------------------------------------------
* Overlay the observed and fitted prevalences
graph chd fitted age if age == age_mid
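
* -------------------------------------------------------
* Illustration only (not from the chdage example): a minimal sketch of the
* indicator-variable naming suggested in the RANT at the top of this file.
* ASSUMPTIONS: the toy variable sex, its coding (1 = male, 2 = female) and
* the value label yesno are hypothetical; none of them is in pros.dat.
* preserve/restore is used so the data already in memory are put back.
preserve
clear
input id sex
1 1
2 2
3 1
4 .
end
* 1 = yes (male), 0 = no (female); a missing sex stays missing
gen byte I_male = (sex == 1) if sex < .
label define yesno 0 "no" 1 "yes"
label values I_male yesno
* cross-check the new indicator against the original coding
tabulate sex I_male, missing
restore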