621 'learn SAS project' january 2003
|
I am unclear about what it is I am supposed to graph for the SAS project. Is it the correlation between height and BMI (on the y-axis), versus the power of the height (on the x-axis)? | YES |
If so, how do we go about programming it into SAS? There are fancier ways that you might want to use later on so as to avoid a lot of duplication and dangerous manual extraction.. much better to automate everything that to go cutting and pasting .. but for now the way indicated on the right will get the job done without getting into too much SAS gymnastics .. However, you might want to use of the following little bit of organization even now.. to avoid having 6 SEPARATE PROGRAMS (Imagine if at the end you discover have made the same logic mistake, and you now have to go back and change it in 6 places.. !) |
Somewhat tedious but.. how about extracting the 9 (correlation,power) pairs and making them into 9 rows (observations) and 2 columns (variables) ..do same for next dataset etc until you have a file
with 54 rows and 2 columns ie 'stack' the pairs under each other
|
DATA dset1; KEEP dset_id ht wt; INFILE ...; INPUT ... ; dset_id=1; ht = whatever.. ; wt = whatever ; RUN; DATA dset2; KEEP dset_id ht wt; INFILE ...; INPUT ... ; dset_id=2; ht = ; wt = ; RUN; etc.. Then stack them all together into one dataset... DATA all; SET dset1 dset2 dset3 dset4 dset5 dset5 dset6; compute all the bmi's; RUN; PROC SORT DATA = all; BY dset_id; RUN; PROC CORR DATA = all; BY dset_id; var ht; WITH bmi100 bmi125 ... ; RUN; |
Think of 'SET' as 'READ from' When you list existing sas datasets in a SET statement, SAS will read in the records from the 1st one until it gets to the end, then continue from the top to the end of the next one, and so on until it reaches the end of the last one. As with creating any dataset in SAS, one can act on the variables being brought in e.g. to create new ones from them, or to use one or more of them to decide whether to include or exclude the newly created record. |
Up to now, I performed a correlation between the different
BMI's with height (in metres). (i.e. var bmi1 bmi2 ... ht_m). Is this correct? What's the next step? |
YES See above |
I have seen another group's graph, and they have achieved
a negative slope (straight line). However in class, you mentionned that it was a
quadratic relationship. Therefore, are all datasets supposed to show a parabola,
instead of a straight line? BY THE WAY... don't believe every correlation you see.. it might be worth first doing a quick plot of height vs weight (or PROC UNIVARIATE) to see if the source data points make sense... If you find wild values, and you cannot fix them, you can exclude them from correlation (or any other!) calculations using the WHERE statement inside the PROC e.g., PROC CORR data = ... ; WHERE (values are reasonable** ); Var height with bmi100 bmi125 .. etc ; RUN; ** for example... using HEIGHT > xx.x and HEIGHT < x.xx |
You are correct.. they will probably go from positive
to zero to negative... When I drew the plot on the board in class, I was thinking of positive and negative correlations in the same way, ie I was thinking of their absolute values. If we think about the correlation and its sign, we should be looking at where it is close to zero (positive or negative) I havent given this exercise before, so I dont know the answer .. that is why I asked the question .. how robust and universal is this power of 2 .. If interested in this issue, you could search the web for Quetelet, the 19th century scientist for whom the BMI index is named The efforts to come up with a simple prediction equation for body surface area (BSA) are also of interest. http://www.halls.md/body-surface-area/bsa.htm |