Some Notes on Using SAS via SAS EDITOR
(j.h. and a.n. 97.06.07)
See also notes prepared by Marielle Olivier, available on the shelf
in the computer laboratory.
SAS is organized around three windows:
- PROGRAM EDITOR
- LOG
- OUTPUT
A typical sequence is to
- prepare program commands in the PROGRAM window, save them,
then submit them for processing
- examine the 'log' or 'report' displayed in
the LOG window. Errors are highlighted. The most common
are that
- semicolons (;), used to signify the end of a
statement, are missing
- the names of variables, procedures or options are
misspelled
- examine the output displayed in the OUTPUT window (if the program
was successful)
- (if not) fix any errors and submit the program again.
Before resubmitting, you will want to 'clear' the LOG window;
you may also want to clear the OUTPUT window, so that output
from previous submissions in the same session does not accumulate
and cause confusion.
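The most common error listed above, a missing semicolon, is easiest to see in a small example. The sketch below uses a hypothetical dataset heights with made-up values. The faulty version appears only inside a comment, and the corrected step follows it.

```sas
/* Faulty version (shown as a comment only):
     DATA heights
       INPUT age height;
   With no semicolon after "heights", SAS keeps reading until the next
   semicolon, so INPUT, age and height are read as further dataset
   names on the DATA statement instead of forming an INPUT statement.
   Corrected version: every statement ends with a semicolon. */
DATA heights;
  INPUT age height;      /* made-up example values follow DATALINES */
  DATALINES;
12 1.52
14 1.60
;
RUN;
```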
A SAS Program
A SAS 'program' (at least for a beginning user, working on a
small dataset) is likely to consist of the following:
- DATA step
- RUN (to close the DATA step... not essential but helpful)
- PROC (short for PROCEDURE)
- RUN (again optional... but cannot hurt)
- (Maybe another) PROC
- etc
- RUN (a statement to process the above requests;
one RUN statement is essential)
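A minimal program with this shape might look like the sketch below. The dataset name a and its three observations are made up for illustration. The data are listed in the program after DATALINES, so the sketch runs without a separate file.

```sas
DATA a;                        /* DATA step: creates the dataset a    */
  INPUT age height weight;     /* names of the three variables        */
  DATALINES;
12 1.52 41.0
14 1.60 50.5
13 1.55 44.2
;
RUN;                           /* closes the DATA step                */

PROC PRINT;                    /* lists the observations of a         */
RUN;

PROC MEANS;                    /* N, mean, SD, min and max by default */
  VAR height weight;
RUN;                           /* the final RUN runs the last PROC    */
```

Without a DATA= option, each PROC works on the most recently created dataset, which here is a.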
You can save the 'program' for future use/modification.
Do so from the File menu while in the PROGRAM window. Many users
give such a file the suffix '.sas' to mark it as containing SAS
statements and requests for procedures to be run.
The DATA step -- overview
There are a number of ways to set up the data for use in one
or more PROCs.
- have the 'raw' data stored as text in a separate ASCII file,
and have SAS read it in, 'parse' it and make it into
'observations' using the INPUT statement; RECOMMENDED!!
- have the 'raw' data listed as text in the program itself, and
have SAS read it in, 'parse' it and make it into
'observations' using the INPUT statement.
When you run the DATA step, SAS sets up (internally, not
visible to you) a binary file containing the names of the
variables, together with the data. This datafile (we call it
a SAS dataset) is then accessible for the remainder of the
session. By default, it is a 'scratch' file that will disappear
at the end of the session when you exit SAS. If it is a big
file that takes a lot of time to create, and you will be
going back to it many times for further analyses, you might want
to make it 'permanent'. There is no need to do so in this course.
Provided you keep the '*.sas' file (and the raw data file, if
you keep the data in a separate ASCII file), you can always
recreate the SAS dataset just by going back to your file containing
the DATA step at another time and rerunning it.
- finally, have the data already saved in a permanent SAS dataset.
This way, you bypass the INPUT statement (since the variable names are
already stored with the permanent dataset) and use the SET statement to
read the observations in from the permanent dataset. We will not need
this way of doing things for this course, but you would probably
want to use it for your thesis.
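The second and third ways can be sketched as follows. The first way, reading from a separate file with INFILE, is covered in the next section. The dataset names, the libref course and the path are hypothetical, and the values are made up.

```sas
/* Raw data listed in the program: the data lines follow DATALINES
   and end with a line holding only a semicolon. */
DATA heights;
  INPUT age gender height weight;
  DATALINES;
12 0 1.52 41.0
14 1 1.60 50.5
13 0 1.55 44.2
;
RUN;

/* Reading from an existing SAS dataset with SET: no INFILE or INPUT
   is needed, because the variable names are stored with the dataset.
   For a permanent dataset you would first assign a libref to a
   directory (hypothetical libref and path), e.g.
     LIBNAME course 'a:\course67';
   and then write SET course.heights. Here the scratch dataset
   created above stands in for it. */
DATA heights2;
  SET heights;
RUN;
```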
The DATA step -- in more detail
- The step begins with the reserved word DATA
This tells SAS that you wish to set up -- permanently or just for
this session -- a SAS dataset.
In SAS parlance, a dataset is a number of 'observations' ... we
might call them 'cases' or 'subjects'. Each 'observation' consists
of the same fixed number of variables, so you might think of
the dataset as a 'rectangular' file of so many cases
('observations' or 'rows') and so many variables per observation
('columns').
Following the word DATA you supply a name for the dataset you
are asking SAS to create. To save typing, and since I seldom need
to refer to it by name later in the program, I typically call the dataset
'a'; you might want to get fancier, e.g.
- DATA alcohol;
or...
- DATA heights;
etc.
Remember to put a semicolon (;) after the name.
- Next (if you have the raw data in a separate file) ...
the INFILE statement.
This is just a pointer to the file in question. To be safe,
you can give the full path name...
e.g. INFILE 'a:\course67\alcohol.dat' ;
remember to enclose the path name in single quotes, and --
because INFILE is a SAS statement -- to end it with a semicolon.
- Next (if, as you are likely to be doing in this course,
you are creating the SAS dataset from raw data) ...
the INPUT statement.
This is a directive to SAS on how to 'parse' the file
containing the raw data. In this course this can be as simple
as supplying the names you want to give each variable. Most of the
raw data files we have put on the course web site have spaces separating
the variables (so-called "free format", which SAS calls list input),
so there is no need to give starting and ending columns. So, for
example, you might say
INPUT age gender height weight ; (remember the ';' !!!)
If you wanted to tell SAS that age was always in columns 1-2,
gender in column 4 etc., you could put
INPUT age 1-2 gender 4 height 6-9 weight 11-14;
By default, SAS assumes that variables are numeric. If you
have a variable containing alphanumeric data (e.g. if in the
raw data file you had m for male and f for female), you
tell SAS so by putting a $ after the variable name:
INPUT age 1-2 gender $ 4 height 6-9 weight 11-14;
If gender is alphanumeric, it can be used in tabulations etc. but
not in any arithmetic. It can be used as a 'class' variable in
procedures that accept a CLASS statement, such as GLM (PROC REG
does not). One can always make a new numeric variable,
e.g. using the statements
gender_n = . ;
if gender = 'm' then gender_n = 0; * NOTE: 'm' and 'M' not same;
if gender = 'f' then gender_n = 1;
Most users prefer to represent gender numerically from the start.
It also saves on data entry if the enterer can use the numeric
keypad rather than hunting for the m and f keys.
If you want the labels m and f (or male/female or
whatever...) rather than 0/1 to appear on printouts, first
define them with PROC FORMAT, then attach them to the variable
with a FORMAT statement in the DATA step.
- Next (if you want to create derived variables or
exclude certain observations from the dataset being
created) ...
programming statements such as...
bmi = weight / (height*height); /* creates a new variable
and adds it to the dataset */
if gender = 1; /*includes only those with
gender = 1 */
a_g_term = age*gender; /*create interaction term */
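The steps above can be combined into one runnable sketch. The real raw-data file is not available here, so the sketch uses a hypothetical fileref raw that points at a temporary file (FILENAME ... TEMP) and writes two made-up lines to it first. In practice you would point INFILE at your own file instead, e.g. INFILE 'a:\course67\alcohol.dat';. The sketch also shows the PROC FORMAT step that defines the labels before a FORMAT statement attaches them. The format name sexfmt is hypothetical, and height is assumed to be in metres and weight in kilograms.

```sas
FILENAME raw TEMP;                 /* hypothetical fileref, temporary file */

DATA _NULL_;                       /* write made-up raw data in fixed columns */
  FILE raw;
  PUT '23 1 1.70 65.0';
  PUT '31 0 1.82 80.5';
RUN;

PROC FORMAT;                       /* define labels for the codes 0 and 1 */
  VALUE sexfmt 0 = 'male'
               1 = 'female';
RUN;

DATA alcohol;
  INFILE raw;                      /* a fileref is written without quotes */
  INPUT age 1-2 gender 4 height 6-9 weight 11-14;   /* column input */
  FORMAT gender sexfmt.;           /* print 0/1 as male/female; note the period */
  bmi = weight / (height*height);  /* derived variable */
  IF gender = 1;                   /* keep only observations with gender = 1 */
  a_g_term = age*gender;           /* interaction term */
RUN;

PROC PRINT;                        /* one observation, gender shown as female */
RUN;
```

The stored value of gender stays 1. The format changes only how the value is printed.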
Notes:
You can put comments in your program in two ways:
- by starting the statement with an asterisk
and ending it (as usual) with a semicolon...
e.g.
* include females only ;
* the next steps are to set up data for table 1;
- by surrounding the comment(s) by
/* at the beginning
and
*/ at the end (see the bmi example above).
SAS ignores everything in between.
This trick is helpful when you want to run just a
part of a program but don't want to delete any
of your hard-thought-out steps or PROCs
that you might want some other time.
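A sketch of the "switching off" trick, using a made-up dataset a. The PROC MEANS step runs. The PROC PRINT step stays in the file, but it sits between the comment delimiters, so SAS ignores it, semicolons included.

```sas
DATA a;                    /* made-up data */
  INPUT x y;
  DATALINES;
1 2
2 4
3 5
;
RUN;

* this PROC runs as usual ;
PROC MEANS;
  VAR x y;
RUN;

/* This PROC is switched off: SAS ignores everything up to the
   closing delimiter, so the step stays in the file for later.
PROC PRINT;
RUN;
*/
```

Avoid apostrophes in comments that begin with an asterisk, such as the one before PROC MEANS. An unmatched quote in that kind of comment can swallow the closing semicolon.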
Statements can run on from one line to the next, and you can
use blank lines for readability. Indents also help show the
structure of the program.
I find it helpful to put names of variables in lower case
and use upper case for reserved SAS words.
A variable name can have at most 8 characters (in SAS versions
before 7; later versions allow up to 32). You can use underscores,
e.g. age_dx age_tx, for readability. The name must start with a
letter or an underscore.
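A small sketch of names that SAS accepts, plus one it rejects. The dataset name names and the variables are hypothetical. The invalid assignment is kept inside a comment so that the step runs.

```sas
DATA names;
  age_dx = 52;      /* valid: letters, digits and underscores */
  _tmp = 1;         /* valid: a name may start with an underscore */
  /* 2nd_age = 3;      invalid: a name cannot start with a digit */
RUN;
```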
EXAMPLE of a DATA step followed by SORT and several procedures
DATA alberta;
INFILE 'alberta.dat';
INPUT id_no age gender height weight;
bmi = weight / height**2; * **2 is same as 'to power of 2';
if age >= 11 and age <= 15; * careful with ANDs and ORs ;
PROC SORT; BY gender; * sorts the dataset 'alberta' by gender;
* otherwise leaves dataset contents as is;
RUN;
PROC MEANS;
var height weight;
BY gender; * repeats procedure for each gender;
* must have used SORT beforehand;
PROC PLOT FORMCHAR='-----------'; /* formchar supplies character */
/* for borders of plot */
PLOT weight*height = gender;
* uses values of gender as symbol;
RUN;
PROC PLOT;
PLOT Y1*X = '1' Y2*X='2' / OVERLAY;
* puts both plots on same graph ;
* using the symbols 1 and 2 respectively;
* (Y1, Y2 and X stand for variables in your own dataset) ;
PROC GLM;
MODEL weight = height ;
BY gender;
RUN;
PROC REG;
MODEL weight = height; * like GLM but uses continuous x variables only ;
BY gender; * does not allow 'class' variables ;
* does not give Type III SS, and gives ;
* Type I SS only if the SS1 option is used ;
DATA males; * creating a new dataset;
SET alberta; * reads observations from existing dataset ;
* created earlier in session, or stored as ;
* a permanent dataset, ... ;
IF gender = 0; * allows only those with gender = 0 to be ;
* taken into new dataset ;
RUN;
DATA females; * creating yet another... ;
* alberta and males still exist and are ;
* available to all PROCS ;
SET alberta;
IF gender = 1;
RUN;
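Each of the two subsetting steps above reads alberta once. A different approach, not the one used above, creates both datasets in a single DATA step. You name both datasets on the DATA statement and send each observation to one of them with an explicit OUTPUT statement. The small stand-in for alberta below is made up.

```sas
/* Made-up stand-in for the alberta dataset */
DATA alberta;
  INPUT id_no age gender height weight;
  DATALINES;
1 12 0 1.52 41.0
2 14 1 1.60 50.5
3 13 0 1.55 44.2
;
RUN;

DATA males females;                  /* two datasets in one pass */
  SET alberta;
  IF gender = 0 THEN OUTPUT males;   /* OUTPUT names the target dataset */
  ELSE IF gender = 1 THEN OUTPUT females;
RUN;
```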
A PROGRAM TO ILLUSTRATE SOME SELECTED PROCEDURES AND FEATURES
OF SAS: MEANS, PLOT, GLM, REG, OUTPUT, MERGE, OVERLAY, BOX
OPTIONS LS=65 PS=65;
DATA a;
INFILE 'a:kkm5_8.dat';
INPUT salary gpa;
ID = _N_;
RUN;
DATA f2; SET a;
PROC MEANS;
PROC PLOT; PLOT salary * gpa;
PROC GLM;
MODEL salary = gpa;
PROC REG;
MODEL salary = gpa/CLM CLI; /* CI for mean, individuals */
OUTPUT OUT = temp
PREDICTED = p
L95M=lm U95M=um L95=li U95=ui;
RUN;
DATA f3; SET temp;
ID = _N_;
RUN;
DATA f4; MERGE f2 f3; BY id;
RUN;
DATA f5; SET f4;
PROC PLOT;
PLOT salary*gpa='s' p*gpa='p'
lm*gpa='*' um*gpa='*' li*gpa='+' ui*gpa='+' / OVERLAY BOX;
RUN;
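The OUT= dataset written by PROC REG's OUTPUT statement already contains every variable of the input dataset (here salary and gpa) alongside p, lm, um, li and ui. The ID and MERGE steps above therefore serve mainly to illustrate MERGE. The shorter sketch below plots straight from the OUT= dataset. Its five salary/gpa pairs are made up so that it runs without kkm5_8.dat. PROC REG and PROC PLOT are interactive procedures, so QUIT ends each one.

```sas
DATA a;                            /* made-up illustrative data */
  INPUT salary gpa;
  DATALINES;
21.0 2.5
23.5 3.0
22.0 2.8
26.0 3.6
25.0 3.4
;
RUN;

PROC REG;
  MODEL salary = gpa / CLM CLI;    /* CI for mean, limits for individuals */
  OUTPUT OUT=temp PREDICTED=p
         L95M=lm U95M=um L95=li U95=ui;
RUN;
QUIT;

PROC PLOT DATA=temp;               /* temp still holds salary and gpa */
  PLOT salary*gpa='s' p*gpa='p'
       lm*gpa='*' um*gpa='*' li*gpa='+' ui*gpa='+' / OVERLAY BOX;
RUN;
QUIT;
```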
General comments
If you minimize the PROGRAM window before you run the program,
you will be able to see the LOG window and tell by the colours
of the messages whether your program has been successful!!
OUTPUT and LOG windows
You can save the contents of these windows: use
the 'Save' or 'Save As' command in the File menu.
You can customize the width (number of characters across)
and height (number of lines down) of the OUTPUT pages
using the OPTIONS statement at the beginning of the program:
PAGESIZE (or PS for short) # of lines on page
LINESIZE (or LS for short) # of character spaces across the page
e.g.
OPTIONS LINESIZE = 75 PAGESIZE = 60; /* 60 lines of 75 characters */
If you save the OUTPUT or LOG file and then open it in a
word processor, it is better to use a monospaced font such as
Courier; otherwise tables and plots will not line up.