Programmer's corner
Patrick Bélisle
Division of Clinical Epidemiology
McGill University Health Center
Montreal, Quebec CANADA
patrick.belisle@rimuhc.ca

Last modification: 24 sep 2015

Version 1.0 (January 2010)
SAS macro %NewDatasetName
A tip on determining a safe temporary data set name
Inside a SAS macro, one often needs to create temporary data sets, and giving them safe names should be job #1. When writing temporary data sets, one must make sure that none of them overwrites an already existing data set, e.g. a data set that the user (if you intend to make your macro available to others) or you yourself created before calling your macro. Many strategies for doing so are conceivable, with different levels of safety; %NewDatasetName is a robust and safe solution, as it checks whether a data set with the proposed name already exists and modifies the name if necessary.

[ %NewDatasetName is a SAS macro that returns a temporary SAS data set name that does not currently exist, that is, the name of a data set you can safely use to store data (or intermediate results) without risking inadvertently overwriting an existing data set. ]

Syntax

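%NewDatasetName(proposalname)

proposalname: the proposed base name for the temporary data set. The macro returns an unused one-level data set name built from it: the proposed name with a leading underscore, followed by a numeric suffix if needed.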

Example

Although %NewDatasetName can also be used outside of SAS macros, we will illustrate its use through an example where it is called from within a macro.

As discussed in the introduction, many strategies are conceivable when naming temporary data sets. The first strategy that comes to mind is probably to reserve a class of data set names for your macros, e.g., starting temporary data set names with underscores.
/*******************
 * (Not suggested) *
 *                 *
 *******************/;


%macro MyMacro(dataset, xvar, arg3, arg4, etc);

  proc contents data=&dataset out=_contents noprint; 
  run;

  /* ...snipped-out SAS code... */

  proc means data=&dataset noprint;
    var &xvar;
    output out=_means mean=mean;
  run;

  /* ...snipped-out SAS code... */

  * Tidy-up;
  proc datasets nolist;
    delete _contents _means;
  run;
%mend;

This is a relatively safe approach if you intend to use your macro for personal applications only. However, if you plan to distribute your macro, it is not safe at all! You cannot be sure that someone using your %MyMacro macro has not already named a data set _contents or _means, since starting temporary data set names with underscores is a relatively common practice; calling %MyMacro would then silently overwrite that data set.

One could then think of starting temporary data set names in macros with two underscores: that certainly reduces the risk of overwriting data sets, but it is not 100% safe either, especially if your macro could be called from within another macro that follows the same convention. The risk of name conflicts is then considerably higher!

A safer strategy would be to start any temporary data set name within a macro with the name of the macro itself, as in:
/*******************
 * (Not suggested) *
 *                 *
 *******************/;


%macro MyMacro(dataset, xvar, arg3, arg4, etc);

  proc contents data=&dataset out=MyMacro_contents noprint; 
  run;

  /* ...snipped-out SAS code... */

  proc means data=&dataset noprint;
    var &xvar;
    output out=MyMacro_means mean=mean;
  run;

  /* ...snipped-out SAS code... */

  * Tidy-up;
  proc datasets nolist;
    delete MyMacro_contents MyMacro_means;
  run;
%mend;

This is indeed much better than the two strategies presented earlier: the chance that a user already has a data set named MyMacro_contents or MyMacro_means is extremely low! However, it is good practice to give macros names that describe what they do, which often leads to long macro names (unlike MyMacro, which is short but, I must admit, not very descriptive!). This strategy then involves repeatedly retyping a possibly long macro name followed by some meaningful extension, which increases the risk of typing errors. Of course, one could always copy and paste the macro name whenever a temporary data set needs a name, but this is cumbersome (at least to a lazy programmer like me!).

The best solution would be to have data set names created automatically, which is exactly what %NewDatasetName does. It takes a proposed name and prefixes it with an underscore; if a data set with that name already exists, it appends a numeric suffix (1, 2, 3, and so on) until it finds a name that is not currently in use. Saving the returned name into a local macro variable (dscontents and dsmeans in the example below) makes it easy to use: you never need to know the actual name given to your temporary data set, since you simply refer to it through the appropriate macro variable.


/******************
 * Our suggestion *
 *                *
 ******************/;

%include 'c:/your directory/NewDatasetName.sas';


%macro MyMacro(dataset, xvar, arg3, arg4, etc);

  %local dscontents dsmeans;

  %let dscontents = %NewDatasetName(contents);

  proc contents data=&dataset out=&dscontents noprint; 
  run;

  /* ...snipped-out SAS code... */

  %let dsmeans = %NewDatasetName(means);

  proc means data=&dataset noprint;
    var &xvar;
    output out=&dsmeans mean=mean;
  run;

  /* ...snipped-out SAS code... */

  * Tidy-up;
  proc datasets nolist;
    delete &dscontents &dsmeans;
  run;
%mend;

Note

It is important to call %NewDatasetName just before the name it returns is actually used. In the first example below, the two local macro variables dstmp1 and dstmp2 would take the same value (_tmp, if no data set named _tmp existed), because no data set is created between the two calls. In the second example, PROC CONTENTS has already created the data set &dstmp1 by the time %NewDatasetName is called again, so dstmp2 is given its own numeric suffix (_tmp1, say).

/*****************************
 * That wouldn't work,       *
 * as dstmp1 and dstmp2      *
 * would take the same value *
 *                           *
 *****************************/;


%macro MyMacro(dataset, xvar, arg3, arg4, etc);

  %local dstmp1 dstmp2;

  %let dstmp1 = %NewDatasetName(tmp);
  %let dstmp2 = %NewDatasetName(tmp);

  proc contents data=&dataset out=&dstmp1 noprint; 
  run;

  /* ...snipped-out SAS code... */

  proc means data=&dataset noprint;
    var &xvar;
    output out=&dstmp2 mean=mean;
  run;

  /* ...snipped-out SAS code... */

  * Tidy-up;
  proc datasets nolist;
    delete &dstmp1 &dstmp2;
  run;
%mend;

/***************************
 * That would work better! *
 *                         *
 ***************************/;


%macro MyMacro(dataset, xvar, arg3, arg4, etc);

  %local dstmp1 dstmp2;

  %let dstmp1 = %NewDatasetName(tmp);

  proc contents data=&dataset out=&dstmp1 noprint; 
  run;

  /* ...snipped-out SAS code... */

  %let dstmp2 = %NewDatasetName(tmp);

  proc means data=&dataset noprint;
    var &xvar;
    output out=&dstmp2 mean=mean;
  run;

  /* ...snipped-out SAS code... */

  * Tidy-up;
  proc datasets nolist;
    delete &dstmp1 &dstmp2;
  run;
%mend;


Limitations

Even though %NewDatasetName is a safer way to determine a temporary SAS data set name than the other strategies presented above, we must point out a potential caveat: the name it returns is only guaranteed to be unused at the time of the call. This is easily avoided if the two simple rules below are followed.

Define names just in time
As discussed above, call %NewDatasetName just before you actually need a new data set name. Defining temporary data set names too far in advance could lead to overwriting data sets that were created between the call to %NewDatasetName and the intended use of the name it returned.

Tidy-up at the end of each macro
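It is good practice to remove temporary data sets at the end of your macros: efficient management of storage space is a good idea anyway, and leftover temporary data sets force later calls to %NewDatasetName to return names with ever larger numeric suffixes.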

A final remark: since SAS data set names are limited to 32 characters, you should keep the names proposed to %NewDatasetName shorter than, say, 27 characters. If you call a macro repeatedly, or if several macros call %NewDatasetName with the same proposed name, the leading underscore and the numeric suffix (as in _MyProposedDatasetName344) make the returned name longer than the proposed one. Keeping proposed names comfortably below the 32-character limit ensures that %NewDatasetName returns a valid name.
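If you would rather enforce this rule in code than rely on discipline, a thin wrapper can truncate long proposals before calling %NewDatasetName. The macro below is a hypothetical sketch, not part of %NewDatasetName 1.0: its name %ShortTempName and its maxlen= parameter are illustrative assumptions. The default of 26 characters leaves room for the leading underscore plus a numeric suffix of up to five digits (1 + 26 + 5 = 32). It assumes %NewDatasetName has already been compiled.

```sas
%macro ShortTempName(proposalname, maxlen=26);
  %* Hypothetical wrapper around NewDatasetName: truncate long proposals
     so that the leading underscore and a numeric suffix still fit
     within the 32-character limit on SAS data set names;
  %let proposalname = %sysfunc(compress(&proposalname));
  %if %length(&proposalname) > &maxlen %then %do;
    %put WARNING: proposed name &proposalname truncated to &maxlen characters.;
    %let proposalname = %substr(&proposalname, 1, &maxlen);
  %end;
  %NewDatasetName(&proposalname)
%mend;

%* Example call with a proposal that is longer than 26 characters;
%let dslong = %ShortTempName(AVeryLongProposedNameForATemporaryDataSet);
%put &dslong;
```

Since %PUT writes to the log without generating any text, the wrapper still returns nothing but the data set name and can be used on the right-hand side of %LET, just like %NewDatasetName itself.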

SAS Code
%macro NewDatasetName(proposalname);
    %* Returns the first unused data set name among _proposalname,
       _proposalname1, _proposalname2, and so on;
    %local i newdatasetname;
    %let i = 0;

    %* Remove any blanks from the proposed name;
    %let proposalname = %sysfunc(compress(&proposalname));
    %let newdatasetname = _&proposalname;

    %* Increase the numeric suffix until no data set of that name exists;
    %do %while(%sysfunc(exist(&newdatasetname)));
        %let i = %eval(&i + 1);
        %let newdatasetname = _&proposalname&i;
    %end;

    %* Return the name (no trailing semicolon, so the call can be used inline);
    &newdatasetname
%mend;
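The short session below is a sketch of the naming loop at work. It assumes that %NewDatasetName has already been compiled (by submitting the code above or through %include), that no USER= library is set so that one-level names refer to the WORK library, and that WORK._TMP and WORK._TMP1 do not exist yet; the macro variable names first and second are arbitrary.

```sas
%let first = %NewDatasetName(tmp);
%put &first;             /* prints _tmp: WORK._TMP does not exist yet */

data &first;             /* creates WORK._TMP */
  x = 1;
run;

%let second = %NewDatasetName(tmp);
%put &second;            /* prints _tmp1: _tmp is now taken */

/* Tidy up */
proc datasets lib=work nolist;
  delete &first;
quit;
```

Because the DATA step runs before the second %LET is processed, the second call sees WORK._TMP and moves on to the next suffix, which is exactly why each name should be requested just before it is used.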

Download

Download %NewDatasetName 1.0 now.