Invitation to Perl: A statistician reads Perl manuals

Patrick Bélisle
Division of Clinical Epidemiology
McGill University Health Center
Montreal, Quebec CANADA
patrick.belisle@rimuhc.ca

Last modification: 24 sep 2015

This page is an invitation to the statistician to consider the use of Perl. Aside from statistical packages, Perl is the programming language that I use most.

This webpage is not written as an introduction to Perl with lots of explanations, as many websites do that much better than I could; however, I thought that a few examples from a statistician's real life might convince some of you of its power and ease of use. I have collected four examples; each of them was written in 5 to 15 minutes, takes a few seconds (or less!) to run and can save lots of time or lots of pain (or both!). I have sorted them in order of increasing difficulty, but none of them is really difficult.

To download Active Perl, visit the Active Perl website; the software installs easily in a few minutes on Windows.

Note that Perl might already be installed if you are working on a Linux/Unix operating system (it is typically found at /usr/bin/perl).

Introduction

Before looking at any example, please note that Perl is case sensitive; thus, $myvariable is different from $Myvariable or $MYVARIABLE. These scalar variables are also different from @myvariable, @Myvariable or @MYVARIABLE, which are called arrays and can be seen as vectors (as in S-Plus or R, for example). Another type of variable is the hash; it is a little bit more subtle and is not used in the examples below, for the sake of brevity (again, this is only an invitation) rather than because it is difficult. Also note that Perl programmers typically use uppercase names for constants and lowercase names for variables (whose values change through the program). You can do as you wish, but following this convention helps readability and makes your code easier to understand for others if you plan to redistribute it or if you need help from experienced programmers when troubleshooting.
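As a small illustration of the points above, here is a minimal sketch (the values are made up for the occasion) showing that names differing only by case are distinct variables, and that an array is unrelated to a scalar of the same name:

```perl
use strict;
use warnings;

# Scalars hold a single value; names are case sensitive,
# so these are three distinct variables.
my $myvariable = 1;
my $Myvariable = 2;
my $MYVARIABLE = 3;

# An array holds an ordered list of values, much like a vector in R.
# @myvariable is unrelated to $myvariable, despite the shared name.
my @myvariable = (0.50, 0.90, 0.95, 0.99);

my $n     = scalar @myvariable;   # number of elements
my $first = $myvariable[0];       # indexing starts at 0
my $last  = $myvariable[-1];      # negative indices count from the end

print "scalars: $myvariable $Myvariable $MYVARIABLE\n";
print "array has $n elements, from $first to $last\n";
```

Note that a single element of the array @myvariable is written $myvariable[0], with a $, because one element is a scalar.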

To submit a Perl program:

On Windows, save your file with the .pl extension: double clicking on it (in Windows Explorer) will launch it; typing mycode.pl or c:\perl\bin\perl.exe mycode.pl from the DOS prompt will also work and gives more flexibility (e.g. for feeding input files and/or redirecting output, as in c:\perl\bin\perl.exe mycode.pl < myinput.txt > out.txt).
On Linux/Unix, make sure that the first line of your code gives the path to Perl (typically, /usr/bin/perl), preceded by the shebang characters #! (that is, #!/usr/bin/perl). This line can be left in your program even if you redistribute it to Windows users, as it will be interpreted as a comment line on Windows.
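For instance, a complete program saved as mycode.pl could look like the sketch below (its content is purely illustrative); it runs unchanged on Windows and on Linux/Unix:

```perl
#!/usr/bin/perl
# The line above tells Linux/Unix where to find Perl; on Windows it is
# simply read as a comment.
use strict;
use warnings;

# Build the lines first, then print them.
my @lines = map { sprintf("%d squared is %d", $_, $_ * $_) } 1 .. 3;

# Anything printed to STDOUT appears on screen, or ends up in out.txt
# when the program is run as: perl mycode.pl > out.txt
print "$_\n" for @lines;
```

On Linux/Unix, the file must also be made executable (chmod +x mycode.pl) if you want to launch it as ./mycode.pl rather than perl mycode.pl.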

Note that the Python programming language might also be of interest and is similar to Perl: lots of discussions and comparisons can be found on the web! But as I don't know Python, the only thing I can (try to) sell is Perl!

The examples in this document (remember that there is always more than one way to do it!) are:

Writing a bunch of files (e.g., for submitting a program with different parameter values)
Reading text files (and summarizing results)
Making changes in text files
Doing more complex operations (e.g. writing SAS code for a list of variables)

Example 1

Suppose you want to compute the values n.acc(l, alpha), n.alc(l, alpha) and n.mwoc(l, alpha) for each combination of l ∈ {0.01, 0.02, 0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.40, 0.50} and alpha ∈ {0.50, 0.90, 0.95, 0.99}; that makes 120 lines of code to write, a task that is somewhat time-consuming, boring and error-prone. But that was before you thought about Perl! Indeed, writing a Perl program for this very simple task takes 2 minutes (no kidding!).

I have chosen, in this example, to save each line in a separate file, using the criterion (acc, alc or mwoc) and the values of l and alpha in the file names to make results easy to trace back, but one could have decided to write the 120 lines in a single file.

See code
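The author's own program is behind the link above. As a rough sketch of the idea only, three nested loops are enough; the output directory jobs, the .txt extension and the exact form of the line written in each file are assumptions made for this illustration, while the file naming pattern (e.g. acc-01-90) follows the output file name quoted in Example 2.

```perl
use strict;
use warnings;

# Constants: criteria and parameter values (kept as strings so that
# 0.10 is written as 0.10 and not 0.1).
my @CRITERIA = qw(acc alc mwoc);
my @L        = qw(0.01 0.02 0.05 0.10 0.15 0.20 0.25 0.30 0.40 0.50);
my @ALPHA    = qw(0.50 0.90 0.95 0.99);

my $OUTDIR = 'jobs';   # hypothetical output directory
unless (-d $OUTDIR) {
    mkdir $OUTDIR or die "Cannot create $OUTDIR: $!";
}

my $nfiles = 0;
for my $criterion (@CRITERIA) {
    for my $l (@L) {
        for my $alpha (@ALPHA) {
            # Keep only the decimals for the file name: 0.01 -> 01, 0.90 -> 90
            (my $ltag = $l)     =~ s/^0\.//;
            (my $atag = $alpha) =~ s/^0\.//;
            my $file = "$OUTDIR/$criterion-$ltag-$atag.txt";

            open(my $out, '>', $file) or die "Cannot write $file: $!";
            # Assumed form of the line to submit, e.g. n.acc(0.01, 0.90)
            print $out "n.$criterion($l, $alpha)\n";
            close($out) or die "Cannot close $file: $!";
            $nfiles++;
        }
    }
}
print "$nfiles files written to $OUTDIR/\n";
```

Writing everything to a single file instead would only require opening the file once, before the loops, and closing it after them.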

Example 2

Suppose you have submitted the 120 files created in the example above and obtained 120 output files; suppose further that a typical output file looks like res/acc-01-90.out. Suppose now that you want to write a summary table of the results, where the only thing of interest, at this point, is the value found on the line "Optimal sample size:". Once again, doing so without Perl is quite a long task, but it is easy and fun to do in Perl.
I have saved a few of the output files I obtained (from the code created in example 1) in sub-directory res/, so you can try to run the Perl program provided; do not be surprised to see lots of "not_done"'s in the output, as the corresponding output files cannot be found in res/.
Output from this program is sent to standard output (STDOUT), which by default is the computer screen. To save the results, type c:\perl\bin\perl.exe read-res.pl > mytable.txt.

See code
See output
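The program linked above is the author's; the sketch below only illustrates the general approach. It assumes that the value of interest is the first word following "Optimal sample size:" on that line, and the tab-separated layout of the table is a choice made for this example.

```perl
use strict;
use warnings;

my @CRITERIA = qw(acc alc mwoc);
my @L        = qw(0.01 0.02 0.05 0.10 0.15 0.20 0.25 0.30 0.40 0.50);
my @ALPHA    = qw(0.50 0.90 0.95 0.99);

# Return the value following "Optimal sample size:" in $file,
# or "not_done" if the file is missing or has no such line.
sub optimal_size {
    my ($file) = @_;
    open(my $in, '<', $file) or return 'not_done';
    my $size = 'not_done';
    while (my $line = <$in>) {
        if ($line =~ /Optimal sample size:\s*(\S+)/) {
            $size = $1;
            last;
        }
    }
    close($in);
    return $size;
}

# One row per (criterion, l), one column per alpha, tab-separated.
print join("\t", 'criterion', 'l', map("alpha=$_", @ALPHA)), "\n";
for my $criterion (@CRITERIA) {
    for my $l (@L) {
        (my $ltag = $l) =~ s/^0\.//;           # 0.01 -> 01
        my @sizes;
        for my $alpha (@ALPHA) {
            (my $atag = $alpha) =~ s/^0\.//;   # 0.90 -> 90
            push @sizes, optimal_size("res/$criterion-$ltag-$atag.out");
        }
        print join("\t", $criterion, $l, @sizes), "\n";
    }
}
```

Because missing files simply yield not_done, the program can be run again at any time while the remaining jobs are still being processed.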

The output table could easily be imported into Excel; the Perl code could also easily be adapted to write the results in an HTML- or LaTeX-formatted table.

Example 3

Suppose you have a bunch of files in which you want to make a correction; in this example, the files are in sub-directory corrections, and some of them have the line

Test 2 sensitivity beta parameters: 115.63 66.15

that was mistyped, as 115.63 should read 155.63.
The substitution can be done for all files with the .txt extension with this code.

See a typical file to be treated

Want to try this code? Download the zipped corrections directory, save the Perl code where you unzipped corrections under the name corrections.pl, open the MS-DOS command window, change directory to where corrections was unzipped, type c:\perl\bin\perl.exe corrections.pl and view the modified files in corrections/.
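For readers who cannot reach the download, here is a minimal sketch of such a substitution. It is not the author's program: matching on the line label (rather than on the number 115.63 alone) and overwriting the files in place are choices made for this illustration, so keep a copy of the original directory before trying it.

```perl
use strict;
use warnings;

my $DIR = 'corrections';
my $OLD = 'Test 2 sensitivity beta parameters: 115.63';
my $NEW = 'Test 2 sensitivity beta parameters: 155.63';

# Return the corrected text and the number of substitutions made.
sub correct_text {
    my ($text) = @_;
    # \Q...\E makes the dots and other special characters match literally.
    my $count = ($text =~ s/\Q$OLD\E/$NEW/g);
    return ($text, $count || 0);
}

for my $file (glob("$DIR/*.txt")) {
    # Read the whole file at once.
    open(my $in, '<', $file) or die "Cannot read $file: $!";
    my $text = do { local $/; <$in> };
    close($in);
    $text = '' unless defined $text;   # empty file

    my ($fixed, $count) = correct_text($text);
    next unless $count;                # leave other files untouched

    open(my $out, '>', $file) or die "Cannot write $file: $!";
    print $out $fixed;
    close($out) or die "Cannot close $file: $!";
    print "$file: $count substitution(s)\n";
}
```

Including the label in the pattern avoids changing a value of 115.63 that might legitimately appear on another line.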

Example 4

This example dates from several years ago: today, I must confess that I would not use Perl to do it but would do it in SAS from start to end (with a macro call and proc sql, which I didn't know at the time); I thought it would still be instructive and decided to keep it, as you might find some pieces of it interesting and recycle the ideas in other applications.

Suppose you have a SAS data file with thousands of variables and that you are asked to do some operations (which might be as simple as a frequency table or might require the use of a macro) for all variables of years 1997 and 1998, that is, in this particular data set, variables with names ending with 97 or 98. Of course, that could be done by hand, but it is error-prone if you have a long list of variables. Again, Perl can be of help!
Suppose you have a list of variables presented in a Perl-programmer-friendly format (by which I mean a list of variables where long variable names were not broken across two lines; for the curious, I used the following %contents macro to obtain it).
This Perl program will write SAS code to run the macro %mymacro for each variable ending with either 97 or 98.

See output of c:\perl\bin\perl.exe mymacro.pl contents.lst
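The core of such a program can be sketched as follows. This is an illustration under assumptions, not the author's code: it supposes that each line of contents.lst starts with a variable name, and that the macro is called as %mymacro(name); with the variable name as its only argument.

```perl
use strict;
use warnings;

# Return the SAS macro call for a variable name ending in 97 or 98,
# or undef for any other name.
sub macro_call {
    my ($name) = @_;
    return unless $name =~ /9[78]$/;
    return "%mymacro($name);";
}

# Process each file named on the command line (e.g. contents.lst).
for my $file (@ARGV) {
    open(my $in, '<', $file) or die "Cannot read $file: $!";
    while (my $line = <$in>) {
        next unless $line =~ /^\s*(\w+)/;   # first word on the line
        my $call = macro_call($1);
        print "$call\n" if defined $call;
    }
    close($in);
}
```

Redirecting the output, as in c:\perl\bin\perl.exe mymacro.pl contents.lst > mycalls.sas (the output file name is arbitrary), gives a file of SAS statements ready to be included in a SAS program.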