Fraktal SAS Programming
Preface
The SAS System (SAS) is an impressively powerful ecosystem of languages, tools and programs that gives users every means to work with data and satisfy their curiosity, whether that curiosity is scientific or simply driven by work orders in a top-down organization.
Given the above, it is not surprising that
- SAS license fees appear high, and
- the individual trying to start a user career feels pretty lonesome.
Since no one would buy a modern smartphone just to make phone calls, it is likewise inappropriate to use SAS solely as a
- SQL database system
- basket of tabulation programs
- graphics toolbox
- web publishing agent
- data-warehouse platform
- statistics package
- metadata manager
- source code generator
Indeed, SAS can perform any of these functions and more. Even worse, a small team of SAS geeks can deliver any combination of them as a scenario-tailored application in an astonishingly short time frame.
Of course, the result will be a dynamic, self-documenting, metadata driven and generic sort of thing.
That’s why SAS starters feel lonesome, and why experienced users have organized themselves in non-commercial networks worldwide, the largest of which is PhUSE, the Pharmaceutical User Software Exchange.
Are you ready?
Welcome to the club!
Coding
Rules?
While there is no technical reason to introduce and follow coding rules and typographical conventions, doing so has proven helpful, depending on the working context and the purpose pursued.
"SAS is freedom" is good news for most ad-hoc programmers aiming to have results the same minute.
"SAS is freedom" is bad news for all team leads and managers who bear responsibility for the sustainable use of resources and for maintaining programs written by individuals who will most likely leave some day.
Throughout this tutorial we will therefore adhere to a set of rules that might seem superfluous at first sight but will help to grasp the structure and process implemented in a program without deep-diving into the code.
Standards!
SAS supports modular coding very well because code processing follows a block or “group” structure as the architects at SAS Institute Inc. would put it. Let’s directly jump into this topic:
data basix;
    city='Washington'; lat="038° 054′ N"; long="077° 002′ W"; output;
    city='Berlin'; lat="052° 031′ N"; long="013° 024′ O"; output;
    city='Tokyo'; lat="035° 041′ N"; long="139° 046′ O"; output;
proc sort; by lat;
proc print;
run;
This appears to be an easy-to-read, straightforwardly written program, and that is definitely true. Indeed, this code completes without error messages and produces a formatted list of three cities along with their latitude and longitude.
But this is not the program that is processed by SAS.
What does SAS see?
Groups
The SAS compiler processes submitted source code in so-called steps, which in turn consist of groups of tokens, each group terminated by a semicolon. If users do not code complete steps, SAS completes the code to a certain extent, for example by treating the next DATA or PROC statement as the end of the preceding step.
A sequence of tokens terminated by a semicolon is called a statement.
Steps composed of statements like the ones above are called run groups.
Logically, the submitted code from the above example is transformed into three run groups that are executed as discrete steps. Syntax checking and user feedback are handled separately for each step.
data basix;
    city='Washington'; lat="038° 054′ N"; long="077° 002′ W"; output;
    city='Berlin'; lat="052° 031′ N"; long="013° 024′ O"; output;
    city='Tokyo'; lat="035° 041′ N"; long="139° 046′ O"; output;
run;
proc sort data=basix out=basix; by lat; run;
proc print data=basix; run;
Segments
As mentioned earlier, a SAS-coded workflow is processed as a sequence of blocks or groups. Since this processing structure is used everywhere in SAS, we will refer to these blocks and groups as segments throughout the remainder of this text.
Due to various languages available inside SAS, particular segments might have their very special appearance. The run group example from above is merely one of them.
Segments from different syntaxes may be hierarchically nested.
Segments may not intersect, with one exception, however.
Macro
Straightforward Coding
Because it is the most prominent type in a professional senior SAS programmer’s lifetime production (the Oeuvre), we will describe SAS Macro ("MACRO") coding as the first segment type.
As we remember from the run group example, a code segment is encapsulated by an initializing statement and a corresponding termination statement. A MACRO definition is delimited by the following two statements; the last line invokes the MACRO:
%MACRO name;
    program code
%MEND name;
%NAME;
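As a minimal sketch of this definition-and-invocation pattern, the following wraps a complete PROC PRINT run group in a MACRO. The name PRINT_CLASS is hypothetical and chosen only for illustration; SASHELP.CLASS is the sample dataset shipped with SAS.

```sas
/* Definition: the MACRO is compiled and stored, nothing is printed yet. */
%macro print_class;   /* PRINT_CLASS is an illustrative name */
    proc print data=sashelp.class noobs;
    run;
%mend print_class;

/* Invocation: the stored text is generated and handed on for execution. */
%print_class
```

Note that the run group segment is nested completely inside the MACRO segment; the two segments do not intersect.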
Generalized Approach
It is necessary to stress here that a MACRO does not execute the program code it contains but passes it to the SAS compiler, which performs a Compile-and-Go step by default. Nevertheless, it would be premature to assume that this mechanism requires the code to be SAS code.
Instead, it is possible to GENERATE-and-PASS any code.
SAS provides means and concepts to direct generated source code to appropriate agents, be it external programs or the operating system itself. OS functions may be called explicitly or implicitly or code may be written to a text file that is executed later on.
Out of the numerous options, the following two might appear quite useful.
Utilize OS Functions
1. Access results from OS commands as data source.
filename myfref pipe "dir c:\ /d";
This statement assigns a file reference with target type pipe. The pipe type dynamically accesses the result of an OS function as data stream that can be used as text input file inside a data step.
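A minimal sketch of how such a pipe could be consumed inside a data step. It assumes a Windows host and a session in which operating system commands are permitted (the XCMD system option); the dataset name DIR_LISTING and the variable LINE are hypothetical.

```sas
/* Assumption: Windows host, XCMD enabled; names below are illustrative. */
filename myfref pipe "dir c:\ /d";

data dir_listing;
    infile myfref truncover;   /* read the command output as plain text */
    input line $char256.;      /* one line of output per observation    */
run;

filename myfref clear;         /* release the file reference again      */
```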
2. Perform an operation on OS level.
systask command "mkdir c:\&MYDIR.";
The SYSTASK statement is a powerful means to initiate and control background tasks. With options WAIT/NOWAIT it provides direct utilization of OS multitasking by initiating parts of complex SAS code as background tasks.
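The following sketch shows one way the NOWAIT option might be combined with the WAITFOR statement. It assumes a Windows host with operating system commands enabled; the task name MKJOB, the status variable MKRC and the value of MYDIR are hypothetical.

```sas
/* Assumption: Windows host, XCMD enabled; all names are illustrative. */
%let mydir = sas_demo;

/* Start the command in the background and keep its return code in MKRC. */
systask command "mkdir c:\&MYDIR." nowait taskname=mkjob status=mkrc;

/* ... other SAS code could run here while the task is active ... */

/* Suspend the session until the background task has finished. */
waitfor mkjob;
%put NOTE: mkdir finished with return code &mkrc.;
```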
Write Vector Graphics
filename _xml_ "&MYPTH.\&MYGPH..svg";
data _null_;
    file _xml_;
    put '<?xml version="1.0" standalone="no"?>';
    put '<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">';
    put '<svg xmlns="http://www.w3.org/2000/svg" version="1.1" width="29.7cm" height="21cm" viewBox="-200 -100 1200 800"> <desc> Example anim01 - demonstrate animation elements </desc> <title> SpotGrid </title> ';
    put ' <rect id="OuterBorder" x="-4" y="-4" width="904" height="604" fill="rgb(255,255,255)" stroke="rgb(0,0,0)" stroke-width="8"> </rect> ';
    put '</svg>';
run;
Advanced Coding
While the crude approach to MACRO programming is the first choice for any ad-hoc implementation, it will never result in a piece of software that survives in a quality-controlled environment. Moreover, when used as a component in a modularized system, it will not produce predictable results and will very likely mess things up at run-time.
Why?
The key reason is that SAS languages, like other programming languages, rely on variable properties but do not force the programmer to supply this information by declaring everything beforehand in a header section or similar place.
SAS code is executed regardless of whether an explicit declaration is found. When none is found, SAS applies built-in rules to perform an automatic declaration on which it then operates. Properties assigned that way might not conform to the programmer’s expectations or the system’s design requirements.
That’s why!
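A small data step sketch illustrates automatic declaration. The dataset names IMPLICIT and EXPLICIT are hypothetical; the behavior shown is the standard rule that a character variable takes its length from the first value assigned to it.

```sas
/* No declaration: the first assignment fixes CITY as character, length 4. */
data implicit;
    city = 'Rome';
    output;
    city = 'Washington';   /* stored as 'Wash', no error is raised */
    output;
run;

/* Explicit declaration: the LENGTH statement sets the property up front. */
data explicit;
    length city $ 10;
    city = 'Rome';
    output;
    city = 'Washington';   /* stored completely */
    output;
run;
```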
Symbol Tables
Control information tokens, referred to as parameters, are called macro variables ("variables") when writing MACROs. Macro variables are stored in tables known as symbol tables ("tables").
On starting a SAS process a global symbol table is initiated and populated with control information used by the session and non-MACRO programs.
On invocation of a MACRO, a local symbol table is initiated and kept alive during run-time of the MACRO. Local symbol tables disappear on termination of "their" MACRO.
Symbol tables are two-column character type matrices with one single property scope being either global or local. MACRO variable names are stored in the 1st column, MACRO variable values are stored in the 2nd column.
MACRO symbol tables are stored in memory.
MACRO functions are processed in memory.
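The content of the symbol tables can be written to the LOG with the %PUT statement. The sketch below is for illustration only; the MACRO name SHOW_TABLES and the variables STUDY, DSN and STEP are hypothetical.

```sas
%let study = DEMO;        /* open code: stored in the global table     */

%macro show_tables(dsn);  /* the parameter DSN lives in the local table */
    %local step;          /* explicitly declared as local               */
    %let step = 1;
    %put NOTE: local table of SHOW_TABLES;
    %put _local_;         /* lists DSN and STEP                         */
    %put NOTE: user-defined global variables;
    %put _global_;        /* lists STUDY among the global entries       */
%mend show_tables;

%show_tables(sashelp.class)
```

Once SHOW_TABLES has ended, its local table is gone and a reference to STEP or DSN in open code no longer resolves.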
Parameter Scope
Since scope is the only property of MACRO variables, declaration is easy: Simply assign each variable used to one of these two groups.
However, there is a set of rules requiring your attention:
- A particular variable name may appear in an unlimited number of tables.
- MACROs may be nested to form unlimited invocation hierarchies.
- A calling MACRO’s local table appears global to the called MACRO.
- Read references to variables are performed 1st against the local table.
- Write references are processed likewise: local 1st, global 2nd.
- Write references not met in the invocation hierarchy generate a local variable.
Obviously, variable declaration is a critical issue in a validated environment. If it is not done completely, the validation status of the whole system is questionable.
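The effect of a missing declaration can be sketched with two nested MACROs. All names (OUTER, INNER_UNSAFE, INNER_SAFE and the variable I) are hypothetical and serve only to illustrate the scope rules listed above.

```sas
%macro inner_unsafe;
    %let i = 99;      /* no declaration: I is found in OUTER's table */
%mend inner_unsafe;

%macro inner_safe;
    %local i;         /* declaration: I now belongs to INNER_SAFE    */
    %let i = 99;
%mend inner_safe;

%macro outer;
    %local i;
    %let i = 1;
    %inner_unsafe
    %put NOTE: after INNER_UNSAFE i=&i.;   /* writes i=99 */
    %let i = 1;
    %inner_safe
    %put NOTE: after INNER_SAFE i=&i.;     /* writes i=1  */
%mend outer;

%outer
```

The undeclared write in INNER_UNSAFE silently changes the caller's variable, which is exactly the kind of side effect that makes an undeclared MACRO unpredictable as a component.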
Extending Control
Now, with proper declaration, it is safe to run your MACRO as a component in a validated system. However, it is still difficult to trace its results and to discover failure risks or unwanted behavior.
You might therefore find it useful to add functionality such as:
- apply logic to check for parameters’ appropriate values
- navigate through the ecosystem by reading and processing metadata
- document workflow by writing comments to the LOG
- inform those responsible about invocation by sending an email
- write a text file that contains the plain code the MACRO generated
We will implement these requirements now step by step and thereby touch relevant parts of the so-called SAS Macro Facility.
Apply Logic
Implementing MACRO logic is quite comparable to other languages, except that SAS requires so-called MACRO Triggers ("TRIGGERS") to direct processing to the appropriate subsystem inside the SAS ecosystem. These are:
& - the ampersand: indicates a parameter reference
% - the percent sign: indicates syntax elements
TRIGGERS became necessary early in the history of SAS because the SAS Macro Facility was intended to perform text processing before code was sent to the SAS compiler. To decide whether the text pre-processor must be invoked, every token is checked for a TRIGGER as its first character.
Of course the segment structure of coding also applies here:
%IF %LENGTH(&PARM_I.) ne 0 %THEN %DO;
    program code
%END;
%ELSE %DO;
    alternate program code
%END;
Depending on whether a value is supplied in parameter PARM_I either "program code" or "alternate program code" is passed to the SAS compiler for processing.
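A runnable sketch of this pattern, under the assumption that PARM_I carries a value of the variable SEX in SASHELP.CLASS. The MACRO name PRINT_SUBSET is hypothetical; the %PUT statements additionally document the chosen branch in the LOG.

```sas
%macro print_subset(parm_i);   /* PRINT_SUBSET is an illustrative name */
    %if %length(&PARM_I.) ne 0 %then %do;
        %put NOTE: PRINT_SUBSET restricts output to sex &PARM_I.;
        proc print data=sashelp.class noobs;
            where sex = "&PARM_I.";
        run;
    %end;
    %else %do;
        %put NOTE: PRINT_SUBSET received no value and prints all rows;
        proc print data=sashelp.class noobs;
        run;
    %end;
%mend print_subset;

%print_subset(F)   /* value supplied: first branch is generated */
%print_subset()    /* no value: alternate branch is generated   */
```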
What is Metadata?
A widely used definition of metadata relies on two characteristics:
- primary control function vs. data content
- structured repository vs. one-dimensional parameter list
With these two pillars, the definition does not correspond 100% to the literal meaning of meta-data, which is “data about data”. Instead, metadata are described as parameters used to
- define,
- control and
- integrate
the dynamic workflow of a software system composed of autonomous modules.
Requiring storage in a structured repository is a powerful condition that allows for unlimited complexity of workflow logic and its control.
This metadata repository is called METABASE.
Finally, data may easily change domains from DATABASE to METABASE and back, being data content in one phase and control information in another phase of system runtime.
Process Metadata
Although the following examples hold for every static METABASE design, we will start by considering dynamic metadata arrays that are read from database tables or datasets. Using dynamic metadata arrays is commonly referred to as the DATA DRIVEN APPROACH because processing depends on the content of data tables.
The examples are based on the sample dataset SASHELP.CLASS, and the presentation has three logical steps:
- generate the list itself
- obtain the number of elements
- utilize list elements
To accomplish this task we will use Data Steps along with PROCs SQL and PRINT in order to sketch three approaches:
- a list with selected delimiters ("List")
- a set of numbered parameters ("Numbered")
- a single parameter that is used repeatedly ("Direct")
List (generate)
The SQL procedure is well suited to building segmented parameter values from the distinct values found for a specific variable in a data table. In the SELECT statement, the INTO clause names the target parameter, prefixed with a colon (":"), and the SEPARATED BY option supplies the delimiter value (or string) used to concatenate the values.
/* generate the list itself */
proc sql noprint;
    select distinct age
        into :age_list separated by ' , '
        from sashelp.class
    ;
quit;
%LET age_list = %QUOTE(&AGE_LIST.);
Be aware that the SQL procedure is terminated with a QUIT statement, not a RUN statement. In line with other SQL interpreters, RUN statements have no function in PROC SQL syntax. Instead, each SQL statement is executed as soon as its terminating semicolon (";") is reached, which makes every statement its own RUN GROUP.
List (count)
An automatic MACRO variable provides the next piece of information required. SQLOBS contains the number of rows processed by the most recent SQL statement, which here equals the number of elements written to the list.
To save the value it is written to a user owned variable with a LET statement.
/* obtain the number of elements */ %LET age_grps = &SQLOBS.;
List (utilize)
With all the values in one segmented variable we now need to address each value by position. This is easily accomplished with the SCAN function inside a loop that moves over positions from "1" to the "number of elements" that was copied from system owned parameter SQLOBS to user owned parameter AGE_GRPS.
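/* utilize list elements */
/* %DO loops are only valid inside a MACRO definition; PRINT_BY_AGE is an illustrative name */
%MACRO print_by_age;
    %LOCAL age_indx;
    %DO age_indx = 1 %TO &AGE_GRPS.;
        proc print noobs data = sashelp.class;
            where age = %SCAN(&AGE_LIST.,&AGE_INDX.,',');
        run;
    %END;
%MEND print_by_age;
%print_by_age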
It might be obvious now why it was necessary to QUOTE the segmented AGE_LIST parameter when it was list-structured with commas:
Because the MACRO Facility works as a pre-processor, the parameter is replaced by its value ("resolved") before the function is executed. Without quoting, the commas in the value would be read as argument separators and deliver more arguments to the SCAN function than it expects.
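The difference can be sketched in open code with a short hand-made list. The names RAW_LIST and SAFE_LIST are hypothetical, and the delimiter is written as %STR( ,), that is blank and comma, instead of the quoted comma used above.

```sas
%let raw_list = 11 , 12 , 13;

/* Unmasked, the call below would resolve to %SCAN(11 , 12 , 13, 2, ...)
   and fail because %SCAN receives too many arguments:
   %put %scan(&raw_list., 2, %str( ,));                                  */

/* Masking the commas first keeps the list together as one argument. */
%let safe_list = %quote(&raw_list.);
%put NOTE: second element is %scan(&safe_list., 2, %str( ,));
```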
Quoting inside the SAS System is very much like a science on its own and will be revisited whenever appropriate.