Fraktal SAS Programming: Difference between revisions

From phenixxenia.org
 
[[Kategorie:zazy]]
[[Datei:Duck_zazy_com.png|frameless|250px|thumb|right|link=Kategorie:zazy|caption|The wild side of SAS Software Engineering]]
  
==Preface==
 
  
The ''SAS System ('''"SAS"''')'' is an impressively powerful ecosystem of languages, tools and programs that gives users every means to work with data and satisfy their curiosity, whether it springs from scientific interest or simply from work orders in a top-down organization.
  
Given the above, it is not surprising that
#SAS license fees appear high, and
#individuals starting out as SAS users feel pretty lonesome.
 
  
 
Just as no one would buy a modern smartphone simply to make phone calls, it is inappropriate to use SAS solely as a
 
*SQL database system
 
*basket of tabulation programs
 
*graphics toolbox
 
*web publishing agent
 
*data-warehouse platform
 
*statistics package
 
*metadata manager
 
*source code generator
 
 
 
 
 
Indeed, SAS can perform any of these functions, and more. Even worse, a small team of SAS geeks can deliver any combination of them as a scenario-tailored application in an awesomely short time frame.
 
 
 
Of course, the result will be a dynamic, self-documenting, metadata-driven and generic sort of thing.
 
 
 
That is why SAS beginners feel lonesome, and why experienced users have organized themselves in non-commercial networks worldwide, the largest of which is '''PhUSE''', the '''''[http://www.phuse.eu Pharmaceutical Users Software Exchange]'''''.
 
 
 
'''Are you ready?'''
 
 
 
'''Welcome to the club!'''
 
 
 
 
 
==Coding==
 
 
 
===Rules?===
 
 
 
While there is no technical reason to introduce and follow coding rules and typographical conventions, doing so has proven helpful, depending on the working context and the purpose pursued.
 
 
 
'''''SAS is freedom''''' is good news for most ad-hoc programmers aiming to have results the same minute.
 
 
 
'''''SAS is freedom''''' is bad news for all team leads and managers who bear responsibility for the sustainable use of resources and for the maintenance of programs written by individuals who will most likely leave some day.
 
 
 
Throughout this tutorial we will therefore adhere to a set of rules that might seem superfluous at first sight but will help the reader grasp the structure and process implemented in a program without diving deep into the code.
 
 
 
 
 
===Standards!===
 
 
 
SAS supports modular coding very well because code processing follows a block or “group” structure, as the architects at SAS Institute Inc. would put it. Let’s jump directly into this topic:
 
 
 
data basix;
 
city='Washington'; lat="038° 054′ N"; long="077° 002′ W"; output;
 
city='Berlin'; lat="052° 031′ N"; long="013° 024′ E"; output;
 
city='Tokyo'; lat="035° 041′ N"; long="139° 046′ E"; output;
 
proc sort; by lat;
 
proc print; run;
 
 
 
This appears to be an easy-to-read, straightforward program, and it is. The code completes without error messages and prints a formatted list of the three cities along with their latitude and longitude.
 
 
 
'''But this is not the program that is processed by SAS.'''
 
 
 
'''What does SAS see?'''
 
 
 
 
 
===Groups===
 
 
 
The SAS compiler processes the submitted source code in so-called '''''steps''''', which in turn consist of groups of lines, each terminated by a semicolon. If users do not code complete steps, SAS completes the code to a certain extent; for example, a new DATA or PROC statement implicitly ends the preceding step.

Lines terminated with a semicolon are called '''''statements'''''.

Steps composed of statements as above are called '''''run groups'''''.

Logically, the code submitted in the example above is transformed into three run groups that are executed as discrete steps. Syntax checking and user feedback are handled separately for each step.
 
 
 
data basix;
 
city='Washington'; lat="038° 054′ N"; long="077° 002′ W"; output;
 
city='Berlin'; lat="052° 031′ N"; long="013° 024′ E"; output;
 
city='Tokyo'; lat="035° 041′ N"; long="139° 046′ E"; output;
 
run;
 
 
 
proc sort data=basix out=basix;
 
by lat;
 
run;
 
 
 
proc print data=basix;
 
run;
 
 
 
 
 
===Segments===
 
 
 
As mentioned earlier, a SAS-coded workflow is processed as a sequence of blocks or groups. Since this processing structure is used everywhere in SAS, we will refer to these blocks and groups as '''''segments''''' throughout the remainder of this text.

Because several languages are available inside SAS, segments can look quite different from one another. The '''''run group'' example''' from above is merely one of them.
 
 
 
'''Segments from different syntaxes may be hierarchically nested.'''
 
 
 
'''Segments may not intersect, with one exception, however.'''
 
 
 
 
 
==Macro==
 
 
 
===Straightforward Coding===
 
 
 
Because it is the most prominent type in the life's work (the ''Oeuvre'') of a professional senior SAS programmer, we will describe SAS Macro ('''''"MACRO"''''') coding as the '''''first segment type'''''.

As we remember from the '''''run group'' segment type''' example, a code segment is enclosed by an initializing statement and a corresponding terminating statement. '''MACRO definitions''' use these two specific statements, followed by the invocation of the MACRO by name:
 
 
 
%MACRO name;
 
program code
 
%MEND name;
 
 
 
%NAME;
 
 
 
===Generalized Approach===
 
 
 
It appears necessary to stress here that a MACRO does not execute the program code it contains but passes it to the '''''SAS'' compiler''', which performs a '''Compile-and-Go''' step by default. Nevertheless, it would be premature to assume that this mechanism requires the code to be SAS code.
 
 
 
'''Instead, it is possible to GENERATE-and-PASS any code.'''
 
 
 
SAS provides means and concepts to direct generated source code to appropriate agents, be it external programs or the operating system itself. OS functions may be called explicitly or implicitly or code may be written to a text file that is executed later on.
 
 
 
Out of the numerous options, the following two might appear quite useful.
 
 
 
 
 
====Utilize OS Functions====
 
 
 
1. Access results from OS commands as data source.
 
 
 
filename myfref pipe "dir c:\ /d";
 
 
 
This statement assigns a file reference with target type ''pipe''. The pipe type dynamically accesses the result of an OS command as a data stream that can be read like a text input file inside a data step.
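As a minimal sketch of how such a pipe might be consumed, the data step below reads every line of the command output into one character variable. The dataset name DIR_LISTING, the variable name LINE and the width of 256 characters are assumptions made for illustration, and the example presumes a Windows host where the XCMD option permits OS commands.

```sas
/* Assign the pipe: the command runs when the fileref is read */
filename myfref pipe "dir c:\ /d";

/* Read the command output line by line (names are illustrative) */
data dir_listing;
  infile myfref truncover;   /* truncover: short lines do not spill over */
  input line $char256.;      /* keep leading blanks of each output line  */
run;

/* Release the fileref when it is no longer needed */
filename myfref clear;
```

The resulting table can then be filtered or parsed like any other dataset, which is what makes the pipe useful as a data source.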
 
 
 
2. Perform an operation on OS level.
 
 
 
systask command "mkdir c:\&MYDIR.";
 
 
 
The SYSTASK statement is a powerful means to initiate and control background tasks. With options WAIT/NOWAIT it provides direct utilization of OS multitasking by initiating parts of complex SAS code as background tasks.
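A hedged sketch of the background variant follows. The task name MK_DIR, the status variable MK_RC and the value of MYDIR are hypothetical; TASKNAME=, STATUS=, NOWAIT and the WAITFOR statement are the documented means to start a task and synchronize with it later.

```sas
%let MYDIR = demo_dir;   /* hypothetical directory name */

/* Start the OS command in the background and continue immediately */
systask command "mkdir c:\&MYDIR."
        taskname=mk_dir     /* name used to refer to the task later   */
        status=mk_rc        /* macro variable receiving the exit code */
        nowait;

/* ... other SAS code could run here while the task is active ... */

/* Block until the named task has finished */
waitfor _all_ mk_dir;

%put NOTE: mkdir finished with return code &MK_RC.;
```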
 
 
 
 
 
====Write Vector Graphics====
 
 
 
filename _xml_ "&MYPTH.\&MYGPH..svg";
/* write a minimal SVG file, one PUT statement per output line */
data _null_;
file _xml_;
put '<?xml version="1.0" standalone="no"?>';
put '<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">';
put '<svg xmlns="http://www.w3.org/2000/svg" version="1.1"';
put '  width="29.7cm" height="21cm"';
put '  viewBox="-200 -100 1200 800">';
put '  <desc> Example anim01 - demonstrate animation elements </desc>';
put '  <title> SpotGrid </title>';
put '  <rect id="OuterBorder" x="-4" y="-4" width="904" height="604" fill="rgb(255,255,255)" stroke="rgb(0,0,0)" stroke-width="8">';
put '  </rect>';
put '</svg>';
run;
 
 
 
 
 
===Advanced Coding===
 
 
 
While the crude approach to MACRO programming is the first choice for any ad-hoc implementation, it will never result in a piece of software that survives in a quality-controlled environment. Moreover, when used as a component in a modularized system, it will not produce predictable results and will very likely mess things up at run-time.
 
 
 
'''Why?'''
 
 
 
The key reason is that SAS languages, like other programming languages, do use variable properties, but without forcing the programmer to supply this information by declaring everything beforehand in a header section or similar place.

SAS code is executed regardless of whether an explicit declaration is found. When none is found, SAS applies built-in rules to perform an automatic declaration, on which it then operates. Properties assigned that way might not conform to the programmer's expectations or the system's design requirements.
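A small sketch of such an automatic declaration in a data step: the length of a character variable is fixed by its first assignment unless a LENGTH statement declares it beforehand. Dataset and variable names are chosen for illustration only.

```sas
/* Implicit declaration: the first assignment fixes the length at 4 */
data implicit;
  city = 'Rome';
  city = 'Washington';   /* silently truncated to 'Wash' */
  put city=;
run;

/* Explicit declaration: the LENGTH statement sets the property up front */
data explicit;
  length city $ 10;
  city = 'Rome';
  city = 'Washington';   /* stored completely */
  put city=;
run;
```

Neither step produces an error or a warning, which is exactly why the implicit variant is risky in a modular system.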
 
 
 
'''That’s why!'''
 
 
 
 
 
===Symbol Tables===
 
 
 
Control information tokens, referred to as ''parameters'', are called '''''macro variables''''' (''"variables"'') when writing MACROs. Macro variables are stored in tables, which have been given the name '''''symbol tables''''' (''"tables"'').
 
 
 
On starting a SAS process a '''''global symbol table''''' is initiated and populated with control information used by the session and non-MACRO programs.
 
 
 
On invocation of a MACRO, a '''''local symbol table''''' is initiated and kept alive during run-time of the MACRO. Local symbol tables disappear on termination of '''"their" MACRO'''.
 
 
 
Symbol tables are two-column character type matrices with '''one single property, ''scope'', which is either ''global'' or ''local'''''. MACRO variable names are stored in the 1st column, MACRO variable values are stored in the 2nd column.
 
 
 
'''MACRO symbol tables are stored in memory.'''
 
 
 
'''MACRO functions are processed in memory.'''
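The content of the symbol tables can be inspected at any time. The following sketch uses the %PUT keywords _GLOBAL_, _LOCAL_ and _AUTOMATIC_ and the dictionary view SASHELP.VMACRO; the variable names PROJECT and STEP and the MACRO name SHOW_TABLES are illustrative assumptions.

```sas
%let project = demo;          /* lands in the global symbol table */

%macro show_tables;
  %local step;                /* lands in the local table of SHOW_TABLES */
  %let step = 1;
  %put _local_;               /* lists STEP with scope SHOW_TABLES */
%mend show_tables;

%show_tables;

%put _global_;                /* lists PROJECT among the global variables */
%put _automatic_;             /* lists system-owned variables such as SYSDATE */

/* The same information as a table: one row per variable and scope */
proc print data=sashelp.vmacro(where=(scope = 'GLOBAL')) noobs;
  var scope name value;
run;
```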
 
 
 
 
 
===Parameter Scope===
 
 
 
Since scope is the only property of MACRO variables, declaration is easy: simply assign each variable used to one of these two groups with a %GLOBAL or %LOCAL statement.
 
 
 
However, there is a set of rules requiring your attention:
 
*A particular variable name may appear in an unlimited number of tables.
 
*MACROs may be nested to form unlimited invocation hierarchies.
 
*A calling MACRO’s local table appears global to the called MACRO.
 
*Read references to variables are performed 1st against the local table.
 
*Write references are processed likewise: local 1st, global 2nd.
 
*Write references not met in the invocation hierarchy generate a local variable.
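The rules above can be observed with two nested MACROs. This is a sketch under the assumption that no global variable named FRESH or SHARED exists; the MACRO and variable names are hypothetical.

```sas
%symdel fresh / nowarn;   /* make sure no global FRESH interferes */

%macro inner;
  /* SHARED is found in the caller's local table and updated there */
  %let shared = changed by inner;
  /* FRESH is found nowhere, so it is created in INNER's local table */
  %let fresh = created by inner;
%mend inner;

%macro outer;
  %local shared;                  /* explicit declaration in OUTER */
  %let shared = set by outer;
  %inner;
  %put NOTE: shared = &shared.;   /* shows: changed by inner */
  /* FRESH vanished together with INNER's local table */
  %put NOTE: fresh exists = %symexist(fresh);   /* shows: 0 */
%mend outer;

%outer;
```

Without the %LOCAL statement in OUTER, the write in INNER would hit whatever variable named SHARED happens to exist further up the hierarchy, which is the risk the closing remark of this section refers to.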
 
 
 
 
 
'''Obviously, variable declaration is a critical issue in a validated environment. If it is not done completely, the validation status of the whole system is questionable.'''
 
 
 
 
 
===Extending Control===
 
 
 
Now, with proper declaration, it is safe to run your MACRO as a component in a validated system. However, it is still difficult to follow its results and to discover failure risks or unwanted behavior.
 
 
 
You might therefore find it useful to add functionality such as:
 
#apply logic to check for parameters’ appropriate values
 
#navigate through the ecosystem by reading and processing metadata
 
#document workflow by writing comments to the LOG
 
#inform those responsible about an invocation by sending an e-mail
 
#write a text file that contains the plain code the MACRO generated
 
 
 
 
 
We will implement these requirements now step by step and thereby touch relevant parts of the '''so-called ''SAS Macro Facility'''''.
 
 
 
 
 
===Apply Logic===
 
 
 
Implementing MACRO logic is quite comparable to other languages, except that SAS requires so-called '''MACRO Triggers''' (''"TRIGGERS"'') to direct processing to the appropriate subsystem inside the SAS ecosystem. These are:
 
 
 
'''''& – the ampersand'': indicates parameter reference'''
 
 
 
'''''% – the percent sign'': indicates MACRO syntax elements (statements, functions and MACRO calls)'''
 
 
 
TRIGGERS were found necessary early in the history of SAS, since the SAS Macro Facility was intended to perform text processing before code was sent to the SAS compiler. To invoke the text pre-processor, every token is checked as to whether its first character is a TRIGGER.
 
 
 
Of course the segment structure of coding also applies here:
 
 
 
'''%IF %LENGTH(&PARM_I.) ne 0 %THEN %DO;'''
 
''program code''
 
'''%END;'''
 
'''%ELSE %DO;'''
 
''alternate program code''
 
'''%END;'''
 
 
 
Depending on whether a value is supplied in parameter PARM_I either '''"program code"''' or '''"alternate program code"''' is passed to the SAS compiler for processing.
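Embedded in a complete MACRO definition, the branch might look like the following sketch. The MACRO name SHOW_CLASS and the use of PARM_I as a filter on the variable SEX of SASHELP.CLASS are assumptions made for illustration.

```sas
%macro show_class(parm_i=);
  %if %length(&PARM_I.) ne 0 %then %do;
    /* a value was supplied: generate a filtered listing */
    proc print data=sashelp.class noobs;
      where sex = "&PARM_I.";
    run;
  %end;
  %else %do;
    /* no value supplied: generate the unfiltered listing */
    proc print data=sashelp.class noobs;
    run;
  %end;
%mend show_class;

%show_class(parm_i=F);   /* passes the filtered PROC PRINT to the compiler */
%show_class();           /* passes the unfiltered PROC PRINT to the compiler */
```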
 
 
 
 
 
===What is Metadata?===
 
 
 
A widely used definition of metadata relies on two characteristics:
 
*primary control function vs. data content
 
*structured repository vs. one-dimensional parameter list
 
 
 
 
 
Resting on these two pillars, the definition does not fully match the literal meaning of metadata, which is “data about data”. Instead, metadata are described as parameters used to
 
#define,
 
#control and
 
#integrate
 
the dynamic workflow of a software system composed of autonomous modules.
 
 
 
 
 
Requiring storage in a structured repository is a strong condition, and it allows for unlimited complexity of workflow logic and its control.
 
 
 
'''This metadata repository is called ''METABASE''.'''
 
 
 
Finally, data may easily change domains from DATABASE to METABASE and back, being data content in one phase and control information in another phase of system runtime.
 
 
 
 
 
===Process Metadata===
 
 
 
Although the following examples hold for every static METABASE design, we will start by considering dynamic metadata arrays that are read from database tables or datasets.

Using dynamic metadata arrays is commonly referred to as the '''''DATA DRIVEN APPROACH''''' because processing depends on the content of data tables.
 
 
 
 
 
The examples are based on the sample dataset SASHELP.CLASS, and the presentation follows three logical steps:
 
#generate the list itself
 
#obtain the number of elements
 
#utilize list elements 
 
 
 
 
 
To accomplish this task we will use Data Steps along with PROCs SQL and PRINT in order to sketch three approaches:
 
*a list with selected delimiters (''"List"'')
 
*a set of numbered parameters (''"Numbered"'')
 
*a single parameter that is used repeatedly (''"Direct"'')
 
 
 
 
 
====List (generate)====
 
 
 
The SQL procedure supports the creation of segmented parameter values from the distinct values found for a specific variable in a data table very nicely. In the INTO clause of the select statement, the target parameter name is prefixed with a '''colon ''(":")''''', and the concatenation of values is requested with the '''''separated by''''' option followed by a delimiter character (or string).
 
 
 
/*
 
generate the list itself
 
*/
 
proc sql noprint;
 
select distinct age
 
  into :age_list separated by ' , '
 
  from sashelp.class
 
;
 
quit;
 
%LET age_list = %QUOTE(&AGE_LIST.);
 
 
 
Be aware that the SQL procedure is terminated with a '''''QUIT'', not a ''RUN''''' statement. In line with other SQL interpreters, RUN statements have no effect in PROC SQL. Instead, the '''semicolon ''(";")''''' triggers immediate processing of each SQL statement, making each its own RUN GROUP.
 
 
 
 
 
====List (count)====
 
 
 
An automatic MACRO variable is used to get the next piece of information required. '''''SQLOBS''''' contains the number of rows returned by the preceding SQL statement, hence the number of elements that were written to the list.

To save the value, it is written to a user-owned variable with a %LET statement.
 
 
 
/*
 
obtain the number of elements
 
*/
 
%LET age_grps = &SQLOBS.;
 
 
 
 
 
====List (utilize)====
 
 
 
With all the values in one segmented variable, we now need to address each value by position. This is easily accomplished with the '''''%SCAN'' function''' inside a loop that moves over positions from '''"1"''' to the '''"number of elements"''' that was copied from the system-owned parameter SQLOBS to the user-owned parameter AGE_GRPS.
 
 
 
/*
 
utilize list elements
 
*/
 
'''%DO age_indx = 1 %TO &AGE_GRPS.;'''
 
''proc print noobs''
 
''    data = sashelp.class''
 
'';''
 
''where age = %SCAN(&AGE_LIST.,&AGE_INDX.,',');''
 
''run;''
 
'''%END;'''
 
 
 
It might be obvious now why it was necessary to QUOTE the segmented AGE_LIST parameter when it was list-structured with commas:

Because the MACRO Facility works as a pre-processor, the parameter is replaced by its value (''"resolution"'') before the function is executed. Unquoted commas in the value would then deliver more arguments than expected to the '''''%SCAN'' function'''.
 
 
 
'''Quoting inside the SAS System is very much like a science on its own and will be revisited whenever appropriate.'''
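Since a %DO loop is only valid inside a MACRO definition, the three List steps can be assembled as in the following sketch. It is one possible arrangement, not the only one: the MACRO name PRINT_BY_AGE_LIST is hypothetical, and %SUPERQ and %STR(,) are swapped in here as an alternative way to mask the commas at the point of use.

```sas
%macro print_by_age_list;
  %local age_list age_grps age_indx;   /* declare everything used */

  /* step 1: generate the list itself */
  proc sql noprint;
    select distinct age
      into :age_list separated by ','
      from sashelp.class
    ;
  quit;

  /* step 2: obtain the number of elements */
  %let age_grps = &SQLOBS.;

  /* step 3: utilize list elements */
  %do age_indx = 1 %to &AGE_GRPS.;
    proc print noobs data = sashelp.class;
      /* %SUPERQ masks the commas in the value, %STR(,) is the delimiter */
      where age = %scan(%superq(age_list), &AGE_INDX., %str(,));
    run;
  %end;
%mend print_by_age_list;

%print_by_age_list;
```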
 
 
 
 
 
====Numbered (generate)====
 
 
 
The numbered approach starts by storing the unique values in an intermediate table. Each line of this table is then assigned with the '''''SYMPUT'' call routine''' to a numbered MACRO variable, using the record number read from the '''automatic data step variable ''"_N_"'''''.
 
 
 
/*
 
generate the list itself
 
*/
 
proc sql noprint;
 
create table age_list as
 
select distinct age
 
  from sashelp.class
 
;
 
quit;
 
 
 
data _null_;
 
  set age_list;
 
call symput("age_grp"||left(put(_N_,8.)),trim(left(put(age,8.))));
 
run;
 
 
 
 
 
====Numbered (count)====
 
 
 
There is probably no less elegant way to determine the number of values, but we will use it here for demonstration purposes.

Nevertheless, this method is appropriate when every line is read anyway, as in the data step shown before.
 
 
 
/*
 
obtain the number of elements
 
*/
 
data _null_;
 
  set age_list end = eof;
 
if eof then do;
 
call symput("age_grps",trim(left(put(_N_,8.))));
 
end;
 
run;
 
 
 
 
 
====Numbered (utilize)====
 
 
 
At first, utilization seems pretty similar to the list approach, since the DO loop remains unchanged. However, the re-construction of parameter names from the loop variable AGE_INDX requires multiple ampersands.

The simplest, and yet correct, explanation is that the ampersand is parameter name and value as well, i.e. the MACRO variable with name '''''"&"'' resolves to ''"&"'''''. The result is then scanned again, so that &&AGE_GRP&AGE_INDX. first becomes &AGE_GRP1 and is resolved to its value in a second pass.

Thus, the '''double ampersand use ''"&&"''''' produces all required parameter references, starting with AGE_GRP1.
 
 
 
/*
 
utilize list elements
 
*/
 
'''%DO age_indx = 1 %TO &AGE_GRPS.;'''
 
''proc print noobs''
 
''    data = sashelp.class''
 
'';''
 
''where age = &&AGE_GRP&AGE_INDX.;''
 
''run;''
 
'''%END;'''
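Put together inside a MACRO definition, the Numbered approach might look like the sketch below. It deviates from the steps above in two labeled points: CALL SYMPUTX with the 'L' argument is used instead of CALL SYMPUT so that all variables land in the local table, and generation and counting share one data step. The MACRO name PRINT_BY_AGE_NUMBERED is hypothetical.

```sas
%macro print_by_age_numbered;
  %local age_grps age_indx;

  /* step 1: store the unique values in an intermediate table */
  proc sql noprint;
    create table work.age_list as
    select distinct age
      from sashelp.class
    ;
  quit;

  /* steps 1 and 2: numbered variables plus their count, all local */
  data _null_;
    set work.age_list end = eof;
    call symputx(cats('age_grp', _n_), age, 'L');
    if eof then call symputx('age_grps', _n_, 'L');
  run;

  /* step 3: &&AGE_GRP&AGE_INDX. resolves to &AGE_GRP1, &AGE_GRP2, ... */
  %do age_indx = 1 %to &AGE_GRPS.;
    proc print noobs data = sashelp.class;
      where age = &&AGE_GRP&AGE_INDX.;
    run;
  %end;
%mend print_by_age_numbered;

%print_by_age_numbered;
```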
 
 
 
 
 
====Direct (generate)====
 
 
 
Step 1 of the direct approach looks quite familiar from the numbered approach, except that no parameters are populated at all: neither a segmented list nor a set of single values.
 
 
 
/*
 
generate the list itself
 
*/
 
proc sql noprint;
 
create table age_list as
 
select distinct age
 
  from sashelp.class
 
;
 
quit;
 
 
 
As indicated by the name, the direct approach leaves parameter values in the data table until needed. When necessary, the value is DIRECTLY read into a parameter and used instantly.
 
 
 
 
 
====Direct (count)====
 
 
 
Determination of the number of elements remains totally unchanged.
 
 
 
/*
 
obtain the number of elements
 
*/
 
data _null_;
 
  set age_list end = eof;
 
if eof then do;
 
call symput("age_grps",trim(left(put(_N_,8.))));
 
end;
 
run;
 
 
 
 
 
====Direct (utilize)====
 
 
 
As promised, the loop that utilizes the parameter values also populates the parameter directly from the data table.

This approach makes use of the SAS System's capability for direct record access. Whereas SAS data tables are read sequentially by default, it is also possible to jump directly to a specific record and read up to a specified record number. When the '''data set options ''FIRSTOBS'' and ''OBS''''' are set to the same value, exactly one record is accessed and read.
 
 
 
/*
 
utilize list elements
 
*/
 
'''%DO age_indx = 1 %TO &AGE_GRPS.;'''
 
''data _null_;''
 
'' set age_list(firstobs = &AGE_INDX. obs = &AGE_INDX.);''
 
''call symput("age_grp",trim(left(put(age,8.))));''
 
''run;''
 
''proc print noobs''
 
''    data = sashelp.class''
 
'';''
 
''where age = &AGE_GRP.;''
 
''run;''
 
'''%END;'''
 
 
 
Obviously, the DIRECT approach is the first choice when the number of parameters exceeds the limits imposed by the LIST getting too long or by the overview getting lost in the NUMBERED approach.
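For completeness, here is a sketch of the Direct approach as one MACRO definition. As an assumption of this sketch, CALL SYMPUTX with the 'L' argument replaces CALL SYMPUT to keep AGE_GRPS and AGE_GRP in the local table; the MACRO name PRINT_BY_AGE_DIRECT is hypothetical.

```sas
%macro print_by_age_direct;
  %local age_grps age_grp age_indx;

  /* step 1: keep the unique values in a table, populate nothing yet */
  proc sql noprint;
    create table work.age_list as
    select distinct age
      from sashelp.class
    ;
  quit;

  /* step 2: count the elements */
  data _null_;
    set work.age_list end = eof;
    if eof then call symputx('age_grps', _n_, 'L');
  run;

  /* step 3: fetch exactly one record per iteration and use it at once */
  %do age_indx = 1 %to &AGE_GRPS.;
    data _null_;
      set work.age_list(firstobs = &AGE_INDX. obs = &AGE_INDX.);
      call symputx('age_grp', age, 'L');
    run;
    proc print noobs data = sashelp.class;
      where age = &AGE_GRP.;
    run;
  %end;
%mend print_by_age_direct;

%print_by_age_direct;
```

Only one parameter exists at any time, at the price of one extra data step per iteration.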
 
 
 
 
 
===Workflow Documentation===
 
 
 
====Open Code====
 
 
 
When executing open SAS code, i.e. outside a MACRO segment definition, documentation is done by the SAS LOG on three severity levels:
 
#'''Notes:''' Ordinary execution
 
#'''Warnings:''' Ambiguous code identified
 
#'''Errors:''' Execution cannot continue, hence stops
 
 
 
This makes SAS very convenient to the ad-hoc user, since execution is well documented and reasons for unexpected results or premature termination are delivered in deep detail.
 
 
 
Once you have reached the '''GEEK level in MACRO programming''' this service will appear childish and very soon will be more annoying than enlightening. As a consequence you will issue a statement similar to this one:
 
 
 
options nosource nonotes errors=0;
 
 
 
'''After that your LOG will remain largely silent, except for WARNINGS, which cannot be switched off.'''
 
 
 
====Macro code====
 
 
 
Since MACRO programmers are very well aware of what their code is meant to do, they usually decide on their own which information is written to the LOG. To get proper documentation, the '''''MACRO PUT'' statement''' (%PUT) does the job:
 
 
 
%PUT Now processing age group no. &AGE_INDX.;
 
 
 
%PUT Parameter 'PARM_I' is empty. Alternate code will execute;
 
 
 
Three MACRO specific options may be issued to document runtime behavior:
 
*'''mprint:''' Write generated code to the LOG.
 
*'''mlogic:''' Indicate branches taken or loop counters checked to the LOG.
 
*'''mfile:''' Write generated code to the fileref MPRINT.
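A hedged sketch of how these options might be combined: MFILE only takes effect together with MPRINT and a fileref named MPRINT. The target file name generated_code.sas in the WORK directory and the MACRO name DEMO are assumptions for illustration.

```sas
/* MFILE writes to the fileref MPRINT, so assign it first */
filename mprint "%sysfunc(pathname(work))/generated_code.sas";

options mprint mlogic mfile;

%macro demo;
  %local i;
  %do i = 1 %to 2;
    data _null_;
      put "iteration &i.";
    run;
  %end;
%mend demo;

%demo;   /* generated steps appear in the LOG and in the file */

/* switch the tracing off again and release the file */
options nomfile nomlogic nomprint;
filename mprint clear;
```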
 
 
 
 
 
===Realtime Information===
 
 
 
====Mail tags====
 
 
 
Depending on how structured your programming approach is and, moreover, how close you want to be to program activity, you might consider using '''e-mail as an instant documentation vehicle'''.

SAS supports instant e-mail generation very well by using the MAPI interface definition that is automatically installed on every machine that has access to a default mail gateway.
 
 
 
filename &MYMAILREF. EMAIL 'addressee@domain';
 
 
 
However easy this looks, the example above will produce nothing but an empty mail. Of course, you need to provide '''mail-tags according to ''[http://tools.ietf.org/html/rfc822 RFC-822]''''', like '''"To:"''', '''"Subject:"''', etc. and, if appropriate, the name and location of attachments.
 
 
 
filename &MYMAILREF. EMAIL 'addressee@domain'
 
subject = "Interim results from running program at &SYSTIME."
 
attach = ("&MYPATH.\results.png" "&OTHERPATH.\tables.pdf")
 
;
 
 
 
The 1st address given in quotes is used as the '''''"To:"'' tag'''. A 2nd (explicit) '''''"To:"'' tag''' would overwrite the 1st one. In addition, tags may be populated with single values as well as with a blank-separated list of quoted values set in parentheses.
 
 
 
 
 
====Mail body====
 
 
 
As soon as the mail vehicle has been prepared as a text storage location with special attributes ('''"tags"'''), it is ready to receive body text.
 
 
 
data _NULL_;
 
file &MYMAILREF.;
 
put @1 "Dear Colleagues,"
 
  #3 @5 "please find attached …"
 
;
 
run;
 
 
 
Adhering to the code segment concept, we finally close the destination file by:
 
 
 
filename &MYMAILREF. clear;
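The three parts (tags, body, close) can be combined into one sequence, sketched here under several assumptions: the fileref name MAILOUT, the addresses and the wording are placeholders, and the session must be configured for sending mail (for example through the EMAILSYS system option) for anything to be delivered.

```sas
%let MYMAILREF = mailout;   /* hypothetical fileref name */

/* open the segment: assign the mail vehicle with its tags */
filename &MYMAILREF. email
  to      = ("first@domain" "second@domain")
  subject = "Run finished at &SYSTIME."
;

/* fill the segment: each PUT statement writes one line of body text */
data _null_;
  file &MYMAILREF.;
  put "Dear Colleagues,";
  put;                              /* empty line */
  put "the program has completed.";
run;

/* close the segment */
filename &MYMAILREF. clear;
```

The mail is sent when the data step that writes to the fileref ends, so the FILENAME CLEAR statement only releases the reference.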
 
 
 
 
 
[[Datei:Duck_zazy_com.png|frameless|100px|thumb|left|To be continued]]
 

Revision as of 5 March 2014, 13:38