In our last SAS tips & tricks blog post (statskom.com/sas-tips-tricks-5-/) we looked at how to clean up the log by removing the “NOTE: Missing values were generated..” message. In this post we continue the theme by looking at another commonly occurring SAS message and how it can be resolved.

In this post we look at the SAS log message “NOTE: Missing values were generated as a result of performing an operation on missing values.”, what causes it and how to resolve it. This message is typically caused by attempting to perform a calculation on a numeric variable which is missing. The technical name for this is the propagation of missing values, since a missing input value causes a missing output value.

Consider the following vital signs data where we have three test results for pulse and two for heart rate.

DATA vitals;
  ATTRIB pulse1 pulse2 pulse3 hr1 hr2 format = best.;
  INFILE cards;
  INPUT pulse1 pulse2 pulse3 hr1 hr2;
  DATALINES;
60 70 65 71 .
65 . 70 73 74
63 . . 72 73
65 77 73 74 75
;
RUN;

Notice that these test results are sometimes missing. Now suppose we were to run a DATA step similar to the one below:

DATA vitals2;
  SET vitals;
  pulse_sum=pulse1+pulse2+pulse3;
  hr_avg = (hr1 + hr2)/2;
RUN;

This would result in messages similar to the following being written to the SAS log:

79   DATA vitals2;
80   SET vitals;
81   pulse_sum=pulse1+pulse2+pulse3;
82   hr_avg = (hr1 + hr2)/2;
83   RUN;

NOTE: Missing values were generated as a result of performing an operation on missing values.
      Each place is given by: (Number of times) at (Line):(Column).
      2 at 81:19   1 at 82:16

These messages tell you that the statement on line 81 of the log (pulse_sum=pulse1+pulse2+pulse3;) resulted in a missing value twice (2 at 81:19), whereas the statement on line 82 (hr_avg = (hr1 + hr2)/2;) resulted in a missing value once (1 at 82:16). A PROC PRINT of the results would show the following:

Obs  pulse1  pulse2  pulse3  hr1  hr2  pulse_sum  hr_avg
  1      60      70      65   71    .        195       .
  2      65       .      70   73   74          .    73.5
  3      63       .       .   72   73          .    72.5
  4      65      77      73   74   75        215    74.5

Note that, as described in the log, the variable PULSE_SUM is missing twice (observations 2 and 3) whilst the HR_AVG variable is missing once (observation 1).

How you decide to deal with this message depends on whether you expect the variables in your input data to contain missing values. If you do, then you can omit missing values from your calculations by using the descriptive statistic functions, which ignore missing arguments. In this case we would use the MEAN and SUM functions as follows:

DATA vitals2;
  SET vitals;
  pulse_sum=SUM(pulse1,pulse2,pulse3);
  hr_avg = MEAN(hr1,hr2);
RUN;

Printing out the dataset gives the following:

Obs  pulse1  pulse2  pulse3  hr1  hr2  pulse_sum  hr_avg
  1      60      70      65   71    .        195    71.0
  2      65       .      70   73   74        135    73.5
  3      63       .       .   72   73         63    72.5
  4      65      77      73   74   75        215    74.5

Note that here PULSE_SUM is populated in every observation and is calculated as the sum of all available PULSE variables. HR_AVG is also populated in every observation and is calculated as the sum of all available HR variables divided by the number of available HR variables.
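Because SUM and MEAN silently skip missing arguments, it can be useful to record how many values actually contributed to each result. The following is a minimal sketch using the N function, which counts non-missing arguments. The variable names pulse_n and hr_n are made up for this illustration and are not part of the original example.

```sas
DATA vitals2;
  SET vitals;
  /* N counts the non-missing arguments (pulse_n and hr_n are illustrative names) */
  pulse_n   = N(pulse1, pulse2, pulse3);
  pulse_sum = SUM(pulse1, pulse2, pulse3);
  hr_n      = N(hr1, hr2);
  hr_avg    = MEAN(hr1, hr2);
RUN;
```

For observation 3 above, pulse_n would be 1, which makes it clear that a PULSE_SUM of 63 is based on a single reading. Bear in mind that SUM and MEAN still return a missing value when every argument is missing.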

If however your data should not contain any missing values, then you could alternatively use the NMISS function to test whether your variables contain missing values and output a message to the log if they do.
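As a rough sketch of that approach, the step below uses NMISS to count the missing arguments in each observation and writes a message to the log whenever the count is greater than zero. The wording of the message and the use of a DATA _NULL_ step are assumptions made for this illustration; adapt them to your own conventions.

```sas
DATA _NULL_;
  SET vitals;
  /* NMISS returns the number of missing numeric arguments */
  IF NMISS(pulse1, pulse2, pulse3, hr1, hr2) > 0 THEN
    PUT "WARNING: Unexpected missing vital sign value. "
        _N_= pulse1= pulse2= pulse3= hr1= hr2=;
RUN;
```

Run against the VITALS data above, this would write a message for observations 1, 2 and 3, since each of them contains at least one missing value.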

More information on the MEAN and SUM functions can be found in the Functions and CALL Routines section of the SAS documentation.