In the last two posts we looked at how to clean the log by removing SAS NOTEs. Here we look again at what to do when your log contains the following message:
NOTE: Character values have been converted to numeric values.
This message occurs when you try to perform a numeric calculation using a character variable, or to assign a character value to a variable which has already been defined as numeric.
It is possible to rely on SAS to perform these conversions implicitly. In general, however, this is not considered good programming practice, and it is usually preferable to perform the conversions explicitly. Implicit conversions also take more CPU time than explicit conversions and so may cause your program to execute more slowly.
Here we look at a few examples where this implicit conversion is performed, and at ways in which the programs can be updated to avoid the NOTE.
Consider, for example, the following DATA step. A character variable WEIGHT_KG is created and assigned the value "75"; the subsequent statement then attempts to use this character variable in a numeric calculation. In this instance SAS performs an implicit conversion (i.e. SAS automatically converts the value from character to numeric so that the calculation can take place). The character value "75" is converted to the number 75 and the statement executes, giving the expected result, but SAS also writes a NOTE to the log explaining that it had to perform an implicit conversion for the calculation to take place.
DATA weight;
  weight_kg = "75";
  weigth_st = 0.157473 * weight_kg;
RUN;

NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
      35:26
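The second situation mentioned earlier, assigning a character value to a variable that is already numeric, can be sketched in the same way. This is a minimal illustration only: the dataset name HEIGHT and the variable HEIGHT_CM are hypothetical and do not come from the original examples. The assignment should trigger the same NOTE, because SAS has to convert the character constant before storing it.

```sas
DATA height;
  /* Hypothetical variable, defined as numeric by the LENGTH statement */
  LENGTH height_cm 8;
  /* A character constant is assigned to the numeric variable, */
  /* so SAS converts it implicitly and writes the NOTE         */
  height_cm = "180";
RUN;
```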
To avoid the note being written to the log, you can explicitly perform the character to numeric conversion.
Going back to our example, this would involve using the INPUT function to convert the value of WEIGHT_KG to numeric before the calculation takes place, as follows:
DATA weight;
  weight_kg = "75";
  weigth_st = 0.157473 * INPUT(weight_kg,BEST.);
RUN;
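If the numeric version of the variable is needed more than once, it may be more convenient to store it than to repeat the INPUT call. A variable cannot change type within a DATA step, so a common approach is to rename the character variable on input and create a numeric variable under the original name. The sketch below assumes the WEIGHT dataset created above; the names WEIGHT_NUM and WEIGHT_KG_C and the informat width of 12 are illustrative choices rather than part of the original example.

```sas
DATA weight_num;
  /* Read the character variable under a temporary (hypothetical) name */
  SET weight (RENAME=(weight_kg=weight_kg_c));
  /* Explicit conversion: the new WEIGHT_KG is numeric */
  weight_kg = INPUT(weight_kg_c, BEST12.);
  /* The character copy is no longer needed */
  DROP weight_kg_c;
RUN;
```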
Care should be taken with this approach, however, in case the character variable contains values that are unsuitable for numeric conversion. Consider the following example:
DATA vitals;
  weight = "70.5"; OUTPUT;
  weight = "U";    OUTPUT;
RUN;

DATA vitals2;
  SET vitals;
  weight_rnd = ROUND(INPUT(weight,BEST.));
RUN;
The second observation of this dataset would cause the automatic variable _ERROR_ to be set to 1, as the INPUT function tries to convert the character value "U" to a number. In this instance the log will show something similar to the following:
NOTE: Invalid argument to function INPUT at line 36 column 22.
weight=U weight_rnd=. _ERROR_=1 _N_=2
This indicates that at observation 2 the variable WEIGHT contained the value "U", which was unsuitable for the INPUT function.
If the value of "U" is unexpected and perhaps indicates dirty data, then the best approach is usually to request that the input dataset is queried. If, however, the value is feasible, perhaps indicating that the test was not performed in this instance, then conditional programming can be used so that the character to numeric conversion is only performed on suitable input values.
In the example below we use the ?? informat modifier, which suppresses the invalid data message and prevents _ERROR_ from being set to 1, to first test whether the character variable contains a value that is suitable for conversion to numeric. The conversion is only performed if this is the case. Otherwise, unless the value is the expected "U", a custom warning is written to the log indicating that the input data is not clean and that this should be investigated.
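DATA vitals2;
  SET vitals;
  /* ?? suppresses the log message and _ERROR_ for unsuitable values */
  IF INPUT(weight,??BEST.) NE . THEN weight_rnd = ROUND(INPUT(weight,BEST.));
  /* Any unsuitable value other than the expected "U" is reported */
  ELSE IF weight NE "U" THEN
    PUT "WARN" "ING: the following non numeric values were found in the WEIGHT variable: "
        weight "at observation:" _n_;
RUN;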
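The step above calls INPUT twice for every suitable value. One possible variation, offered as a sketch rather than as the recommended form, performs the conversion once with the ?? modifier and reuses the result. The temporary variable WEIGHT_NUM and the informat width of 12 are assumptions introduced here for illustration; the warning logic is unchanged.

```sas
DATA vitals2;
  SET vitals;
  /* Convert once; unsuitable values quietly become missing */
  weight_num = INPUT(weight, ?? BEST12.);
  IF NOT MISSING(weight_num) THEN weight_rnd = ROUND(weight_num);
  /* Report anything unsuitable other than the expected "U" */
  ELSE IF weight NE "U" THEN
    PUT "WARN" "ING: the following non numeric values were found in the WEIGHT variable: "
        weight "at observation:" _n_;
  /* WEIGHT_NUM is a hypothetical helper and is not kept */
  DROP weight_num;
RUN;
```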