In this tips & tricks post we look at how PROC SQL INSERT INTO provides a concise method for inserting additional rows into a dataset.
Sometimes it is necessary to force additional observations into your datasets, inserting new observations into a dataset with the Data Step usually involves either creating ta new dataset containing the new observations using a DATALINES statement and then appending this dataset with the original, or alternatively using multiple OUTPUT statements within a Data Step.
DATA class_in; ATTRIB name LENGTH = $8 sex LENGTH = $1; INFILE DATALINES; INPUT name sex age height weight; DATALINES; Andy M 30 190 100 Ben M 60 180 110 Paul M 26 170 120 ; RUN; PROC APPEND BASE = class DATA = class_in; RUN;
Multiple OUTPUT statement method
DATA class; SET class; OUTPUT; name = "Andy"; sex = "M"; age = 30; height = 190; weight = 100; OUTPUT; name = "Ben"; sex = "M"; age = 60; height = 180; weight = 110; OUTPUT; name = "Paul"; sex = "M"; age = 26 ; height = 170 ; weight = 120; OUTPUT; RUN;
PROC SQL INSERT INTO method
The DATALINES method is not ideal as it involves creating an additional dataset and ensuring that the variable lengths in the new dataset are consistent with the length in the original dataset, otherwise truncation can occur. Whereas the Multiple OUTPUT statement method can be time consuming and error prone because it required one assignment statement to be manually typed out for each variable in each observation. The PROC SQL INSERT INTO approach provides a concise alternative to these methods. This approach inserts lines using the VALUES statement in combination with a comma separated list, as shown below:
PROC SQL NOPRINT ; INSERT INTO class VALUES ("Andy", "M", 30,190,100) VALUES ("Ben", "M", 60,180,110) VALUES ("PAUL", "M", 26,170,120); QUIT;
If using this approach, care should be taken to ensure that the order of the values in the comma separated list is consistent with the order of the associated variable in the dataset. To minimise the possibility of making a mistake in the order, or if you want to populate certain columns only, the columns which you wish to update, and the order of the variables in your comma separated list can be specified in brackets after the name of the dataset, as shown below:
PROC SQL; INSERT INTO class (name,weight) VALUES ("Andy",100) VALUES ("Ben",110) VALUES ("PAUL",.); QUIT;