SAS tips & tricks #11 – PROC SQL; INSERT INTO

In this tips & tricks post we look at how PROC SQL INSERT INTO provides a concise method for inserting additional rows into a dataset.

Sometimes it is necessary to force additional observations into your datasets, inserting new observations into a dataset with the Data Step usually involves either creating ta new dataset containing the new observations using a  DATALINES statement and then appending this dataset with the original, or alternatively  using multiple OUTPUT statements  within a Data Step.

 DATALINES method

DATA class_in;
  ATTRIB name LENGTH = $8 sex LENGTH = $1;
  INFILE DATALINES;
  INPUT name sex age height weight;
  DATALINES;
Andy M 30 190 100
Ben M 60 180 110
Paul M 26 170 120
;
RUN;
PROC APPEND BASE = class DATA = class_in;
RUN;

Multiple OUTPUT statement method

DATA class;
  SET class;
  OUTPUT;
  name = "Andy";
  sex  = "M";
  age = 30;
  height = 190;
  weight = 100;
  OUTPUT; 
  name = "Ben";
  sex  = "M";
  age = 60;
  height = 180;
  weight = 110;
  OUTPUT; 
  name = "Paul";
  sex  = "M";
  age = 26 ;
  height = 170 ;
  weight = 120;
  OUTPUT; 
RUN;

 

PROC SQL INSERT INTO method

The DATALINES method is not ideal as it involves creating an additional dataset and ensuring that the variable lengths in the new dataset  are consistent with the length in the original dataset, otherwise truncation can occur. Whereas the Multiple OUTPUT statement method can be time consuming and error prone because it required one assignment statement to be manually typed out for each variable in each observation. The PROC SQL INSERT INTO approach provides a concise alternative to these methods. This approach inserts lines using the VALUES statement in combination with a comma separated list, as shown below:

PROC SQL NOPRINT ; 
  INSERT INTO class
  VALUES ("Andy", "M", 30,190,100)
  VALUES ("Ben", "M", 60,180,110)
  VALUES ("PAUL", "M", 26,170,120); 
QUIT;

If using this approach, care should be taken to ensure that the order of the values in the comma separated list is consistent with the order of the associated variable in the dataset. To minimise the possibility of making a mistake in the order, or if you want to populate certain columns only, the columns which you wish to update, and the order of the variables in your comma separated list can be specified in brackets after the name of the dataset, as shown below:

PROC SQL; 
  INSERT INTO class (name,weight)
  VALUES ("Andy",100)
  VALUES ("Ben",110)
  VALUES ("PAUL",.); 
QUIT;

Leave your comment

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>