This week our SAS paper review looks at “TIPS AND TRICKS OF EFFICIENT SAS® PROGRAMMING FOR SDTM DATA” by Eric Qi and Fikret Karahoda of Merck & Co.
The paper examines the problems that can result from processing and storing large SDTM datasets, and how to resolve them. Although it was written in 2010, we feel that the issues and techniques it discusses are particularly important in light of the CDISC guidance on character variable lengths (SDTM Implementation Guide 3.2) that:
Very large transport files have become an issue for FDA to process. One of the main contributors to the large file sizes has been sponsors using the maximum length of 200 for character variables. To help rectify this situation:
• The maximum SAS Version 5 character variable length of 200 characters should not be used unless necessary.
The paper presents several methods for handling large datasets more efficiently and suggests ways to reduce dataset size. We particularly liked the tables showing the real time, CPU time and memory costs of different approaches, which give a tangible illustration of the savings on offer.
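To give a rough feel for why the length-200 guidance matters, here is a minimal sketch of the arithmetic. It is written in Python rather than SAS and is not taken from the paper. It assumes a fixed-width layout in which every character variable occupies its full declared length in every record, as in SAS Version 5 transport files, and it counts only character data. The records and the helper names `required_lengths` and `fixed_width_bytes` are hypothetical, for illustration only.

```python
def required_lengths(rows, char_columns):
    """Return the longest observed value length (at least 1) per character column.

    Assumes single-byte characters, so len() equals the number of bytes.
    """
    lengths = {col: 1 for col in char_columns}
    for row in rows:
        for col in char_columns:
            value = row.get(col) or ""
            # Trailing blanks are padding, so they do not count towards the length.
            lengths[col] = max(lengths[col], len(value.rstrip()))
    return lengths


def fixed_width_bytes(n_rows, lengths):
    """Bytes of character data when every column is padded to its declared length."""
    return n_rows * sum(lengths.values())


# Hypothetical records, for illustration only.
rows = [
    {"USUBJID": "STUDY1-001", "AETERM": "HEADACHE"},
    {"USUBJID": "STUDY1-002", "AETERM": "NAUSEA"},
    {"USUBJID": "STUDY1-003", "AETERM": "UPPER RESPIRATORY TRACT INFECTION"},
]
char_columns = ["USUBJID", "AETERM"]

padded = {col: 200 for col in char_columns}    # every variable declared at 200
trimmed = required_lengths(rows, char_columns)  # lengths the data actually needs

print(trimmed)
print(fixed_width_bytes(len(rows), padded), "bytes at length 200")
print(fixed_width_bytes(len(rows), trimmed), "bytes at trimmed lengths")
```

Even on three records the gap is large, and it grows linearly with the number of records and character variables. In SAS itself the same idea amounts to finding the maximum observed length of each character variable and declaring that length instead of 200.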
The paper was presented at the SESUG conference, Savannah, GA, 2010.