tags:

views:

949

answers:

2

I need to control the data type when reading XML data in SAS. The XML data are written and accessed using the XML libname engine in SAS.

SAS seems to guess the data type based on the contents of a column: If I write "20081002" to my XML data in a character column, it will be read back in as a numerical variable.

An example:

filename my_xml '/tmp/my.xml'; * Yes, I use SAS on Unix *;
libname my_xml XML;

data my_xml.data_type_test;
  text_char="This is obviously text";
  date_char="20081002";
  num_char="42";
  genuine_num=42;
run;

proc copy inlib=my_xml outlib=WORK;
run;

libname my_xml;
filename my_xml CLEAR;

Only the last column is defined as numerical data type in the XML data, but when I copy it into my WORK library, only the column *text_char* is character. The other 3 are now numeric.

How can I control the data type when reading XML data in SAS?

+1  A: 

I think you need to define some xml specific options whith your libname XML statement for export go:

libname my_xml_out XML XMLMETA=SCHEMADATA;

To include the data schema. Also, you might want to save the XML schema to a separate file for later import:

libname my_xml_in XML XMLSCHEMA='external-file'

after you exported the schema using XMLMETA=SCHEMA of course. I think this is the documentation you need.

Apart from that liberal use of format statements on original dataset creation is recommended.

jilles de wit
Thanks for the input. Unfortunately, I can't get it to work - SAS still claims that my character data that look like numbers, are of the NUM data type, when read from XML using XMLSCHEMA.
Martin Bøgelund
+3  A: 

Take a look at the SAS XML Mapper. It allows you to create a map to read (and wrte in 9.2) XML files and specifying column attributes.

If this is your XML file:

This is obviously text 20081002 42 42

You could create a MAP like this:

<!-- ############################################################ -->
<TABLE name="DATA_TYPE_TEST">
    <TABLE-PATH syntax="XPath">/TABLE/DATA_TYPE_TEST</TABLE-PATH>

    <COLUMN name="text_char">
        <PATH syntax="XPath">/TABLE/DATA_TYPE_TEST/text_char</PATH>
        <TYPE>character</TYPE>
        <DATATYPE>string</DATATYPE>
        <LENGTH>22</LENGTH>
    </COLUMN>

    <COLUMN name="date_char">
        <PATH syntax="XPath">/TABLE/DATA_TYPE_TEST/date_char</PATH>
        <TYPE>numeric</TYPE>
        <DATATYPE>integer</DATATYPE>
        <FORMAT width="9">DATE</FORMAT>
        <INFORMAT width="8">ND8601DA</INFORMAT>
    </COLUMN>

    <COLUMN name="num_char">
        <PATH syntax="XPath">/TABLE/DATA_TYPE_TEST/num_char</PATH>
        <TYPE>character</TYPE>
        <DATATYPE>string</DATATYPE>
        <LENGTH>2</LENGTH>
    </COLUMN>

    <COLUMN name="genuine_num">
        <PATH syntax="XPath">/TABLE/DATA_TYPE_TEST/genuine_num</PATH>
        <TYPE>numeric</TYPE>
        <DATATYPE>integer</DATATYPE>
    </COLUMN>

</TABLE>

And then read the XML file:

filename  my 'C:\temp\my.xml';
filename  SXLEMAP 'C:\temp\MyMap.map';
libname   my xml xmlmap=SXLEMAP access=READONLY;

title 'Table DATA_TYPE_TEST';
proc contents data=my.DATA_TYPE_TEST varnum; 
run;
proc print data=my.DATA_TYPE_TEST(obs=10); 
run;

Result:

Table DATA_TYPE_TEST

The CONTENTS Procedure

Data Set Name        MY.DATA_TYPE_TEST    Observations            
Member Type          DATA                 Variables             4 
Engine               XML                  Indexes               0 
Created              .                    Observation Length    0 
Last Modified        .                    Deleted Observations  0 
Protection                                Compressed            NO
Data Set Type                             Sorted                NO
Label                                                             
Data Representation  Default                                      
Encoding             Default                                      


Variables in Creation Order

#    Variable       Type    Len    Format    Informat      Label

1    text_char      Char     22    $22.      $22.          text_char  
2    date_char      Num       8    DATE9.    ND8601DA8.    date_char  
3    num_char       Char      2    $2.       $2.           num_char   
4    genuine_num    Num       8    F8.       F8.           genuine_num

Table DATA_TYPE_TEST

                                                               genuine_
     Obs    text_char                 date_char    num_char      num

       1    This is obviously text    02OCT2008       42             42
Lex