So now that I have a large bit of XML data I'm interested in:

http://blog.stackoverflow.com/2009/06/stack-overflow-creative-commons-data-dump

I'd like to load this into Oracle to play with.

How can I load a large XML file directly into Oracle? Server-side solutions (where the data file can be opened on the server) and client-side solutions are both welcome.

Here's a bit of badges.xml for a concrete example.

<?xml version="1.0" encoding="UTF-8" ?>
  <badges>
  <row UserId="3718" Name="Teacher" Date="2008-09-15T08:55:03.923"/>
  <row UserId="994" Name="Teacher" Date="2008-09-15T08:55:03.957"/>
  ...
A: 

I would do a simple:

grep '<row' file.xml |\
gawk -F '"' '{printf("insert into badges(userid,name,dt) values (\047%s\047,\047%s\047,\047%s\047);\n",$2,$4,$6); }' > request.sql

Or you can create a Java program using a SAX parser. Each time your handler finds a new 'row' element, you read its attributes and insert a new record into your database.
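
A minimal sketch of the SAX approach, assuming the badges.xml layout shown in the question. The class name BadgeSaxSketch and the parse helper are made up for illustration, and the rows are only collected in memory here; a real loader would instead insert each row (or a batch of rows) through JDBC at the marked point so the whole file never has to fit in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name, used only for this sketch.
public class BadgeSaxSketch {

    /** Streams the document and returns one {UserId, Name, Date} triple per row element. */
    public static List<String[]> parse(InputStream in) throws Exception {
        final List<String[]> rows = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if ("row".equals(qName)) {
                    // Assumption: this is where a real loader would bind the three
                    // values to a JDBC PreparedStatement and call addBatch().
                    rows.add(new String[] {
                        atts.getValue("UserId"),
                        atts.getValue("Name"),
                        atts.getValue("Date")
                    });
                }
            }
        };
        // The default factory is not namespace-aware, so qName is the plain tag name.
        SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        return rows;
    }

    public static void main(String[] args) throws Exception {
        String sample = "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>"
            + "<badges>"
            + "<row UserId=\"3718\" Name=\"Teacher\" Date=\"2008-09-15T08:55:03.923\"/>"
            + "<row UserId=\"994\" Name=\"Teacher\" Date=\"2008-09-15T08:55:03.957\"/>"
            + "</badges>";
        InputStream in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
        for (String[] r : parse(in)) {
            System.out.println(String.join(" | ", r));
        }
    }
}
```

For the real dump you would pass a FileInputStream on badges.xml to parse instead of the in-memory sample.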

Pierre
+5  A: 

You can access XML files on the server via SQL. With your data in /tmp/tmp.xml, you would first declare the directory:

SQL> create directory d as '/tmp';

Directory created

You could then query your XML file directly:

SQL> SELECT XMLTYPE(bfilename('D', 'tmp.xml'), nls_charset_id('UTF8')) xml_data
  2    FROM dual;

XML_DATA
--------------------------------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<badges>
  [...]

To access the fields in your file, you could use the method described in another SO question, for example:

SQL> SELECT UserId, Name, to_timestamp(dt, 'YYYY-MM-DD"T"HH24:MI:SS.FF3') dt
  2    FROM (SELECT XMLTYPE(bfilename('D', 'tmp.xml'), 
                            nls_charset_id('UTF8')) xml_data
  3            FROM dual),
  4         XMLTable('for $i in /badges/row
  5                              return $i'
  6                  passing xml_data
  7                  columns UserId NUMBER path '@UserId',
  8                          Name VARCHAR2(50) path '@Name',
  9                          dt VARCHAR2(25) path '@Date');

    USERID NAME       DT                         
---------- ---------- ---------------------------
      3718 Teacher    2008-09-15 08:55:03.923    
       994 Teacher    2008-09-15 08:55:03.957
Vincent Malgrat
A: 

It seems like you're talking about two issues: first, getting the XML document to where Oracle can see it, and second, making it so that standard relational tools can be applied to the data.

For the first, you or your DBA can create a table with a BLOB, CLOB, or BFILE column and load the data. If you have access to the server on which the database lives, you can define a DIRECTORY object in the database that points to an operating system directory, put your file there, and then either set it up as a BFILE or read it in. (CLOB and BLOB store the data in the database; BFILE stores a pointer to a file on the operating system side.)

Alternatively, use some tool that will let you write CLOBs directly to the database. Either way, that gets you to the point where you can see the XML instance document in the database.

So now you have the instance document visible. Step 1 is done.

Depending on the version, Oracle has some pretty good tools for shredding the XML into relational tables.

It can be pretty declarative. While this gets beyond what I've actually done (I have a project where I'll be trying it this fall), you can theoretically load your XML Schema into the database and annotate it with the crosswalk between the relational tables and the XML. Then take your CLOB or BFILE and convert it to an XMLTYPE column with the defined schema and you're done -- the shredding happens automatically, the data is all there, it's all relational, it's all available to standard SQL without the XQUERY or XML extensions.

Of course, if you'd rather use XQUERY, then just take the CLOB or BFILE, convert it to an XMLTYPE, and go for it.

Jim Hudson
There was actually a good blog entry by Marco Gralike today on shredding in the database, http://www.liberidu.com/blog/?p=1094
Jim Hudson
A: 

Vincent -

Why do you use the '@' in the path clause of your script below?

SELECT UserId, Name, to_timestamp(dt, 'YYYY-MM-DD"T"HH24:MI:SS.FF3') dt
FROM (SELECT XMLTYPE(bfilename('D', 'tmp.xml'), nls_charset_id('UTF8')) xml_data FROM dual),
     XMLTable('for $i in /badges/row return $i'
              passing xml_data
              columns UserId NUMBER path '@UserId',
                      Name VARCHAR2(50) path '@Name',
                      dt VARCHAR2(25) path '@Date');

I like the idea, but please explain.

Jenny
If you have a follow-up question about SQL syntax, you should post it as a new question, not as an answer here. The "Ask Question" button is in the top right. You can of course always link back to this page for reference if you want to.
sth