tags:

views:

62

answers:

3
Using IBM DB2 database, I have a three relational tables:

    Project: id, title, description
    Topic: projectId, value
    Tag: projectId, value

I need to produce the following XML file from the previous table: 

    <projects>
      <project id="project1">
        <title>title1</title>
        <description>desc1</description>
        <topic>topic1</topic>
        <topic>topic2</topic>
        <tag>tag1</tag>
        <tag>tag2</tag>
        <tag>tag3</tag>  
      </project>
      ...
    </projects>


I've tried the following query, and it works:

    XQUERY 
    let $projects := db2-fn:sqlquery('SELECT XMLELEMENT(NAME "project", XMLATTRIBUTES(id, title, description)) AS project FROM mydb.Project')  
    let $TopicSet := db2-fn:sqlquery('SELECT XMLELEMENT(NAME "row", XMLATTRIBUTES(projectId, value)) FROM mydb.Topic')
    let $TagSet := db2-fn:sqlquery('SELECT XMLELEMENT(NAME "row", XMLATTRIBUTES(projectId, value)) FROM mydb.Tag')

    for $project in $projects return
    <project>
    {$project/@ID}
    <title>{$project/fn:string(@TITLE)}</title>
    <description>{$project/fn:string(@DESCRIPTION)}</description>
    {for $row in $TopicSet[@PROJECTID=$project/@ID] return <Topic>{$row/fn:string(@VALUE)}</Topic>}
    {for $row in $TagSet[@PROJECTID=$project/@ID] return <Tag>{$row/fn:string(@VALUE)}</Tag>}
    </project>
    ;

However, it took 9 hours to complete (there 200k projects in the table)

How can I improve that?
Do I really need to create the three intermediate db2-fn:sqlquery to achieve this? is there another way?
Would it be faster if I create these 3 three intermediate db2-fn:sqlquery and put them in a table (with only one row and one attribute), and then index this before querying the "for $project in $projects return" part?


Or, how would you proceed to achieve my goal?


Best regards,
David

---
As proposed by Peter Schuetze, I tried the XMLAGG as follows:
SELECT 
XMLSERIALIZE(
  XMLDOCUMENT(
    XMLELEMENT(
      NAME "Project",
      XMLATTRIBUTES(P.project), 
      XMLAGG(XMLELEMENT(NAME "Topic", Topic.value)),
      XMLAGG(XMLELEMENT(NAME "Tag", Tag.value)),
    ) 
  ) AS CLOB(1M)
) 
FROM mydb.project P 
LEFT JOIN mydb.Topic Topic ON (P.project = Topic.project) 
LEFT JOIN mydb.Tag Tag ON (P.project = Tag.project) 
GROUP BY P.project;

This works indeed much much faster!
However, if a project has not any topic, it will still display topic element, with a blank text, such as:
    <projects>
      <project id="project1">
        <title>title1</title>
        <description>desc1</description>
        <topic></topic>
        <tag>tag1</tag>
        <tag>tag2</tag>
        <tag>tag3</tag>  
      </project>
      ...
    </projects>
How to remove this "<topic></topic>"?
A: 

Look at the XMLAGG function. This should be perfect for your need. I haven't tried it yet but the example on the linked page is almost exactly what you want to do.

Peter Schuetze
Thanks, it works much faster like this.However, now there appear blank elements when there are not topics.(see the edited question above).Any idea?
David Portabella
A: 

Use XMLFOREST instead of XMLELEMENT if there is the possibility that the column could be NULL and you don't want an empty element tag when that happens. So, for topics, you'd replace its XMLELEMENT function with

XMLFOREST( Topic.value AS "topic" )

There is a problem with the way you've included two XMLAGG functions in the same SELECT statement. If you have just one XMLAGG in your statement, there's no problem, since the GROUP BY on the parent key will neatly collapse the child entries that are specified inside XMLAGG. However, when you specify more than one XMLAGG function inside the same SELECT, the query produces a Cartesian product internally, so in this case, you'll see repeating items inside each group returned by XMLAGG. The example you gave with there being only zero or one topics for a project does not demonstrate this problem, but if a project had two topics and three tags, you'd see each topic repeated three times, and each tag repeated twice. To prevent this, you'll need to relocate each XMLAGG to a subquery or a common table expression that produces a single XML fragment so you can safely reference it from the main query.

Below is an example that pushes the XMLAGG down into common table expressions. It also gets rid of the need for XMLFOREST, since XMLAGG won't produce any results for an empty input set.

WITH 
topicxml( projectid, xmlfragment ) AS (
SELECT topic.projectid, 
XMLAGG( XMLELEMENT( NAME "topic", topic.value ) ORDER BY topic.value)
FROM mydb.topic topic 
GROUP BY topic.projectid
),
tagxml ( projectid, xmlfragment ) AS (
SELECT projectid, 
XMLAGG( XMLELEMENT( NAME "tag", tag.value ) ORDER BY tag.value)
FROM mydb.tag tag
GROUP BY tag.projectid
)
SELECT XMLSERIALIZE ( CONTENT XMLELEMENT( NAME "project",
XMLATTRIBUTES( p.id AS "id" ),
XMLELEMENT( NAME "title", p.title ),
XMLELEMENT( NAME "description", p.description ),
XMLCONCAT( topicxml.xmlfragment, tagxml.xmlfragment )
) AS VARCHAR(2000) )
FROM mydb.project p
LEFT OUTER JOIN topicxml ON topicxml.projectid = p.id
LEFT OUTER JOIN tagxml ON tagxml.projectid = p.id
;
Fred Sobotka
This query runs super fast,and it completely solves my question, thanks!
David Portabella
A: 

Fred, I tried with XMLFOREST( Topic.value AS "topic" ), but I get the following error: ++++ SQL0119N An expression starting with "CONTENT" specified in a SELECT clause, HAVING clause, or ORDER BY clause is not specified in the GROUP BY clause or it is in a SELECT clause, HAVING clause, or ORDER BY clause with a column function and no GROUP BY clause is specified. SQLSTATE=42803 ++++ That is, XMLFOREST cannot aggregate multiple values from a one row and column (as XMLAGG does). Thus I think it is not suited for this case.

I see your point about the cartesian product, you are right. I am stack at having a subquery. I've tried as follows:

SELECT 
XMLSERIALIZE(
  XMLDOCUMENT(
    XMLELEMENT(
      NAME "Project",
      XMLATTRIBUTES(P.project),
      XMLAGG( (SELECT XMLELEMENT(NAME "Topic", content) FROM Topic WHERE project=P.project) )
    ) 
  ) AS CLOB(1M)
) 
FROM Project P;

but it complains with the same error above.

any idea? I guess the best would be to create an XQuery. The first one works, but it is just too slow. maybe an idea to get it faster?

David Portabella
Don't use XQuery; you've already determined it's too slow, and it is intended for data that is already in XML. I use <a href="http://publib.boulder.ibm.com/infocenter/db2luw/v9r5/topic/com.ibm.db2.luw.sql.ref.doc/doc/c0004116.html">common table expressions</a> for that sort of task so I don't have to resolve multiple conflicting GROUP BY issues in the same SELECT.
Fred Sobotka
I've updated my original answer to include a sample that demonstrates how CTEs can handle the XMLAGG separately from the main SELECT.
Fred Sobotka