Using IBM DB2 database, I have a three relational tables:
Project: id, title, description
Topic: projectId, value
Tag: projectId, value
I need to produce the following XML file from the previous table:
<projects>
<project id="project1">
<title>title1</title>
<description>desc1</description>
<topic>topic1</topic>
<topic>topic2</topic>
<tag>tag1</tag>
<tag>tag2</tag>
<tag>tag3</tag>
</project>
...
</projects>
I've tried the following query, and it works:
XQUERY
let $projects := db2-fn:sqlquery('SELECT XMLELEMENT(NAME "project", XMLATTRIBUTES(id, title, description)) AS project FROM mydb.Project')
let $TopicSet := db2-fn:sqlquery('SELECT XMLELEMENT(NAME "row", XMLATTRIBUTES(projectId, value)) FROM mydb.Topic')
let $TagSet := db2-fn:sqlquery('SELECT XMLELEMENT(NAME "row", XMLATTRIBUTES(projectId, value)) FROM mydb.Tag')
for $project in $projects return
<project>
{$project/@ID}
<title>{$project/fn:string(@TITLE)}</title>
<description>{$project/fn:string(@DESCRIPTION)}</description>
{for $row in $TopicSet[@PROJECTID=$project/@ID] return <Topic>{$row/fn:string(@VALUE)}</Topic>}
{for $row in $TagSet[@PROJECTID=$project/@ID] return <Tag>{$row/fn:string(@VALUE)}</Tag>}
</project>
;
However, it took 9 hours to complete (there 200k projects in the table)
How can I improve that?
Do I really need to create the three intermediate db2-fn:sqlquery to achieve this? is there another way?
Would it be faster if I create these 3 three intermediate db2-fn:sqlquery and put them in a table (with only one row and one attribute), and then index this before querying the "for $project in $projects return" part?
Or, how would you proceed to achieve my goal?
Best regards,
David
---
As proposed by Peter Schuetze, I tried the XMLAGG as follows:
SELECT
XMLSERIALIZE(
XMLDOCUMENT(
XMLELEMENT(
NAME "Project",
XMLATTRIBUTES(P.project),
XMLAGG(XMLELEMENT(NAME "Topic", Topic.value)),
XMLAGG(XMLELEMENT(NAME "Tag", Tag.value)),
)
) AS CLOB(1M)
)
FROM mydb.project P
LEFT JOIN mydb.Topic Topic ON (P.project = Topic.project)
LEFT JOIN mydb.Tag Tag ON (P.project = Tag.project)
GROUP BY P.project;
This works indeed much much faster!
However, if a project has not any topic, it will still display topic element, with a blank text, such as:
<projects>
<project id="project1">
<title>title1</title>
<description>desc1</description>
<topic></topic>
<tag>tag1</tag>
<tag>tag2</tag>
<tag>tag3</tag>
</project>
...
</projects>
How to remove this "<topic></topic>"?
views:
62answers:
3Look at the XMLAGG function. This should be perfect for your need. I haven't tried it yet but the example on the linked page is almost exactly what you want to do.
Use XMLFOREST instead of XMLELEMENT if there is the possibility that the column could be NULL and you don't want an empty element tag when that happens. So, for topics, you'd replace its XMLELEMENT function with
XMLFOREST( Topic.value AS "topic" )
There is a problem with the way you've included two XMLAGG functions in the same SELECT statement. If you have just one XMLAGG in your statement, there's no problem, since the GROUP BY on the parent key will neatly collapse the child entries that are specified inside XMLAGG. However, when you specify more than one XMLAGG function inside the same SELECT, the query produces a Cartesian product internally, so in this case, you'll see repeating items inside each group returned by XMLAGG. The example you gave with there being only zero or one topics for a project does not demonstrate this problem, but if a project had two topics and three tags, you'd see each topic repeated three times, and each tag repeated twice. To prevent this, you'll need to relocate each XMLAGG to a subquery or a common table expression that produces a single XML fragment so you can safely reference it from the main query.
Below is an example that pushes the XMLAGG down into common table expressions. It also gets rid of the need for XMLFOREST, since XMLAGG won't produce any results for an empty input set.
WITH topicxml( projectid, xmlfragment ) AS ( SELECT topic.projectid, XMLAGG( XMLELEMENT( NAME "topic", topic.value ) ORDER BY topic.value) FROM mydb.topic topic GROUP BY topic.projectid ), tagxml ( projectid, xmlfragment ) AS ( SELECT projectid, XMLAGG( XMLELEMENT( NAME "tag", tag.value ) ORDER BY tag.value) FROM mydb.tag tag GROUP BY tag.projectid ) SELECT XMLSERIALIZE ( CONTENT XMLELEMENT( NAME "project", XMLATTRIBUTES( p.id AS "id" ), XMLELEMENT( NAME "title", p.title ), XMLELEMENT( NAME "description", p.description ), XMLCONCAT( topicxml.xmlfragment, tagxml.xmlfragment ) ) AS VARCHAR(2000) ) FROM mydb.project p LEFT OUTER JOIN topicxml ON topicxml.projectid = p.id LEFT OUTER JOIN tagxml ON tagxml.projectid = p.id ;
Fred, I tried with XMLFOREST( Topic.value AS "topic" ), but I get the following error: ++++ SQL0119N An expression starting with "CONTENT" specified in a SELECT clause, HAVING clause, or ORDER BY clause is not specified in the GROUP BY clause or it is in a SELECT clause, HAVING clause, or ORDER BY clause with a column function and no GROUP BY clause is specified. SQLSTATE=42803 ++++ That is, XMLFOREST cannot aggregate multiple values from a one row and column (as XMLAGG does). Thus I think it is not suited for this case.
I see your point about the cartesian product, you are right. I am stack at having a subquery. I've tried as follows:
SELECT
XMLSERIALIZE(
XMLDOCUMENT(
XMLELEMENT(
NAME "Project",
XMLATTRIBUTES(P.project),
XMLAGG( (SELECT XMLELEMENT(NAME "Topic", content) FROM Topic WHERE project=P.project) )
)
) AS CLOB(1M)
)
FROM Project P;
but it complains with the same error above.
any idea? I guess the best would be to create an XQuery. The first one works, but it is just too slow. maybe an idea to get it faster?