ansaurus

Question

How to write an xquery containing sequence elements?

Answer 1

A:

with cte_entertime as (
    -- one row per <entertime>; the person fields are read from the parent (..)
    SELECT
        [PersonId] = t.c.value('(../personid)[1]', 'NVARCHAR(50)')
        ,[First Name] = t.c.value('(../firstname)[1]', 'NVARCHAR(50)')
        ,[Last Name] = t.c.value('(../lastname)[1]', 'NVARCHAR(50)')
        ,[Entertime] = t.c.value('.', 'NVARCHAR(50)')
        -- ordinal of this <entertime> across the whole document
        ,[entry_number] = ROW_NUMBER() OVER (ORDER BY t.c)
    FROM @x.nodes('root/person/entertime') t(c))
, cte_leavetime as (
    SELECT
        [Leavetime] = t.c.value('.', 'NVARCHAR(50)')
        -- ordinal of this <leavetime> across the whole document
        ,[entry_number] = ROW_NUMBER() OVER (ORDER BY t.c)
    FROM @x.nodes('root/person/leavetime') t(c))
-- pair the n-th entertime with the n-th leavetime
SELECT PersonID
    , [First Name]
    , [Last Name]
    , [Entertime]
    , [Leavetime]
FROM cte_entertime e
    LEFT OUTER JOIN cte_leavetime l on e.entry_number = l.entry_number

Remus Rusanu 2009-06-12 09:49:17

I've noticed that making the repeating element the context of the nodes() function and then stepping up the hierarchy to get the other values drastically hurts the query's performance. I tried to validate the number of records returned by rewriting my query in this form, but I cancelled it after it had run for 30 minutes without returning any results. My data set has the equivalent of 215,000 person records.

Dan Rigby 2009-06-12 13:41:17

I should add, though, that while this answer doesn't get me 100% of the way there, it was very helpful in showing me how to join over the XML data.

Dan Rigby 2009-06-12 13:42:30

Glad it helped. The step up to the parent element can easily be eliminated: simply join three tables (one for name/ID, one for enter times, one for leave times).

Remus Rusanu 2009-06-12 13:52:51

I've been doing some testing, and that traversal to the parent is causing something really terrible behind the scenes. The following query by itself seems to get slower and slower as more results are returned, and ultimately chokes around the 3,000th row (out of about 215,000):

SELECT [MemberNumber] = t.c.value('(../PersonId)[1]','NVARCHAR(20)')
    ,[Entertime] = t.c.value('.', 'DATETIME')
FROM @x.nodes('root/person/entertime') t(c)

It's very interesting. I'm not sure what's going on.

Dan Rigby 2009-06-12 14:57:32
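A minimal sketch of one way to drop the `..` step from that test query: shred the `person` elements once, then `CROSS APPLY` a second `nodes()` call whose path is relative to each `person` node, so the ID comes from the outer row instead of being looked up from every `entertime`. It assumes lowercase element names as in the sample document in Answer 3 (the query above uses `PersonId`), uses a small made-up document, and returns the times as NVARCHAR rather than DATETIME to sidestep date-format settings. It has not been timed against the 215,000-record data set.

```sql
-- Hypothetical two-person sample (DECLARE/SET kept separate for SQL Server 2005).
DECLARE @x XML;
SET @x = N'<root>
  <person>
    <personid>HH3269732</personid>
    <entertime>01/02/2008 10:15</entertime>
    <entertime>03/01/2008 08:00</entertime>
  </person>
  <person>
    <personid>HH3269733</personid>
    <entertime>01/03/2008 10:15</entertime>
  </person>
</root>';

-- Outer nodes(): one row per <person>.
-- Inner nodes(): relative to that row, so no parent (..) axis is needed.
SELECT
    [MemberNumber] = p.c.value('(personid)[1]', 'NVARCHAR(20)')
    -- NVARCHAR instead of DATETIME avoids depending on DATEFORMAT/language.
    ,[Entertime] = e.c.value('.', 'NVARCHAR(50)')
FROM @x.nodes('/root/person') AS p(c)
CROSS APPLY p.c.nodes('entertime') AS e(c);
```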

Did you try joining 3 tables instead of 2? Extract the ID/FirstName/LastName in one pass with @x.nodes('root/person'), then join that with @x.nodes('/root/person/entertime') and @x.nodes('/root/person/leavetime'). This should be faster, since each 'table' only scans the XML forward.

Remus Rusanu 2009-06-12 15:12:18

The problem is what to join on. The ROW_NUMBER() solution wasn't yielding correct results because the row counts returned by the 3 queries (id/name, entertime, leavetime) were different. The only way I can see that yields correct results is to join on personid, and to do that, the entertime and leavetime queries have to traverse upward to find their personid (see the query in my answer below). Maybe there's another way?

Dan Rigby 2009-06-12 16:10:36
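On the question of what to join on, one possible answer, sketched here under assumptions, is to number the enter and leave times per person instead of across the whole document. Each `CROSS APPLY` reads `personid` from its enclosing `person` row (no `..`), and `ROW_NUMBER() ... PARTITION BY` that ID gives each time its position within the person, so the join key becomes (PersonId, n). It assumes each `leavetime` belongs to the `entertime` at the same position inside the person, relies on the same `ORDER BY t.c` document-order behavior the answers in this thread use (which I would not treat as a documented guarantee), and uses a hypothetical one-person document and a temp table `#pairs` for illustration.

```sql
-- Hypothetical sample: one person with three enters but only two leaves.
DECLARE @x XML;
SET @x = N'<root>
  <person>
    <personid>HH3269732</personid>
    <entertime>01/02/2008 10:15</entertime>
    <leavetime>01/02/2008 11:45</leavetime>
    <entertime>03/01/2008 08:00</entertime>
    <leavetime>03/01/2008 10:00</leavetime>
    <entertime>04/01/2008 08:00</entertime>
  </person>
</root>';

WITH cte_enter AS (
    -- One row per <entertime>; PersonId comes from the enclosing <person> row.
    SELECT
        [PersonId] = p.c.value('(personid)[1]', 'NVARCHAR(50)')
        ,[Entertime] = e.c.value('.', 'NVARCHAR(50)')
        -- Position of this <entertime> within its own person (1, 2, 3, ...).
        ,[n] = ROW_NUMBER() OVER (
            PARTITION BY p.c.value('(personid)[1]', 'NVARCHAR(50)')
            ORDER BY e.c)
    FROM @x.nodes('/root/person') AS p(c)
    CROSS APPLY p.c.nodes('entertime') AS e(c)
)
, cte_leave AS (
    -- Same shape for <leavetime>.
    SELECT
        [PersonId] = p.c.value('(personid)[1]', 'NVARCHAR(50)')
        ,[Leavetime] = l.c.value('.', 'NVARCHAR(50)')
        ,[n] = ROW_NUMBER() OVER (
            PARTITION BY p.c.value('(personid)[1]', 'NVARCHAR(50)')
            ORDER BY l.c)
    FROM @x.nodes('/root/person') AS p(c)
    CROSS APPLY p.c.nodes('leavetime') AS l(c)
)
-- Pair the n-th enter with the n-th leave of the same person.
SELECT e.PersonId, e.n, e.Entertime, l.Leavetime
INTO #pairs
FROM cte_enter e
LEFT OUTER JOIN cte_leave l
    ON l.PersonId = e.PersonId AND l.n = e.n;

SELECT PersonId, Entertime, Leavetime
FROM #pairs
ORDER BY PersonId, n;
```

With this sample, the third entertime has no matching leavetime, so the LEFT OUTER JOIN returns NULL in its Leavetime column rather than dropping the row.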

Answer 2

A:

I have accepted Remus's answer, as it got me 95% of the way to the solution. For reference, here is the final query structure:

with cte_maindata as (
    -- one row per <person>: ID and name
    SELECT
        [PersonId] = t.c.value('(personid)[1]', 'NVARCHAR(50)')
        ,[First Name] = t.c.value('(firstname)[1]', 'NVARCHAR(50)')
        ,[Last Name] = t.c.value('(lastname)[1]', 'NVARCHAR(50)')
    FROM @x.nodes('root/person') t(c))
, cte_entertime as (
    -- one row per <entertime>; PersonId is read by stepping up to the parent (..)
    SELECT
        [PersonId] = t.c.value('(../personid)[1]', 'NVARCHAR(50)')
        ,[Entertime] = t.c.value('.', 'NVARCHAR(50)')
    FROM @x.nodes('root/person/entertime') t(c))
, cte_leavetime as (
    -- one row per <leavetime>, likewise keyed by the parent's personid
    SELECT
        [PersonId] = t.c.value('(../personid)[1]', 'NVARCHAR(50)')
        ,[Leavetime] = t.c.value('.', 'NVARCHAR(50)')
    FROM @x.nodes('root/person/leavetime') t(c))
-- Both joins are on PersonId only, so for each person every entertime row
-- is combined with every leavetime row of that person.
SELECT
    m.PersonID
    ,[First Name]
    ,[Last Name]
    ,[Entertime]
    ,[Leavetime]
FROM cte_maindata m
    LEFT OUTER JOIN cte_entertime e on m.PersonId = e.PersonId
    LEFT OUTER JOIN cte_leavetime l on m.PersonId = l.PersonId

Dan Rigby 2009-06-12 14:51:45

Answer 3

A:

I hadn't realized you might have multiple persons in the document; my query would be incorrect in that case anyway. I thought that first shredding each person into its own XML fragment and then extracting the enter/leave times might perform better. I don't have a 215k-person XML document to test with, but here is an idea:

declare @x xml;
select @x = N'<root>
    <person>
        <personid>HH3269732</personid>
        <firstname>John</firstname>
        <lastname>Smith</lastname>
        <entertime>01/02/2008 10:15</entertime>
        <leavetime>01/02/2008 11:45</leavetime>
        <entertime>03/01/2008 08:00</entertime>
        <leavetime>03/01/2008 10:00</leavetime>
        <entertime>04/01/2008 08:00</entertime>
    </person>
    <person>
        <personid>HH3269733</personid>
        <firstname>Jane</firstname>
        <lastname>Doe</lastname>
        <entertime>01/03/2008 10:15</entertime>
        <leavetime>01/03/2008 11:45</leavetime>
        <entertime>03/04/2008 08:00</entertime>
        <leavetime>03/04/2008 10:00</leavetime>
        <entertime>04/04/2008 08:00</entertime>
    </person>
</root>';


with cte_person as (
    select
     t.c.value('(personid)[1]', 'NVARCHAR(50)') as personid
     , t.c.value('(firstname)[1]', 'NVARCHAR(50)') as firstname
     , t.c.value('(lastname)[1]', 'NVARCHAR(50)') as lastname
     , t.c.query('entertime') as entertime
     , t.c.query('leavetime') as leavetime
    FROM @x.nodes('root/person') t(c))
, cte_cross_enter as (
    select
     p.personid
     , p.firstname
     , p.lastname
     , x.c.value('.', 'datetime') as entertime
     , row_number() over (partition by personid order by x.c) as row_enter
     from cte_person p
     cross apply p.entertime.nodes('/entertime') x(c))
, cte_cross_leave as (
    select
     p.personid 
     , x.c.value('.', 'datetime') as leavetime
     , row_number() over (partition by personid order by x.c) as row_leave
     from cte_person p
     cross apply p.leavetime.nodes('/leavetime') x(c))
select e.personid
    , e.firstname
    , e.lastname
    , e.entertime
    , l.leavetime
    from cte_cross_enter e
    left outer join cte_cross_leave l 
      on e.personid = l.personid and 
      e.row_enter = l.row_leave

Remus Rusanu 2009-06-12 17:06:58

Whoops, sorry, I didn't notice you had already posted a solution.

Remus Rusanu 2009-06-12 17:13:05
