views:

1857

answers:

3

I have a varchar column in a table that is used to store xml data. Yeah I know there is an xml data type that I should be using, but I think this was set up before the xml data type was available so a varchar is what I have to use for now. :)

The data stored looks similar to the following:

<xml filename="100100_456_484351864768.zip"  
     event_dt="10/5/2009 11:42:52 AM">
    <info user="TestUser" />
</xml>

I need to parse the filename to get the digits between the two underscores which in this case would be "456". The first part of the file name "shouldn't" change in length, but the middle number will. I need a solution that would work if the first part does change in length (you know it will change because "shouldn't change" always seems to mean it will change).

For what I have for now, I'm using XQuery to pull out the filename because I figured this is probably the better than straight string manipulation. I cast the string to xml to do this, but I'm not an XQuery expert so of course I'm running into issues. I found a function for XQuery (substring-before), but was unable to get it to work (I'm not even sure that function will work with SQL Server). There might be an XQuery function to do this easily, but if there is I am unaware of it.

So, I get the filename from the table with a query similar to the following:

select CAST(parms as xml).query('data(/xml/@filename)') as p
from Table1

From this I'd assume that I'd be able to CAST this back to a string then do some instring or charindex function to figure out where the underscores are so that I can encapsulate all of that in a substring function to pick out the part I need. Without going too far into this I am pretty sure that I can eventually get it done this way, but I know that there has to be an easier way. This way would make a huge unreadable field in the SQL Statement which even if I moved it to a function would still be confusing to try to figure out what is going on.

I'm sure there is an easier than this since it seems to be simple string manipulation. Perhaps someone can point me in the right direction. Thanks

+1  A: 

Unfortunately, SQL Server is not a conformant XQuery implementation - rather, it's a fairly limited subset of a draft version of XQuery spec. Not only it doesn't have fn:substring-before, it also doesn't have fn:index-of to do it yourself using fn:substring, nor fn:string-to-codepoints. So, as far as I can tell, you're stuck with SQL here.

Pavel Minaev
+1 Thanks, I was afraid that SQL Server had a limited subset of XQuery. Looks like I'll have to use the substring function in SQL Server to do it like I was thinking and like Steve Kass answered.
Dusty
+2  A: 

The straightforward way to do this is with SUBSTRING and CHARINDEX. Assuming (wise or not) that the first part of the filename doesn't change length, but that you still want to use XQuery to locate the filename, here's a short repro that does what you want:

declare @t table (
  parms varchar(max)
);
insert into @t values ('<xml filename="100100_456_484351864768.zip" event_dt="10/5/2009 11:42:52 AM"><info user="TestUser" /></xml>');

with T(fName) as (
  select cast(cast(parms as xml).query('data(/xml/@filename)') as varchar(100)) as p
  from @t
)
  select
    substring(fName,8,charindex('_',fName,8)-8) as myNum
  from T;

There are sneaky solutions that use other string functions like REPLACE and PARSENAME or REVERSE, but none is likely to be more efficient or readable. One possibility to consider is writing a CLR routine that brings regular expression handling into SQL.

By the way, if your xml is always this simple, there's no particular reason I can see to use XQuery at all. Here are two queries that will extract the number you want. The second is safer if you don't have control over extra white space in your xml string or over the possibility that the first part of the file name will change length:

  select
    substring(parms,23,charindex('_',parms,23)-23) as myNum
  from @t;

  select
    substring(parms,charindex('_',parms)+1,charindex('_',parms,charindex('_',parms)+1)-charindex('_',parms)-1) as myNum
  from @t;
Steve Kass
+1 It looks like I'm going to have to do what I was thinking I would have to which is use the SQL Server substring to parse it out. I appreciate your response and doing most of the work for me. I think I will make a function that does something similar to your first post, but in this situation the second code sample you posted would work, but I'd rather use XQuery to pluck out the filename before doing the string manipulation. Thanks again for you're help and I'll mark this as the answer.
Dusty
+1  A: 

You can use XQuery for this - just change your statement to:

SELECT
   CAST(parms as xml).value('(/xml/@filename)[1]', 'varchar(260)') as p
FROM 
   dbo.Table1

That gives you a VARCHAR(260) long enough to hold any valid file name and path - now you have a string and can work on it with SUBSTRING etc.

Marc

marc_s
I appreciate you response, but I was able to get this with the query in my post using .query instead of .value. I was looking for the best way to parse out the filename once I got it. However, now that we are on the subject, is the preferred method to use .query or .value?
Dusty
`query()` returns an entire XDM result tree as instance of `XML` data type; `value()` requires your query to return a single XDM value only, and converts it to some SQL type. So in general you go for the former when you actually need to return an XML document or fragment, or at least a nodeset, and for the latter when you need to return just a single value.
Pavel Minaev
Thanks. That makes sense. Although it doesn't give you any points, I upvoted your comment. :)
Dusty
+1 Because I am supposed to be using .value instead of .query
Dusty