tags:

views:

109

answers:

3

So I am trying to figure out a Regular Expression and am having some issues. What I want to find (match) is all of the SQL parameters in a large script file, but NOT match items in single quotes (such as email addresses). For example:

INSERT INTO [User]
(
   [UserGuid], [CompanyGuid], [Name], [EmailAddress]
) VALUES (
   @UserGuid1, @CompanyGuid, 'Jason', '[email protected]'
)

With @UserGuid1 and @CompanyGuid matching, but not @jason matching. I have been using this RegEx:

(@+[\w]+)

But it matches the email address, so I tried to do a negative look ahead/behind like this:

(?<!')[\W](@+[\w]+)[\W](?!')

but it is matching the '(' in the following example:

INSERT INTO [User] ([UserGuid]) VALUES (@UserGuid1)

Anyone have an idea what I am missing here? Something that can say: "anything that is NOT in a quote set?". Also, it is safe to assume balanced quote sets.

+1  A: 

(@+[^']+) should help. The [^' ,] will match anything except a single quote, space or comma. You may need to add a few more characters, but that's the general idea.

James Deville
+2  A: 

have you try the following?

(?<=\W)(@\w+)

basically it makes sure that the captured value preceded by a non-word character, you can add look-ahead too but it's kinda redundant because + is greedy and will match until non-word anyway.

the following will insure that in INSERT INTO [User] ([UserGuid]) VALUES ('@UserGuid1') nothing is matched:

(?<![\w'])(@\w+)
SilentGhost
I don't think that would work, since there is no space before the parameter in the second example text.
Ben Torell
fixed now (overenginneering begets user frustration, jeff)
SilentGhost
OK, that seems to work, though it will match items like '@something' that aren't e-mail addresses, but are quoted inputs. Adding the lookahead/behinds for quotes would help, like so:(?<=\W)(?<!')(@\w+)(?!')
Ben Torell
quote is a non-word, your look-behinds seem to be redundant
SilentGhost
No, it is necessary. What my regex is saying is that the match must be preceded by any non-word EXCEPT a single quote. This is to prevent something like ('@text') from being matched. If I understood look-behinds a bit better, I could probably condense it further, but specifying that the preceding character not be a quote is definitely necessary.
Ben Torell
so it must not be preceded by word or a quote in other words, `(?<![\w'])`. Right?
SilentGhost
Yes, that's what I was trying to say. Thank you.
Ben Torell
(?<=\W)(@\w+) fails on INSERT INTO [User] ([UserGuid]) VALUES ('@UserGuid1').
JasonRShaver
Please edit the answer to include the <code>(?<![\w'])</code> and I will mark this as answer (and delete this comment and my above one) as I *THINK* SilentGhost was first, but I upvoted both.
JasonRShaver
Yes, he was first. :-)
Ben Torell
+1  A: 

Try this:

(?<=[^\w'])(@\w+)(?!')

This specifies that each match must be preceded by a non-word character (except for single quotes), then have an @ sign and a word, and not followed by another single quote.

Ben Torell
yeah, that's exactly what my comment said
SilentGhost
I'm not about to get into an internet argument, but if you test it, you'll see that I'm right.I'm using this tester:http://www.gskinner.com/RegExr/
Ben Torell
I'm not saying you're wrong, I'm saying that effectively the same thing was posted in the comment before your answer. Additionally, OP writes that *it is safe to assume balanced quote sets*, so the look-ahead check is again redundant.
SilentGhost
Oh, gotcha. I didn't see your last comment because it was the sixth one and was hidden, and I went ahead and looked it up. No worries :-)Good point, though, on the look-ahead.
Ben Torell
This one worked as well.
JasonRShaver