+2  Q: 

Regex problem ...

Hello, I have the following regex:

Dim origen As String = "  /c /p:""c:\mis doc umentos\mis imagenes construida\archivo.txt"" /cid:45    423 /z:65 /a:23  /m:39 /t:45rt "

Dim str As String = "(^|\s)/p:""\w:(\\(\w+[\s]*\w+)+)+\\\w+.\w+""(\s|$)"
Dim ar As Integer

Dim getfile As New Regex(str)
Dim mgetfile As MatchCollection = getfile.Matches(origen)
ar = mgetfile.Count

When I evaluate this it works, and it gets /p:""c:\mis doc umentos\mis imagenes construida\archivo.txt"", which is basically the path to a file.

But if I change the origen string to

Dim origen As String = "  /c /p:""c:\mis doc umentos\mis imagenes construida\archivo.txt""/cid:45    423 /z:65 /a:23  /m:39 /t:45rt "

Note that the end of the file path is now followed directly by "/cid:45", which should make the pattern fail to match. But instead of getting mgetfile.Count = 0, the program hangs, and when I debug it I get a "property evaluation failed" error.

Thanks for your comments !!!

A: 

Do you always know that there are two double quotes at the beginning and end? If so just do:

(^|\s)/p:""(.*?)""(.*$)
John
+2  A: 

Can you clean up the whole expression to just:

str = "/p:"".*?"""
Chris Haas
Wow, thanks a lot, that solved the problem. I just added (^|\s) at the beginning and (\s+|$) at the end to ensure it won't accept the pattern when there is something directly before or after the right string. The final pattern is (^|\s+)/p:"".*?""(\s+|$). Thanks to all for your comments! What I learned is "keep it simple!"
carlos
Yep, especially with RegEx, the simpler the better.
Chris Haas
+1  A: 

The reason why your program hangs is catastrophic backtracking.

The parts of your regex (\w+\s*\w+)+ and \w+.\w+ allow so many permutations that the regex engine gets stuck in a near-infinite loop. RegexBuddy's debugger quits after 1000000 steps.

This only happens if the pattern can't match successfully, thereby prompting the regex engine to try any and all other permutations the pattern allows. Generally, repeating groups that contain repeating quantifiers are dangerous.
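While the pattern is still being reworked, one way to keep a failing match from blocking the program is to give the Regex a match timeout. The following is only a minimal sketch, assuming .NET Framework 4.5 or later (where the Regex constructor accepts a TimeSpan). The module name, the CountMatches helper, the -1 return value and the 2-second limit are illustrative choices, not part of the original question.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module BacktrackingGuard
    ' Hypothetical helper: returns the number of matches,
    ' or -1 if the engine gives up after the timeout.
    Public Function CountMatches(pattern As String, input As String, timeout As TimeSpan) As Integer
        Dim rx As New Regex(pattern, RegexOptions.None, timeout)
        Try
            ' Matches() is evaluated lazily; reading Count forces the full search,
            ' so the timeout exception (if any) surfaces here.
            Return rx.Matches(input).Count
        Catch ex As RegexMatchTimeoutException
            Return -1
        End Try
    End Function

    Sub Main()
        ' Original pattern and both input strings from the question.
        Dim pattern As String = "(^|\s)/p:""\w:(\\(\w+[\s]*\w+)+)+\\\w+.\w+""(\s|$)"
        Dim good As String = "  /c /p:""c:\mis doc umentos\mis imagenes construida\archivo.txt"" /cid:45    423 /z:65 /a:23  /m:39 /t:45rt "
        Dim bad As String = "  /c /p:""c:\mis doc umentos\mis imagenes construida\archivo.txt""/cid:45    423 /z:65 /a:23  /m:39 /t:45rt "

        Dim limit As TimeSpan = TimeSpan.FromSeconds(2) ' assumed limit, tune as needed
        Console.WriteLine("good input: " & CountMatches(pattern, good, limit))
        Console.WriteLine("bad input:  " & CountMatches(pattern, bad, limit))
    End Sub
End Module
```

For the second input the helper returns either 0 or -1, depending on whether the engine exhausts the permutations within the limit. Either way the call returns instead of blocking. The timeout only contains the symptom; the pattern itself still needs fixing.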

What are the real requirements? To match a path that only contains letters, numbers, underscores and backslashes? Or just a string between quotes? Perhaps you could shed some light on this...

Until then, I suggest the following:

"(?<=^|\s)/p:""\w:(\\(?>[\w\s]+))+\.\w+""(?=\s|$)"

This cleans up a few things: \\(?>[\w\s]+) matches a backslash, followed by one or more alphanumeric and space characters. Once they have been matched, the regex engine refuses to try a different permutation. This is achieved by wrapping the repetition in an atomic group (?>...). Other regex flavors offer the possessive quantifier ++ as shorthand for this, but .NET does not support it.

After that, it matches a dot (your version would have matched any character), and a sequence of alphanumeric characters. Then a quote, and then it checks if a space or end-of-string follow. If not, the regex will fail, and fail quickly.
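To check the "fail quickly" behaviour against the two strings from the question, here is a small sketch. It assumes the atomic-group spelling (?>...) of the pattern, which is the form .NET's Regex class accepts for "match and don't give back". The module and function names are made up for illustration.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module AtomicGroupDemo
    ' Atomic-group form of the suggested pattern (assumed .NET-compatible spelling).
    Public Const AtomicPattern As String = "(?<=^|\s)/p:""\w:(\\(?>[\w\s]+))+\.\w+""(?=\s|$)"

    ' Hypothetical helper: counts how many /p:"..." arguments the pattern accepts.
    Public Function CountPathMatches(input As String) As Integer
        Return Regex.Matches(input, AtomicPattern).Count
    End Function

    Sub Main()
        ' Path followed by a space: the lookahead (?=\s|$) succeeds.
        Dim good As String = "  /c /p:""c:\mis doc umentos\mis imagenes construida\archivo.txt"" /cid:45    423 /z:65 /a:23  /m:39 /t:45rt "
        ' Path followed directly by /cid:45: the lookahead fails.
        Dim bad As String = "  /c /p:""c:\mis doc umentos\mis imagenes construida\archivo.txt""/cid:45    423 /z:65 /a:23  /m:39 /t:45rt "

        Console.WriteLine(CountPathMatches(good))
        Console.WriteLine(CountPathMatches(bad))
    End Sub
End Module
```

Because each path segment is consumed atomically, the second input leaves the engine only a handful of alternatives to try before it reports no match, rather than re-splitting every word.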

If you only want to match a string between quotes, then

"(?<=^|\s)/p:""[^""]+""(?=\s|$)"

is the best and fastest way.
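If the goal is to get at the path itself rather than the whole /p:"..." argument, a capturing group around [^""]+ is one option. This is a minimal sketch. The ExtractPath helper and its convention of returning Nothing when there is no match are assumptions for illustration, not something from the question.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module QuotedPathExtractor
    ' Same pattern as above, with a capturing group added around the quoted text.
    Private Const QuotedPattern As String = "(?<=^|\s)/p:""([^""]+)""(?=\s|$)"

    ' Hypothetical helper: returns the text between the quotes, or Nothing if /p: is absent or malformed.
    Public Function ExtractPath(input As String) As String
        Dim m As Match = Regex.Match(input, QuotedPattern)
        If m.Success Then
            Return m.Groups(1).Value
        End If
        Return Nothing
    End Function

    Sub Main()
        Dim origen As String = "  /c /p:""c:\mis doc umentos\mis imagenes construida\archivo.txt"" /cid:45    423 /z:65 /a:23  /m:39 /t:45rt "
        Dim path As String = ExtractPath(origen)
        If path Is Nothing Then
            Console.WriteLine("no valid /p: argument found")
        Else
            Console.WriteLine(path)
        End If
    End Sub
End Module
```

Since [^""]+ can never run past the closing quote, there is only one way to match the quoted part, so a missing space after the path makes the match fail immediately.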

Tim Pietzcker
Thanks a lot, that was exactly what was happening. Very good guidance for solving the problem.
carlos