I have the following line of text:

Reference=*\G{7B35DDAC-FFE2-4435-8A15-CF5C70F23459}#1.0#0#..\..\..\bin\App Components\AcmeFormEngine.dll#ACME Form Engine

and wish to grab the following as two separate capture groups:

AcmeFormEngine.dll
ACME Form Engine

Can anyone help?

+1  A: 
    using System.Text.RegularExpressions;

    Regex regex = new Regex(
        @"\\(?<filename>[\w\.]+)\#(?<comment>[\w ]+)$",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);
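
A minimal sketch of how the pattern above might be applied to the sample line from the question. The names `ReferenceParser` and `TryParse` are made up for illustration and are not part of any library.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReferenceParser
{
    // Same pattern as above: a backslash, the file name, '#', then the description.
    private static readonly Regex Pattern = new Regex(
        @"\\(?<filename>[\w\.]+)\#(?<comment>[\w ]+)$",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Hypothetical helper: returns false when the line does not match.
    public static bool TryParse(string line, out string filename, out string comment)
    {
        Match m = Pattern.Match(line);
        filename = m.Success ? m.Groups["filename"].Value : null;
        comment = m.Success ? m.Groups["comment"].Value : null;
        return m.Success;
    }

    public static void Main()
    {
        string line = @"Reference=*\G{7B35DDAC-FFE2-4435-8A15-CF5C70F23459}#1.0#0#..\..\..\bin\App Components\AcmeFormEngine.dll#ACME Form Engine";
        if (TryParse(line, out string filename, out string comment))
        {
            Console.WriteLine(filename); // AcmeFormEngine.dll
            Console.WriteLine(comment);  // ACME Form Engine
        }
    }
}
```

The groups are read by name through `Match.Groups["filename"]` and `Match.Groups["comment"]`, so the calling code does not depend on group order.
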
Bartek Szabat
Does the hash really need escaping? What special meaning does it have?
Matthew Scharley
The hash marks the beginning of a comment.
Bartek Szabat
# seems to stand for comment :) I think
Gishu
Silly .NET regexes. Fixed mine now.
Matthew Scharley
this is broken if you have - or _ in the filename
Gishu
+1  A: 
Regex r = new Regex(@"\\(.+?)\#(.+?)$", RegexOptions.RightToLeft);

Non-greedy multiplicities are great.

'$': Match the end of the string.

"\#(.+?)": Match everything back from the end of the string till the first '#' character and return that in a capture.

"\\(.+?)": Same again, except with an escaped '\'.
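
A small sketch of how lazy quantifiers interact with search direction (the class `LazyDemo` and helper `Extract` are hypothetical names). A lazy `.+?` only minimizes from wherever the match starts, and a left-to-right scan starts at the first `\` in the line. With `RegexOptions.RightToLeft` the engine works backwards from the end of the string, which is the behaviour the explanation above describes.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LazyDemo
{
    // Must be a verbatim string: "\#" is not a valid escape in a regular C# string.
    private const string Pattern = @"\\(.+?)\#(.+?)$";

    // Hypothetical helper: returns { group 1, group 2 }, or an empty array on no match.
    public static string[] Extract(string line, RegexOptions options)
    {
        Match m = Regex.Match(line, Pattern, options);
        return m.Success
            ? new[] { m.Groups[1].Value, m.Groups[2].Value }
            : new string[0];
    }

    public static void Main()
    {
        string line = @"Reference=*\G{7B35DDAC-FFE2-4435-8A15-CF5C70F23459}#1.0#0#..\..\..\bin\App Components\AcmeFormEngine.dll#ACME Form Engine";

        // Left to right: the match begins at the first backslash in the line.
        string[] ltr = Extract(line, RegexOptions.None);
        Console.WriteLine(ltr[0]); // G{7B35DDAC-FFE2-4435-8A15-CF5C70F23459}

        // Right to left: the match is built backwards from the end of the string.
        string[] rtl = Extract(line, RegexOptions.RightToLeft);
        Console.WriteLine(rtl[0]); // AcmeFormEngine.dll
        Console.WriteLine(rtl[1]); // ACME Form Engine
    }
}
```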

Matthew Scharley
this doesn't work. '\.' is a valid match
Gishu
Should be fixed now. silly # comments.
Matthew Scharley
Upvote because it's the shortest expression and you explained how/why it works.
Joel Coehoorn
+5  A: 

If you are sure of the string format, you can also solve this in a down-to-earth manner, without regex: take everything after the last index of '\', and split that at '#'.
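
The two steps could be sketched roughly like this. `ReferenceSplitter` and `Split` are illustrative names, and returning `null` for lines that do not fit the format is an assumption, not something the question specifies.

```csharp
using System;

public static class ReferenceSplitter
{
    // Returns { fileName, description }, or null when the line has no '\'
    // or no '#' after the last '\'.
    public static string[] Split(string line)
    {
        int lastSlash = line.LastIndexOf('\\');
        if (lastSlash < 0)
        {
            return null;
        }

        // Everything after the last backslash, e.g. "AcmeFormEngine.dll#ACME Form Engine".
        string tail = line.Substring(lastSlash + 1);

        // Split at the first '#' only, so a '#' inside the description is kept.
        string[] parts = tail.Split(new[] { '#' }, 2);
        return parts.Length == 2 ? parts : null;
    }

    public static void Main()
    {
        string line = @"Reference=*\G{7B35DDAC-FFE2-4435-8A15-CF5C70F23459}#1.0#0#..\..\..\bin\App Components\AcmeFormEngine.dll#ACME Form Engine";
        string[] parts = Split(line);
        Console.WriteLine(parts[0]); // AcmeFormEngine.dll
        Console.WriteLine(parts[1]); // ACME Form Engine
    }
}
```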

Tomalak
Agree. More readable than a regex in this specific scenario.
Gishu
And more efficient since we only need to do character comparison and avoid the overhead of the state machine.
tvanfosson
+1  A: 

I voted for Tomalak's non-regex approach. However, if you HAD to do it with regex, I think you need something like this:

\\([^\\/?"<>|]+?)\#([^\\/?"<>|]+?)[\r\n]*$

This allows characters such as - and _, which are valid in filenames. It has two identical groups (each excluding invalid chars for Win32 filenames), beginning with a backslash, delimited by a #, and anchored at the end of the line (the $). This assumes the second group is also a valid Win32 filename. I saw some ugly boxes in the matched second group; the [\r\n]* keeps them away.

e.g. F5C70F23459}#1.0#0#..\..\..\bin\App Components\Acme_Form-Engine.dll#ACME Form Engine
group#1 => Acme_Form-Engine.dll
group#2 => ACME Form Engine
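
A sketch of running this pattern from C#, assuming the line arrives with a trailing CR/LF (a likely source of the "boxes"). `Win32NameDemo` and `Extract` are hypothetical names; note that each `"` in the pattern has to be doubled inside a verbatim string.

```csharp
using System;
using System.Text.RegularExpressions;

public static class Win32NameDemo
{
    // The pattern from above; "" is how a verbatim string spells a single ".
    private static readonly Regex Pattern = new Regex(
        @"\\([^\\/?""<>|]+?)\#([^\\/?""<>|]+?)[\r\n]*$");

    // Hypothetical helper: returns { group 1, group 2 }, or null on no match.
    public static string[] Extract(string line)
    {
        Match m = Pattern.Match(line);
        return m.Success ? new[] { m.Groups[1].Value, m.Groups[2].Value } : null;
    }

    public static void Main()
    {
        // Trailing "\r\n" is an assumption about how the line was read.
        string line = @"F5C70F23459}#1.0#0#..\..\..\bin\App Components\Acme_Form-Engine.dll#ACME Form Engine" + "\r\n";
        string[] groups = Extract(line);
        Console.WriteLine(groups[0]); // Acme_Form-Engine.dll
        Console.WriteLine(groups[1]); // ACME Form Engine
    }
}
```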

In short, this is arcane; avoid it if possible.

Gishu