I have the following search phrase and I need to extract

  1. ABC XYZ
  2. Mobile Accessories
  3. Samsung 250

whenever they occur in the string, in any order. The application is written in C# (.NET).

Search Phrase
__________________________________________________________
ABC XYZ
ABC XYZ category:"Mobile Accessories"
category:"Mobile Accessories" ABC XYZ
ABC XYZ Model:"Samsung 250"
Model:"Samsung 250" ABC XYZ
ABC XYZ category:"Mobile Accessories" Model:"Samsung 250"
Model:"Samsung 250" category:"Mobile Accessories" ABC XYZ
category:"Mobile Accessories" Model:"Samsung 250" ABC XYZ
__________________________________________________________

Thanks in advance.

Example 1 Input - ABC XYZ category:"Mobile Accessories" Output - ABC XYZ and Mobile Accessories

Example 2 Input - Model:"Samsung 250" category:"Mobile Accessories" ABC XYZ Output - Samsung 250, Mobile Accessories and ABC XYZ

Example 3 Input - ABC XYZ Output - ABC XYZ

Example 4 Input - Model:"Samsung 250" ABC XYZ Output - Samsung 250 and ABC XYZ

A: 

Check out http://www.txt2re.com/

Bård
+1  A: 

If you're literally trying to find explicit strings, the IndexOf method will work for you (e.g. s.IndexOf("ABC XYZ")).

The syntax you show looks like a field:"value" syntax though, so perhaps you want a regex like "([A-Za-z]+):\"([^\"]+)\"", which should match field and value in pairs.

If that's not what you're after, sorry, but the question is a bit vague.
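A minimal sketch of that pairwise matching, assuming the field names are plain letters and the values never contain a double quote. The class and method names (FieldParser, ParseFields) are made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FieldParser
{
    // Matches field:"value"; group 1 is the field name, group 2 the quoted value.
    private static readonly Regex Pair = new Regex("([A-Za-z]+):\"([^\"]+)\"");

    // Returns every field:"value" pair found in the input, keyed by field name.
    // Keys compare case-insensitively, so "Model" and "model" are the same key.
    public static Dictionary<string, string> ParseFields(string input)
    {
        var fields = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (Match m in Pair.Matches(input))
        {
            fields[m.Groups[1].Value] = m.Groups[2].Value;
        }
        return fields;
    }
}
```

Calling FieldParser.ParseFields on one of the sample phrases and looking up "category" or "model" should give the quoted values; a phrase with no pairs gives an empty dictionary.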

Tim Schneider
Nice one. Good point with IndexOf. I assumed the OP wanted to extract the specific keys as in the example, but your idea is just as valid. +1.
Kobi
+1  A: 

As for Model and Category, you can capture them using something like this:

category:"([^"]*)"

This searches for the string category:" followed by your category (which presumably can change), followed by another ". Of course, in C# this should be escaped: @"category:""([^""]*)""".
Similarly, you can extract the Model: Model:"([^"]*)".

Not sure about the rest, but if you remove these two, you are left with the free string.
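As a rough sketch of that idea, the helper below reads one key's value and removes the pair from the phrase in the same step. The names SearchPhrase and Take are hypothetical, and case-insensitive key matching is an assumption on my part:

```csharp
using System.Text.RegularExpressions;

public static class SearchPhrase
{
    // Returns the quoted value for the given key ("" if the key is absent)
    // and removes the whole key:"value" pair from the phrase.
    public static string Take(ref string phrase, string key)
    {
        // Verbatim string: "" inside @"..." stands for one literal quote.
        var pattern = new Regex(Regex.Escape(key) + @":""([^""]*)""", RegexOptions.IgnoreCase);
        string value = pattern.Match(phrase).Groups[1].Value;
        phrase = pattern.Replace(phrase, "").Trim();
        return value;
    }
}
```

Calling Take(ref phrase, "Model") and then Take(ref phrase, "category") leaves the free text in phrase, whichever order the parts appeared in.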

Kobi
Can you let me know how I can exclude these two so that I am left with the remaining free string?
SednaSystems
You can use `Regex.Replace` to remove the matching strings, or all `key:"value"` pairs as @fyjham showed. That leaves you with three calls for three values, which isn't so bad.
Kobi
You also have the option of iterating over the Match.Groups that are returned from the regex match and using the Index and Length properties of each match in combination with Substring to pluck out the unmatched contents. Whether you'd bother with this would depend on how performance-intensive your regex is (this would give better performance than more regex calls if you're expecting this to be called very frequently, but requires a little more code).
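One possible reading of that single-pass idea, sketched with Regex.Matches rather than Match.Groups. SinglePassParser and Parse are illustrative names, and the key:"value" pattern is assumed from the examples in the question:

```csharp
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class SinglePassParser
{
    private static readonly Regex Pair = new Regex("([A-Za-z]+):\"([^\"]*)\"");

    // Stores each key:"value" pair in 'fields' (keys lower-cased) and
    // returns whatever text lies outside the matches.
    public static string Parse(string input, IDictionary<string, string> fields)
    {
        var rest = new StringBuilder();
        int position = 0;
        foreach (Match m in Pair.Matches(input))
        {
            // Copy the unmatched text that precedes this match.
            rest.Append(input.Substring(position, m.Index - position));
            fields[m.Groups[1].Value.ToLowerInvariant()] = m.Groups[2].Value;
            position = m.Index + m.Length;
        }
        rest.Append(input.Substring(position)); // text after the last match
        return rest.ToString().Trim();
    }
}
```

This runs the regex once per phrase, at the cost of a little bookkeeping around position.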
Tim Schneider
+1  A: 

It seems like you want to extract a few different patterns from the same string. One approach would be to find each match and then remove it from your working string.

Pseudo-example (note: I don't use C#, so the specifics may be incorrect):

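using System.Text.RegularExpressions;

string workingString = "ABC XYZ category:\"Mobile Accessories\"";

// IgnoreCase so that both "model:" and "Model:" are recognised
Regex categoryMatch = new Regex("category:\"([^\"]+)\"", RegexOptions.IgnoreCase);
Regex modelMatch = new Regex("model:\"([^\"]+)\"", RegexOptions.IgnoreCase);

// Groups[1] is the quoted value; it is an empty string when there is no match
string category = categoryMatch.Match(workingString).Groups[1].Value;
string model = modelMatch.Match(workingString).Groups[1].Value;

// Remove the matched key:"value" pairs from the working string
workingString = categoryMatch.Replace(workingString, "");
workingString = modelMatch.Replace(workingString, "");

string name = workingString.Trim(); // I assume that the extra data is the name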

This might be what you're looking for; it should extract the category, model and name(?) regardless of the order in which they appear in the string. Note that malformed strings such as

ABC Model:"Samsung 250" XYZ

will return:

ABC  XYZ
VoDurden