
I'm developing a simple search mechanism, and I want to let the user search for chunks of text that contain spaces. For example, a user can search for the name of a person:

Name: John Smith

I then split the input with "John Smith".Split(' ') into an array of two elements, {"John", "Smith"}. I return all of the records that match both "John" AND "Smith" first, followed by records that match either "John" OR "Smith". If nothing matches, I return no records. This isn't a complicated scenario, and I have this part working.
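The AND-then-OR ordering described above can be sketched roughly like this. It is only an illustration under stated assumptions: the records are a hypothetical in-memory list of names, and "matches" is assumed to mean a case-insensitive substring test; a real search would query whatever store actually holds the records.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the real records: just a list of names.
var records = new List<string> { "John Smith", "Smith Jones", "Johnny Cash", "Jane Doe" };

// Records containing every term come first, then records containing at least one term;
// records containing none are left out. The case-insensitive substring test is an assumption.
List<string> Rank(List<string> source, string[] terms)
{
    bool Has(string record, string term) =>
        record.Contains(term, StringComparison.OrdinalIgnoreCase);

    List<string> matchAll = source.Where(r => terms.All(t => Has(r, t))).ToList();
    IEnumerable<string> matchAny = source.Where(r => !matchAll.Contains(r) && terms.Any(t => Has(r, t)));
    return matchAll.Concat(matchAny).ToList();
}

string[] queryTerms = "John Smith".Split(' ', StringSplitOptions.RemoveEmptyEntries);
List<string> results = Rank(records, queryTerms);
Console.WriteLine(string.Join(" | ", results));
```

Note that "Johnny Cash" lands in the OR group because "John" is a substring of "Johnny"; whole-word matching would need a different test.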

I'd now like to let the user return ONLY records that match the exact phrase "John Smith".

I'd like to use a basic quote syntax for searching. So if a user wants to search for "John Smith" OR Pocahontas, they would enter: "John Smith" Pocahontas. The order of terms is absolutely irrelevant; "John Smith" does not receive priority over Pocahontas just because he comes first in the list.

I have two main trains of thought on how I should parse the input:

A) Using a regular expression, then string methods (IndexOf, Split)
B) Using only the string methods

I think a logical course of action would be to find the text in quotes, remove it from the original string, and insert it into a separate list. Then everything left over from the original string could be split on spaces and inserted into that same list. If there is a single quote or an odd number of quotes, the unmatched quote is simply discarded.
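Option B (no regex) could look roughly like the following. It is a sketch, not a definitive implementation: `ParseTerms` is a hypothetical helper name; quoted phrases are collected with `IndexOf`; the remaining text is split on spaces; and an unmatched quote is dropped while the words after it are kept as ordinary terms.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: quoted phrases go straight into the list, everything else is
// collected into "rest" and split on spaces afterwards.
List<string> ParseTerms(string input)
{
    var terms = new List<string>();
    var rest = new StringBuilder();
    int pos = 0;
    while (pos < input.Length)
    {
        int open = input.IndexOf('"', pos);
        if (open < 0) { rest.Append(input, pos, input.Length - pos); break; }

        int close = input.IndexOf('"', open + 1);
        if (close < 0)
        {
            // Lone (odd) quote: drop the quote itself, keep the text around it as plain words.
            rest.Append(input, pos, open - pos).Append(' ');
            rest.Append(input, open + 1, input.Length - open - 1);
            break;
        }

        // Text before the opening quote is plain words; the text between the quotes is a phrase.
        rest.Append(input, pos, open - pos).Append(' ');
        string phrase = input.Substring(open + 1, close - open - 1).Trim();
        if (phrase.Length > 0) terms.Add(phrase);
        pos = close + 1;
    }

    terms.AddRange(rest.ToString().Split(' ', StringSplitOptions.RemoveEmptyEntries));
    return terms;
}

List<string> parsed = ParseTerms("\"John Smith\" Pocahontas \"Jane Doe\" Bambi");
Console.WriteLine(string.Join(" | ", parsed));
```

Quoted phrases come out first in this sketch, which is harmless here since the order of terms doesn't matter.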

How do I find the matches with a regex? I know about Regex.Replace, but how would I iterate through the matches and insert them into a list? I know there is some neat way to do this using the MatchEvaluator delegate and LINQ, but I know basically nothing about regex in C#.
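Both ideas in that paragraph can be sketched briefly: `Regex.Matches` returns a `MatchCollection` that LINQ can enumerate, and `Regex.Replace` has an overload taking a `MatchEvaluator`, which is called once per match, so it can record each phrase while removing it. The pattern, the sample input, and the choice to replace each phrase with a space are illustrative assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

string input = "\"John Smith\" Pocahontas \"Jane Doe\" Bambi";

// A quote, one or more non-quote characters (captured as group 1), then a quote.
var quoted = new Regex("\"([^\"]+)\"");

// 1) LINQ over the matches: Cast<Match>() turns the MatchCollection into IEnumerable<Match>.
List<string> phrases = quoted.Matches(input)
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToList();

// 2) MatchEvaluator: the lambda runs once per match, so a single pass can both record
//    each phrase and blank it out (a space keeps the neighbouring words apart).
var terms = new List<string>();
string rest = quoted.Replace(input, m => { terms.Add(m.Groups[1].Value); return " "; });

// What is left are ordinary words; splitting on '"' too drops any unmatched quote.
terms.AddRange(rest.Split(new[] { ' ', '"' }, StringSplitOptions.RemoveEmptyEntries));

Console.WriteLine(string.Join(" | ", phrases));
Console.WriteLine(string.Join(" | ", terms));
```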

A: 

Look at this answer: http://stackoverflow.com/questions/554013/regular-expression-to-split-on-spaces-unless-in-quotes

It's exactly what you need.

Michael Pakhantsov
You are correct, Lucero got to it first, though. Thanks anyway.
Shawn
A: 

Use a regex like this:

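using System.Text.RegularExpressions;

string input = "\"John Smith\" Pocahontas";
// Either the text of a quoted phrase or a single word that doesn't start with a quote.
// The quotes must sit at the start/end of the input or next to whitespace, so the text
// *between* two quoted phrases (e.g. " Pocahontas " in "\"a\" Pocahontas \"b\"") isn't matched as a phrase.
Regex rx = new Regex(@"(?<=(?:^|\s)"")[^""]+(?=""(?:\s|$))|[^\s""]\S*");
for (Match match = rx.Match(input); match.Success; match = match.NextMatch()) {
    // use match.Value here, it contains the string to be searched
}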
Lucero
A: 

EDIT: Came back to this tab without refreshing and didn't realize this question had already been answered... the accepted answer is better.


I think pulling out the stuff in quotes first with regex is a good idea. Maybe something like this:

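using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

String sampleInput = "\"John Smith\" Pocahontas Bambi \"Jane Doe\" Aladin";

//Create regex pattern: a quote, one or more non-quote characters (captured), then a quote
//(a '.' inside [^...] would be a literal period, so phrases like "J. Smith" would not match)
Regex regex = new Regex("\"([^\"]+)\"");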

List<string> searches = new List<string>();

//Loop through all matches from regex
foreach (Match match in regex.Matches(sampleInput))
{
    //add the value of the first capture group to the list
    //(Groups[0] is the entire match, including the quotes;
    // Groups[1] is the first parenthesized group in the pattern,
    // which in this case is the text inside the quotes)
    searches.Add(match.Groups[1].Value);
}

//remove the matches from the input (replace with a space so the words on either side stay separate)
sampleInput = regex.Replace(sampleInput, " ");

//split the remaining input and add the result to our searches list;
//splitting on '"' as well drops any unmatched quote
searches.AddRange(sampleInput.Split(new char[] {' ', '"'}, StringSplitOptions.RemoveEmptyEntries));
Chris