Basically I have music filenames such as:

<source> <target>

"Travis - Sing"   "Travis - Sing 2001.mp3"
"Travis - Sing"   "Travis - Sing Edit.mp3"
"Travis - Sing"   "Travis - Sing New Edit.mp3"
"Mission Impossible I"   "Mission Impossible I - Main Theme.mp3"
"Mission Impossible I"   "Mission Impossible II - Main Theme.mp3"
"Mesrine - Death Instinct"   "Mesrine - Death Instinct - Le Million.mp3"
"Mesrine - Public Enemy #1"   "Mesrine - Public Enemy #1 - Theme"
"Se7en"   "Se7en Motion Picture Soundtrack - Theme.mp3"

The parentheses aren't included in the strings (they are just for demonstration).

I am trying to match the "source" values to the "target" values.

I already have the source names, but right now I am using a lot of string parsing to match the two. How can I achieve the same using Regex?

EDIT: It seems like there is some confusion.

"Travis - Sing" is my source string, and I am trying to match it to:

"Travis - Sing (2001).mp3"
"Travis - Sing (Edit).mp3"
"Travis - Sing (New Edit).mp3"

EDIT2: Removed the parentheses.

+1  A: 

Are there always multiple spaces between the source and the target? If so, then the following will match:

/^(.*?)\s{2,}(.*?)$/

It basically matches two items, one before any gap of 2+ whitespace characters, and one after that gap. (The capture patterns use a non-greedy .*? so that if there are more than two whitespace characters, the extra whitespace won't get captured in either.)
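
A minimal sketch of how that pattern could be applied, written in Java and assuming each line holds a source and a target separated by at least two whitespace characters. The class and method names (GapSplitter, split) are hypothetical, chosen only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GapSplitter {
    // Same pattern as above: lazy group, a gap of 2+ whitespace, lazy group.
    private static final Pattern GAP = Pattern.compile("^(.*?)\\s{2,}(.*?)$");

    /** Returns {source, target}, or null when the line has no gap of 2+ whitespace. */
    static String[] split(String line) {
        Matcher m = GAP.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = split("Travis - Sing   Travis - Sing 2001.mp3");
        System.out.println(parts[0] + " | " + parts[1]);
    }
}
```

The single spaces inside "Travis - Sing" do not trigger a split, because the gap requires at least two consecutive whitespace characters.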

Amber
Thanks. Some source strings don't have any spaces. I should update the post.
Joan Venge
Spaces *in* the source don't matter - my question was about whether there are always multiple spaces *between* source and target. i.e. is it always `(source)XX(target)` or are there cases where it's only a single space `(source)X(target)` where X is a space character?
Amber
Sorry, what I meant was that they are separate strings and I want to match the source to the target. So "Travis - Sing" is my source, and lines 1, 2 and 3 are what I want to match it to, since they are the same song. By lines 1, 2 and 3 I mean (Travis - Sing (2001).mp3) ... etc.
Joan Venge
+2  A: 

From your answer to my comment I'm pretty sure that you are looking for something simple like this.

So you can have multiple search terms separated with "|". This is an alternation construct.

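using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

class Program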
{
    private static List<string> searchList = new List<string>
                                     {
                                         "Travis - Sing (2001).mp3",
                                         "Travis - Sing (Edit).mp3",
                                         "Mission Impossible I - Main Theme.mp3",
                                         "Mission Impossible II - Main Theme.mp3",
                                         "doesn't match"
                                     };

    static void Main(string[] args)
    {
        var matchRegex = new Regex("Travis - Sing|Mission Impossible I");
        var matchingStrings = searchList.Where(str => matchRegex.IsMatch(str));

        foreach (var str in matchingStrings)
        {
            Console.WriteLine(str);
        }
    }
}

EDIT: If you want to know what you matched against, you can add groups:

    static void Main(string[] args)
    {
        var matchRegex = new Regex("(?<travis>Travis - Sing)|(?<mi>Mission Impossible I)");

        foreach (var str in searchList)
        {
            var match = matchRegex.Match(str);
            if (match.Success)
            {
                if (match.Groups["travis"].Success)
                {
                    Console.WriteLine(String.Format("{0} matches against travis", str));
                }
                else if (match.Groups["mi"].Success)
                {
                    Console.WriteLine(String.Format("{0} matches against mi", str));
                }
            }
        }
    }
Andrew Barrett
Thanks, I think this is it. I have 2 questions. Did you include parentheses in the sourcelist, not searchlist? If so, they should be out. Sorry, I thought it would make it clear to separate them in the question. 2nd question is, does | mean a separate entry in Regex? If so, then I should create a single string? Basically I want to collect the matches for each source string. So like source0 -> a, b, c | source1 -> d, e...
Joan Venge
Added some more info into my answer.
Andrew Barrett
Thanks. Is there a way to add groups to a regex without creating a very long single string? So like regex.AddGroup("travis"), ...
Joan Venge
Well, since the "very long single string" is actually just the same format repeated over and over, you could construct them as individual strings, and then use String.Join() to join them all together with the | character between.
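
A rough sketch of that join approach, shown in Java (String.join plays the role of String.Join here). The group names s0, s1, ... and the class and method names are hypothetical; each source is quoted with Pattern.quote on the assumption that it should be matched literally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AlternationBuilder {
    /** Builds "(?<s0>...)|(?<s1>...)|..." with every source quoted literally. */
    static Pattern build(List<String> sources) {
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < sources.size(); i++) {
            // Hypothetical group names s0, s1, ...; Pattern.quote escapes metacharacters.
            parts.add("(?<s" + i + ">" + Pattern.quote(sources.get(i)) + ")");
        }
        return Pattern.compile(String.join("|", parts));
    }

    /** Returns the index of the source whose group matched, or -1 if nothing matched. */
    static int matchedSource(Pattern pattern, int sourceCount, String target) {
        Matcher m = pattern.matcher(target);
        if (!m.find()) {
            return -1;
        }
        for (int i = 0; i < sourceCount; i++) {
            // A group that did not take part in the match reports null.
            if (m.group("s" + i) != null) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> sources = Arrays.asList("Travis - Sing", "Mission Impossible I");
        Pattern pattern = build(sources);
        System.out.println(matchedSource(pattern, sources.size(), "Travis - Sing Edit.mp3"));
    }
}
```

Collecting targets per source is then a matter of appending each target to the list kept for the returned index.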
Amber
Thanks Dav, makes sense.
Joan Venge
+3  A: 

It seems you're looking for all files that begin with a certain string; this covers all of your examples. This can be achieved easily without regular expressions using two loops, or using LINQ:

var matches = from source in sources
              select new
                      {
                          Source = source,
                          Targets = from file in targets
                                    where file.StartsWith(source)
                                    select file
                      };

You can also use a regex instead of the StartsWith condition (escaping the source so that characters such as '(' or '.' are matched literally), for example:

where Regex.IsMatch(file, "^" + Regex.Escape(source), RegexOptions.IgnoreCase)

This can probably be optimized in many ways, but Andrew suggests writing a long pattern, which isn't quicker when done dynamically.
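
For comparison, the "two loops" variant mentioned above might look roughly like this in Java. It is only a sketch: the names PrefixGrouper and group are hypothetical, and the comparison is case-sensitive, unlike the IgnoreCase regex.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PrefixGrouper {
    /** Maps each source to the targets that begin with it (case-sensitive). */
    static Map<String, List<String>> group(List<String> sources, List<String> targets) {
        // LinkedHashMap keeps the sources in their original order.
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String source : sources) {
            List<String> hits = new ArrayList<>();
            for (String target : targets) {
                if (target.startsWith(source)) {
                    hits.add(target);
                }
            }
            result.put(source, hits);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sources = Arrays.asList("Travis - Sing", "Se7en");
        List<String> targets = Arrays.asList(
                "Travis - Sing 2001.mp3",
                "Travis - Sing Edit.mp3",
                "Se7en Motion Picture Soundtrack - Theme.mp3");
        System.out.println(group(sources, targets));
    }
}
```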

Kobi
+1 I wouldn't actually suggest doing it my way, I'd do it with something similar to what you're doing. My answer was more to clear up his regex queries.
Andrew Barrett
Thanks, I actually use exactly this. I just thought regex would be faster, that's why I asked. I guess I should stick to my old method then.
Joan Venge
A: 

The following method is a bit more robust (it allows for a different number of spaces or hyphens between source and target). E.g. the target may have extra spaces between words, but it will still match.

First identify the characters that are allowed as word delimiters in your string. Then split your source and target strings into tokens using your delimiters. Then check whether the words in your source appear, in order, as the first words of the target.

E.g. (Java) I have used whitespace and hyphens as delimiters

public boolean isValidMatch(String source, String target){
    // Split on a sequence of whitespace or dashes. Two dashes between
    // words will still split the same as one dash.
    String[] sourceTokens = source.split("[\\s\\-]+");
    String[] targetTokens = target.split("[\\s\\-]+"); // split similarly

    // A source with more tokens than the target cannot be a prefix of it.
    if(sourceTokens.length>targetTokens.length){
        return false;
    }

    // Every source token must equal the target token at the same position.
    for(int i=0;i<sourceTokens.length;i++){
        if(!sourceTokens[i].equals(targetTokens[i])){
            return false;
        }
    }
    return true;
}

PS: You might want to add the dot '.' character as a delimiter in case you have source "Hello World" and target "Hello World.mp3". Currently it won't match, since the regex doesn't split on the dot, but if you expand your delimiter set to include the dot, it will.
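
A sketch of that variation, assuming the only change is adding the dot to the delimiter character class. The class name TokenPrefixMatcher is hypothetical.

```java
final class TokenPrefixMatcher {
    // Whitespace, hyphen and dot all count as delimiters (the dot is the addition).
    private static final String DELIMITERS = "[\\s\\-.]+";

    /** True when the source tokens are the leading tokens of the target. */
    static boolean isValidMatch(String source, String target) {
        String[] sourceTokens = source.split(DELIMITERS);
        String[] targetTokens = target.split(DELIMITERS);
        if (sourceTokens.length > targetTokens.length) {
            return false;
        }
        for (int i = 0; i < sourceTokens.length; i++) {
            if (!sourceTokens[i].equals(targetTokens[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidMatch("Hello World", "Hello World.mp3"));
    }
}
```

Because the comparison is per token, "Mission Impossible I" does not match "Mission Impossible II - Main Theme.mp3" here, which differs from a plain prefix check.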

hashable