ansaurus

Question

How can I write a regex to match a torrent's title format?

Answer 1

+1 A:

Almost every media file I've ever seen that has come from a torrent had two-digit episodes. With that, you should be able to do E([0-9]{2}). instead and get the expression to match.

I'd estimate 99.9% of shows are marked with two-digit episodes. If you're trying to write a script to label your own shows, I'd go with the two-digit episode assumption and manually rename any mistagged files you come across. If you're trying to write something for public consumption, you'll probably have many more syntaxes to consider. I've seen other applications attempt this in the past, and all of them have worked just so-so. It's a hard problem that probably has no single solution.
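One possible way to cope with several syntaxes is to try a list of patterns in order, most explicit first. The sketch below is only an illustration under assumptions: the class name `TitleParser`, the `Parse` helper, and the `1x02` form are hypothetical additions, not something taken from the question. Trying the explicit `SxxEyy` form first also avoids reading a year in the show name as a season and episode.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TitleParser
{
    // Patterns are tried in order, most explicit first.
    // The "1x02" form is an assumed extra syntax, not one from the question.
    private static readonly Regex[] Patterns =
    {
        new Regex(@"^(?<name>.+?)\.S(?<season>\d{1,2})E(?<episode>\d{2})(?:\.|$)", RegexOptions.IgnoreCase),
        new Regex(@"^(?<name>.+?)\.(?<season>\d{1,2})x(?<episode>\d{2})(?:\.|$)", RegexOptions.IgnoreCase),
        new Regex(@"^(?<name>.+?)\.(?<season>\d{1,2})(?<episode>\d{2})(?:\.|$)"),
    };

    // Returns "name|season|episode", or null when no pattern matches.
    public static string Parse(string title)
    {
        foreach (Regex rx in Patterns)
        {
            Match m = rx.Match(title);
            if (m.Success)
            {
                return m.Groups["name"].Value + "|" +
                       int.Parse(m.Groups["season"].Value) + "|" +
                       int.Parse(m.Groups["episode"].Value);
            }
        }
        return null;
    }

    public static void Main()
    {
        Console.WriteLine(Parse("MyTV.Show.S09E01.HDTV.XviD")); // MyTV.Show|9|1
        Console.WriteLine(Parse("MyTV.Show.901.HDTV.XviD"));    // MyTV.Show|9|1
        Console.WriteLine(Parse("Show.2010.S01E02.HDTV"));      // Show.2010|1|2
    }
}
```

The order matters: the bare-digit pattern comes last because it is the most likely to misfire on digits that are part of the show name.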

Dave McClelland 2010-09-27 23:52:26

@Dave McClelland, the regex sample you posted addresses the portion that I have no problem with. When the letters 'S' and 'E' are present, I have no trouble. I'm looking for help with the format when they aren't there.

KingNestor 2010-09-28 00:00:44

@King - Sorry - I had been editing my post to more accurately address your concerns, probably when you were already commenting. Does my update help any further?

Dave McClelland 2010-09-28 02:25:07

Answer 2

+4 A:

Here's what I would use:

(.*?)\.S?(\d{1,2})E?(\d{2})\.(.*)

Has capture groups:

1: Name
2: Season
3: Episode
4: The Rest

Here's some code in C# (courtesy of this post):

using System;
using System.Text.RegularExpressions;

public class Test
{

    public static void Main()
    {
        string s = @"MyTV.Show.S09E01.HDTV.XviD
            MyTV.Show.S10E02.HDTV.XviD
            MyTV.Show.901.HDTV.XviD
            MyTV.Show.1102.HDTV.XviD";

        Extract(s);

    }

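    // Name, then optional "S", 1-2 season digits, optional "E", 2 episode digits, then the rest.
    private static readonly Regex rx = new Regex
        (@"(.*?)\.S?(\d{1,2})E?(\d{2})\.(.*)", RegexOptions.IgnoreCase);

    static void Extract(string text)
    {
        // "." does not match a newline, so each line of the input yields its own match.
        MatchCollection matches = rx.Matches(text);

        foreach (Match match in matches)
        {
            // Trim() removes the indentation and line-ending characters picked up from the verbatim string.
            Console.WriteLine("Name: {0}, Season: {1}, Ep: {2}, Stuff: {3}",
                match.Groups[1].Value.Trim(), match.Groups[2].Value,
                match.Groups[3].Value, match.Groups[4].Value.Trim());
        }
    }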

}

Produces:

Name: MyTV.Show, Season: 09, Ep: 01, Stuff: HDTV.XviD
Name: MyTV.Show, Season: 10, Ep: 02, Stuff: HDTV.XviD
Name: MyTV.Show, Season: 9, Ep: 01, Stuff: HDTV.XviD
Name: MyTV.Show, Season: 11, Ep: 02, Stuff: HDTV.XviD

NullUserException 2010-09-28 00:05:22

Interesting, I would have thought that (\d{1,2}) would greedily try to match 2 digits, since technically 2 were available.

KingNestor 2010-09-28 00:08:41

@KingNestor It won't because then it would fail to match the `\d{2}` that comes after it.

NullUserException 2010-09-28 00:10:02

So, in its process of matching, does it first attempt to match 2 and then backtrack later to try matching 1?

KingNestor 2010-09-28 00:10:53
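
For what it's worth, that is how a backtracking regex engine such as .NET's default one behaves: the greedy `\d{1,2}` takes two digits first, and gives one back only if the rest of the pattern cannot match. A minimal sketch of one way to observe this is to compare the pattern with a variant whose first part is wrapped in an atomic group `(?>...)`, which never gives characters back. The class name `BacktrackDemo` and the `Split` helper are hypothetical, and the patterns are reduced to just the digit portion for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BacktrackDemo
{
    // Greedy quantifier that is allowed to backtrack.
    public static readonly Regex Greedy = new Regex(@"^(\d{1,2})(\d{2})$");

    // Same pattern, but the first part sits inside an atomic group (?>...),
    // which keeps whatever it matched first and never gives characters back.
    public static readonly Regex Atomic = new Regex(@"^(?>(\d{1,2}))(\d{2})$");

    // Returns "season|episode", or "no match".
    public static string Split(Regex rx, string digits)
    {
        Match m = rx.Match(digits);
        return m.Success ? m.Groups[1].Value + "|" + m.Groups[2].Value : "no match";
    }

    public static void Main()
    {
        Console.WriteLine(Split(Greedy, "901"));   // 9|01
        Console.WriteLine(Split(Greedy, "1102"));  // 11|02
        Console.WriteLine(Split(Atomic, "901"));   // no match
        Console.WriteLine(Split(Atomic, "1102"));  // 11|02
    }
}
```

With "901" the atomic variant fails because the first group holds on to "90", leaving only "1" for `\d{2}`. The ordinary greedy version succeeds on the same input only because it retries with a single digit.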
