To see what file to invoke the unrar command on, one needs to determine which file is the first in the file set.

Here are some sample file names, of which - naturally - only the first group should be matched:

yes.rar
yes.part1.rar
yes.part01.rar
yes.part001.rar

no.part2.rar
no.part02.rar
no.part002.rar
no.part011.rar

One (limited) way to do it with PCRE compatible regexps is this:

.*(?:(?<!part\d\d\d|part\d\d|\d)\.rar|\.part0*1\.rar)

However, this did not work in Ruby when I tested it at Rejax.

How would you write one Ruby compatible regular expression to match only the first file in a set of RAR files?

A: 

I am no regex expert, but here is my attempt:

^(yes|no)\.(rar|part0*1\.rar)$

Replace "yes|no" with the actual file name. I matched it against your examples to see if it would only match the first set hence the "yes|no" in the regex.

UPDATE: fixed as per the comment. Not sure why the user would not know the filename, so I did not fix that part...
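
If the base name is known, one way to "replace yes|no with the actual file name" is to build the pattern at run time. The sketch below is only an illustration: first_volume_pattern is a made-up helper name, and it assumes Ruby 2.4 or later for match?. Regexp.escape keeps dots, parentheses and other special characters in the file name from being read as regex syntax.

```ruby
# Hypothetical helper: builds the pattern from this answer for one known base name.
def first_volume_pattern(base_name)
  # \A and \z anchor at the start and end of the whole string.
  /\A#{Regexp.escape(base_name)}\.(?:rar|part0*1\.rar)\z/
end

pattern = first_volume_pattern("my.backup (2)")

puts pattern.match?("my.backup (2).rar")        # unnumbered archive
puts pattern.match?("my.backup (2).part01.rar") # first numbered volume
puts pattern.match?("my.backup (2).part02.rar") # later volume
```

This still depends on knowing the base name beforehand, which is the limitation raised in the comments.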

Matthew Encinas
This also accepts "no.part21.rar"; you probably want "0" instead of "[^1]". Also, I doubt the filename is known beforehand.
mweerden
Unfortunately there is no way to know what the users might name their files. Still, it's possible to catch some more file names with your regexp by changing it to ^\D+\.(rar|part0*1\.rar)$ but if the user does have numbers in the file name, it's back to square one again.
Micke
A: 

Personally I wouldn't use (extended) regular expressions in this case (or at least not just one to do it all). What's wrong with coding this in, for example, a few ifs?

mweerden
Nothing wrong with that, and that's exactly how I solved it right before I asked the question. But you know, once you attempt something and can't figure it out, you really want to know how it's supposed to be done.
Micke
+3  A: 

The short answer is that it's not possible to construct a single regex to satisfy your problem. Ruby 1.8 does not support lookbehind assertions (the (?<! part of your example regex), which is why your regex doesn't work. This leaves you with two options.

1) Use more than one regex to do it.

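def is_first_rar(filename)
    # Numbered volume (e.g. "x.part01.rar"): first only if the number is 1.
    if filename =~ /part(\d+)\.rar$/
        return $1.to_i == 1
    end
    # No part number: any plain ".rar" file is treated as the first.
    return !(filename =~ /\.rar$/).nil?
end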

2) Use Oniguruma, the regex engine of Ruby 1.9. It supports lookbehind assertions, and you can install it as a gem for Ruby 1.8. After that, you can do something like this:

def is_first_rar(filename)
    reg = Oniguruma::ORegexp.new('.*(?:(?<!part\d\d\d|part\d\d|\d)\.rar|\.part0*1\.rar)')
    match = reg.match(filename)
    return match != nil
end
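
On Ruby 1.9 and later no gem should be needed, since the built-in Regexp engine accepts lookbehind. The sketch below checks both options against the sample names from the question. It assumes Ruby 2.4 or later (for match?); FIRST_RAR and first_rar_two_step? are illustrative names, and the \z anchor is an addition that the original pattern does not have.

```ruby
# The question's pattern as a plain Regexp literal, anchored with \z
# (the anchor is an addition for this sketch).
FIRST_RAR = /(?:(?<!part\d\d\d|part\d\d|\d)\.rar|\.part0*1\.rar)\z/

# Option 1 restated without lookbehind, for comparison.
def first_rar_two_step?(filename)
  if (m = filename.match(/part(\d+)\.rar\z/))
    m[1].to_i == 1              # numbered volume: first only if the number is 1
  else
    filename.end_with?(".rar")  # no part number: plain .rar
  end
end

SAMPLES = %w[
  yes.rar yes.part1.rar yes.part01.rar yes.part001.rar
  no.part2.rar no.part02.rar no.part002.rar no.part011.rar
]

SAMPLES.each do |name|
  puts format("%-16s lookbehind=%-5s two_step=%s",
              name, FIRST_RAR.match?(name), first_rar_two_step?(name))
end
```

The two agree on the sample names but not everywhere: the lookbehind rejects any name with a digit directly before .rar (for example disc2.rar), while the two-step version accepts it.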
bmdhacks
+3  A: 

Don't rely on the names of the files to determine which one is first. You're going to end up finding an edge case where you get the wrong file.

RAR's headers will tell you which file is the first one in the volume set, assuming they were created with a somewhat recent version of RAR.

HEAD_FLAGS Bit flags:
2 bytes

0x0100 - First volume (set only by RAR 3.0 and later)

So open up each file and examine the RAR headers, looking specifically for the flag that indicates which file is the first volume. This will never fail, as long as the archive isn't corrupt. I have done my own tests with spanning RAR archives and their headers are correct according to the link above.

This is a much, much safer way of determining which file is first in a set like this.
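
A rough sketch of that check in Ruby follows. The byte offsets are an assumption based on the older (pre-5.0) RAR format, where a 7-byte marker block is followed directly by the archive header; RAR 5 archives and self-extracting archives are laid out differently and are not handled here. The name first_volume? and the constants are made up for this example, and unpack1 needs Ruby 2.4 or later.

```ruby
# Assumed layout of the older (pre-5.0) RAR format:
#   bytes 0..6    marker block "Rar!\x1A\x07\x00"
#   bytes 7..8    HEAD_CRC of the archive header
#   byte  9       HEAD_TYPE (0x73 for the archive header)
#   bytes 10..11  HEAD_FLAGS, little-endian
RAR_MARKER        = "Rar!\x1A\x07\x00".b
ARCHIVE_HEAD_TYPE = 0x73
FIRST_VOLUME_FLAG = 0x0100

# Returns true or false for the first-volume flag, or nil if the file does
# not look like an archive in the assumed format.
def first_volume?(path)
  data = File.binread(path, 12)
  return nil if data.nil? || data.bytesize < 12
  return nil unless data.byteslice(0, 7) == RAR_MARKER
  return nil unless data.getbyte(9) == ARCHIVE_HEAD_TYPE

  flags = data.byteslice(10, 2).unpack1("v") # 16-bit little-endian
  (flags & FIRST_VOLUME_FLAG) != 0
end

# Hypothetical usage: pick the flagged file among the .rar files in a directory.
# first = Dir.glob("*.rar").find { |f| first_volume?(f) }
```

An archive that is not part of a multi-volume set may not carry the flag at all, so a caller would probably want to fall back to the only .rar file present when no flagged file is found.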

Welbog