Suppose you have files like:

NewFile.part01.zip
NewFile.part02.zip
NewFile.part04.zip
NewFile.part06.zip
NewFile.part07.zip

How do you process files named in this pattern so that you get a SINGLE base name, "NewFile", and also get the missing part numbers as integers, in this case (3, 5)?

Right now I am checking the files one by one: if a name differs from the previous one only in its suffix, I skip it, and I also check that each number is exactly one more than the previous, etc.

But I thought someone might have a better, more elegant way of doing this. Linq, regex, etc?

EDIT:

So the way to know where the sequence of files ends is that the last file's size differs from the others. For example: 200 MB, 200 MB, 200 MB, ..., and then the last one is 196 MB.

My input is the full file list with the path like:

"C:\NewFile.part01.zip"
"C:\NewFile.part02.zip"
...
+1  A: 

If you know the file names try something like this (LINQ's "Except"):

string[] seq1 = { "NewFile.part01.zip", "NewFile.part03.zip"};
string[] seq2 = { "NewFile.part01.zip", "NewFile.part02.zip", "NewFile.part03.zip" };
var diffs = seq2.Except(seq1);
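
If the number of parts is known, the full list (seq2 above) does not have to be typed by hand. The following is only a sketch: the class and method names, the baseName and lastPart parameters, and the two-digit zero padding are assumptions made for illustration, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ExpectedParts
{
    // Builds "<baseName>.part01.zip" .. "<baseName>.partNN.zip" and returns
    // the expected names that are absent from the existing list.
    // Assumes two-digit zero padding, as in the question's sample names.
    public static List<string> MissingNames(
        IEnumerable<string> existing, string baseName, int lastPart)
    {
        IEnumerable<string> expected = Enumerable.Range(1, lastPart)
            .Select(i => string.Format("{0}.part{1:D2}.zip", baseName, i));

        // Except keeps the expected names that are not in the existing list.
        return expected.Except(existing, StringComparer.OrdinalIgnoreCase).ToList();
    }
}
```

With the five file names from the question and lastPart = 7, this yields "NewFile.part03.zip" and "NewFile.part05.zip".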

PK :-)

(I just saw your edit but now not so clear on the question)

Paul Kohler
Do you construct the seq2 var by using 1 to n, and seq1 is the list of the files that I have?
Joan Venge
What are you not clear on? I will add the detail you need.
Joan Venge
The original question seemed to be around getting the difference of 2 sets (missing files) but the edit implied checking file sizes to find the last.
Paul Kohler
I see, yeah ignore that one. I added it for Mark. But I also need to get the base file name which is NewFile, thanks.
Joan Venge
+2  A: 

Okay, first of all, you can extract a number from filename:

int ExtractNumber(string filename)
{
    filename = filename.Remove(filename.LastIndexOf('.'));
    filename = filename.Remove(0, filename.LastIndexOf('.') + 1);
    filename = filename.Remove(0, 4); // "part"
    return int.Parse(filename);
}

Now, you can check the missing numbers.

HashSet<int> existingNumbers = new HashSet<int>();
int max = -1;
foreach (string fn in filenameList)
{
    int n = ExtractNumber(fn);
    existingNumbers.Add(n);
    max = Math.Max(max, n);
}
HashSet<int> nonExistingNumbers = new HashSet<int>();
// Parts are numbered from 1; check every number up to the highest one seen.
for (int i = 1; i <= max; i++)
    if (!existingNumbers.Contains(i))
        nonExistingNumbers.Add(i);
Vlad
Thanks, would extract method cause a problem if the numbers are part012, 013, 111, etc? For files that are more than 100, there are 3 digits.
Joan Venge
Yes, it had 2 digits hardcoded. I've changed it to support any number of digits.
Vlad
A: 

In DOS, simply type:

copy "c:\NewFile.Part??.zip" "c:\NewFile.zip" /b

Don't forget the /b switch. When combining files without it, copy treats them as ASCII text and stops reading each one at the first end-of-file character (Ctrl+Z, 0x1A), which corrupts binary data such as zip files.

Grant Peters
Oh, I missed the part about informing about the missing files, though it will display a list of all the files that got merged and the order in which they got merged. Also, this solution probably won't work if the final size is > 4GB, but I'll leave it here as it is a nice quick solution to this kind of problem.
Grant Peters
Thanks, also this actually doesn't copy files, right?
Joan Venge
It will copy all the files into 1 file, basically merging them but leaving the original files in place. It was great back in the DOS days when you received a file of several MB that had been split so it could fit on multiple floppies.
Grant Peters
Thanks that's good to know, I didn't know that. Would it be compatible with the zip files as if they were 1 file?
Joan Venge
Zip files are what I was usually merging together back in the day, so if the file format hasn't changed too much, it should still work fine
Grant Peters
+2  A: 

You can use regex that looks like this

^(?<name>.*)\.part(?<num>\d+)\.zip$

which should give you two group matches, one for the filename and one for the num

Loop over the files and collect the name and the numbers (store the numbers in a list). Then, if you like, you can use LINQ's Min and Max in a loop like this to identify the set of missing numbers:

for (int i = list.Min(); i <= list.Max(); i++)
{
  if (!list.Contains(i))
    missingNums.Add(i);
}

--- Edited to give example as requested

This example shows how to use the regex to iterate through your file list:

    // \d+ requires at least one digit, so Convert.ToInt32 never sees an empty string.
    string pattern = @"^(?<name>.*)\.part(?<num>\d+)\.zip$";
    foreach (string file in files)
    {
        Match match = Regex.Match(file, pattern);
        if (match.Success && match.Groups.Count >= 2)
        {
            string filename = match.Groups["name"].Value;
            int num = Convert.ToInt32(match.Groups["num"].Value);
        }
    }
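
Putting the pieces together, a LINQ version might look roughly like the sketch below. The class and method names are made up for illustration, and it assumes parts are numbered from 1 and that the highest part number present marks the end of the sequence (it does not look at file sizes). The directory prefix is stripped by hand so that full paths such as "C:\NewFile.part01.zip" give the base name "NewFile".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class PartScanner
{
    static readonly Regex PartPattern =
        new Regex(@"^(?<name>.*)\.part(?<num>\d+)\.zip$", RegexOptions.IgnoreCase);

    // Maps each base name (e.g. "NewFile") to the part numbers missing
    // between 1 and the highest part number found for that name.
    public static Dictionary<string, List<int>> FindMissingParts(IEnumerable<string> paths)
    {
        return paths
            // Drop everything up to the last '\' or '/' so only the file name is matched.
            .Select(p => p.Substring(p.LastIndexOfAny(new[] { '\\', '/' }) + 1))
            .Select(f => PartPattern.Match(f))
            .Where(m => m.Success)
            .GroupBy(m => m.Groups["name"].Value,
                     m => int.Parse(m.Groups["num"].Value),
                     StringComparer.OrdinalIgnoreCase)
            .ToDictionary(
                g => g.Key,
                g => Enumerable.Range(1, g.Max()).Except(g).ToList(),
                StringComparer.OrdinalIgnoreCase);
    }
}
```

For the five paths in the question, the result has a single entry, "NewFile", whose list contains 3 and 5. Files that do not match the pattern are ignored.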
Fadrian Sudaman
Thanks, can you please give me an example with your regex say using filelist? I am not very experienced in regex.
Joan Venge
I have edited the answer to show how to use the regex with filelist
Fadrian Sudaman
This returns the wrong result for double-digit numbers: the number group in the regex match contains only a single digit.
Joan Venge
Sorry, my mistake: I had an extra '.' and also forgot to escape the actual dot character. I have fixed it in the answer above.
Fadrian Sudaman
Your new pattern seems to work fine for all numbers. Thanks.
Joan Venge