Language: C#

Hello, I'm writing an application that uses renaming rules to rename a list of files based on information given by the user. The files may be inconsistently named to begin with, or the filenames may be consistent. The user selects a list of files, and inputs information about the files (for MP3s, they would be Artist, Title, Album, etc). Using a rename rule (example below), the program uses the user-inputted information to rename the files accordingly.

However, if all or some of the files are named consistently, I would like the program to 'guess' the file information. That is the problem I'm having. What is the best way to do this?

Sample information

Sample filenames

Kraftwerk-Kraftwerk-01-RuckZuck.mp3
Kraftwerk-Autobahn-01-Autobahn.mp3
Kraftwerk-Computer World-03-Numbers.mp3

Rename Rule

%Artist%-%Album%-%Track%-%Title%.mp3

The program should properly deduce the Artist, Track number, Title, and Album name.

Again, what's the best way to do this? I was thinking regular expressions, but I'm a bit confused.

+1  A: 

Not the answer to the question you asked, but an ID3 tag reading library might be a better way to do this when you are working with MP3s. A quick Google search came up with: C# ID3 Library.

As for guessing which string positions hold the artist, album, and song title... the first thing I can think of is that if you have a good selection to work with, say several albums, you could count how often the value in each position repeats. The position that repeats the most would be the artist, the one that repeats the second most would be the album, and the one that repeats the least would be the song title.
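A rough sketch of that repetition heuristic, assuming the fields are separated by a single known delimiter and every filename has the same number of fields. FieldGuesser and its methods are hypothetical names, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class FieldGuesser
{
    // Counts the distinct values seen at each delimiter-separated position.
    // A low count means the position repeats a lot (likely the artist).
    public static int[] DistinctCountsByPosition(IEnumerable<string> filenames, char delimiter)
    {
        List<string[]> rows = filenames
            .Select(f => Path.GetFileNameWithoutExtension(f).Split(delimiter))
            .ToList();
        if (rows.Count == 0)
            return new int[0];

        // Assumption: only positions present in every filename are compared.
        int width = rows.Min(r => r.Length);
        return Enumerable.Range(0, width)
            .Select(i => rows.Select(r => r[i]).Distinct().Count())
            .ToArray();
    }

    // Positions ordered from most repeated to least repeated.
    public static int[] PositionsByRepetition(IEnumerable<string> filenames, char delimiter)
    {
        int[] counts = DistinctCountsByPosition(filenames, delimiter);
        return Enumerable.Range(0, counts.Length).OrderBy(i => counts[i]).ToArray();
    }
}
```

For the three sample filenames in the question the distinct counts are 1, 3, 2, 3. Position 0 clearly stands out as the artist, but album and title tie, which shows why this only works with a larger selection of files.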

Otherwise, it seems like a difficult guess to make based solely on a few strings in the file name... could you ask the user to also input a matching expression for the file name that describes the order of the fields?

SoloBold
The user will input a matching expression, complete with placeholders. I guess I didn't clarify. The program should use the rename rule in conjunction with the filenames to guess the information. This program should work on any form of data, not just MP3 files.
Mike Christiansen
A: 

The filenames in your example seem pretty consistent to me. You can simply call string.Split() and assign each element of the resulting array to the corresponding tag.
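A minimal sketch of the Split approach, assuming the rule and the filename use the same single-character delimiter and that no field contains it. SplitMatcher and TryMatch are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class SplitMatcher
{
    // Splits both the rule and the filename on the delimiter and pairs
    // each %Label% with the value found at the same position.
    public static bool TryMatch(string rule, string filename, char delimiter,
                                out Dictionary<string, string> tokens)
    {
        tokens = new Dictionary<string, string>();
        string[] labels = Path.GetFileNameWithoutExtension(rule).Split(delimiter);
        string[] values = Path.GetFileNameWithoutExtension(filename).Split(delimiter);

        // A different number of parts means the filename does not fit the rule,
        // for example because a field itself contains the delimiter.
        if (labels.Length != values.Length)
            return false;

        for (int i = 0; i < labels.Length; i++)
            tokens[labels[i].Trim('%')] = values[i];
        return true;
    }
}
```

With the rule %Artist%-%Album%-%Track%-%Title%.mp3 this fills in all four fields for the sample filenames, but it rejects a name such as AC-DC-Back in Black-01-Hells Bells.mp3 because the artist contains the delimiter.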

Guessing at which position is which tag information would involve TONS of heuristics.

By the way, folders that contain song files usually have some pattern in their names as well, for example:

1998 - Seven

1999 - Periscope

2000 - CO2

The format here is %Year% - %AlbumName%, which might help you identify which element in the filename is the album.

arul
A: 

To clarify, I DO have a pattern to match the filenames against.

I don't know the filenames or the pattern ahead of time; both are supplied at run time.

Pattern:

%Artist%-%Album%-%Track%-%Title%.mp3
Filenames:
Kraftwerk-Kraftwerk-01-RuckZuck.mp3
Kraftwerk-Autobahn-01-Autobahn.mp3
Kraftwerk-Computer World-03-Numbers.mp3
Expected Result:
Artist    Album          Track Title
Kraftwerk Kraftwerk      01    RuckZuck
Kraftwerk Autobahn       01    Autobahn
Kraftwerk Computer World 03    Numbers

Again, the format and the filenames are not always the same.

Mike Christiansen
You should add this to your original question instead.
Cros
+3  A: 

Easiest would be to replace each %Label% with (?<Label>.*?), and escape any other characters.

%Artist%-%Album%-%Track%-%Title%.mp3

becomes

(?<Artist>.*?)-(?<Album>.*?)-(?<Track>.*?)-(?<Title>.*?)\.mp3

You would then get each component into named capture groups.

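// Requires: using System.Collections.Generic; using System.Text.RegularExpressions;
Dictionary<string,string> match_filename(string rule, string filename) {
    // Finds the %Label% placeholders in the rule.
    Regex tag_re = new Regex(@"%(\w+)%");
    // Escape literal characters (such as '.'), then turn each placeholder
    // into a lazy named group. The anchors force the whole filename to match.
    string pattern = "^" + tag_re.Replace(Regex.Escape(rule), "(?<$1>.*?)") + "$";
    Regex filename_re = new Regex(pattern);
    Match match = filename_re.Match(filename);

    Dictionary<string,string> tokens =
            new Dictionary<string,string>();
    if (!match.Success)
        return tokens;  // no fields if the filename does not fit the rule
    // Group 0 is the whole match; the named groups start at 1.
    for (int counter = 1; counter < match.Groups.Count; counter++)
    {
        string group_name = filename_re.GroupNameFromNumber(counter);
        tokens.Add(group_name, match.Groups[counter].Value);
    }
    return tokens;
}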

But if the user leaves out the delimiters, or if the delimiters could be contained within the fields, you could get some strange results. The pattern for %Artist%%Album% would become (?<Artist>.*?)(?<Album>.*?), which is equivalent to .*?.*?. The pattern wouldn't know where to split.
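A small demonstration of that ambiguity, assuming the pattern is anchored with ^ and $ so that it has to consume the whole name. AmbiguityDemo is a hypothetical name used only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AmbiguityDemo
{
    // Two adjacent lazy groups with no delimiter between them.
    public static (string Artist, string Album) Split(string name)
    {
        Match m = Regex.Match(name, @"^(?<Artist>.*?)(?<Album>.*?)$");
        return (m.Groups["Artist"].Value, m.Groups["Album"].Value);
    }
}
```

Calling AmbiguityDemo.Split("KraftwerkAutobahn") still succeeds, but the first group stays empty and the second one takes the whole string, so nothing useful is extracted.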

This could be solved if you know the format of certain fields, such as the track-number. If you translate %Track% to (?<Track>\d+) instead, the pattern would know that any digits in the filename must be the Track.
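One way this could look, as a sketch: a lookup table maps known field names to more specific sub-patterns, and every other field falls back to .*?. TypedRuleMatcher, FieldPatterns and the choice of \d+ for Track are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TypedRuleMatcher
{
    // Hypothetical table of field-specific sub-patterns.
    private static readonly Dictionary<string, string> FieldPatterns =
        new Dictionary<string, string> { { "Track", @"\d+" } };

    // Converts a rule such as "%Artist%%Track%%Title%.mp3" into an anchored regex.
    public static string BuildPattern(string rule)
    {
        string body = Regex.Replace(Regex.Escape(rule), @"%(\w+)%", m =>
        {
            string name = m.Groups[1].Value;
            string sub = FieldPatterns.TryGetValue(name, out string p) ? p : ".*?";
            return "(?<" + name + ">" + sub + ")";
        });
        return "^" + body + "$";
    }

    // Returns the captured fields, or an empty dictionary if the filename does not fit.
    public static Dictionary<string, string> Parse(string rule, string filename)
    {
        var tokens = new Dictionary<string, string>();
        var regex = new Regex(BuildPattern(rule));
        var match = regex.Match(filename);
        if (!match.Success)
            return tokens;
        foreach (string name in regex.GetGroupNames())
        {
            if (name != "0")  // group "0" is the whole match
                tokens[name] = match.Groups[name].Value;
        }
        return tokens;
    }
}
```

With the rule %Artist%%Track%%Title%.mp3, the name Kraftwerk01Autobahn.mp3 splits into Kraftwerk, 01 and Autobahn even though there is no delimiter. It is still a guess: an artist name that ends in digits would lose those digits to the Track group.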

MizardX
What do you mean by the delimiters part?
Mike Christiansen
The dash between `%Artist%` and `%Album%`. The pattern wouldn't know where to split if the rule was `%Artist%%Album%`
MizardX
What if I wanted the possibility of no delimiter? Such as: Kraftwerk01Autobahn.mp3 (being Artist, Track, Title)
Mike Christiansen
You could use special patterns for certain fields, but I don't know all of the fields.
MizardX
There are 4-5 fields, user configurable. Defaults would be %Artist% %Album% %Track% %Title% %Genre%
Mike Christiansen
If all fields are entered by the user, you wouldn't know which ones are numeric.
MizardX
All fields can be considered string type. The program is supposed to work for more than just MP3; it should work for ANY type of file, and meet those needs. Is it possible to do this: (?<Field1>.*?)(?<Field2>.*?)(?<Field3>.*?)(?<Field4>.*?)\..*
Mike Christiansen
%Artist%-%Album%-%Track%-%Title%.%Extension%
MizardX
Maybe I'll add a way to put a length in the fields... Like %Artist8%%Track%. Then you can assume that the artist takes 8 places and the track takes the rest.
Mike Christiansen
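The fixed-length idea from the last comment could be sketched like this, assuming a trailing number in a placeholder (as in %Artist8%) means "exactly that many characters" and that field names themselves never end in a digit. FixedWidthRuleMatcher is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FixedWidthRuleMatcher
{
    // %Name8% becomes (?<Name>.{8}); %Name% without a width stays lazy.
    public static string BuildPattern(string rule)
    {
        string body = Regex.Replace(Regex.Escape(rule), @"%([A-Za-z_]+)(\d*)%", m =>
        {
            string name = m.Groups[1].Value;
            string width = m.Groups[2].Value;
            string sub = width.Length > 0 ? ".{" + width + "}" : ".*?";
            return "(?<" + name + ">" + sub + ")";
        });
        return "^" + body + "$";
    }

    // Returns the captured fields, or an empty dictionary if the filename does not fit.
    public static Dictionary<string, string> Parse(string rule, string filename)
    {
        var tokens = new Dictionary<string, string>();
        var regex = new Regex(BuildPattern(rule));
        var match = regex.Match(filename);
        if (!match.Success)
            return tokens;
        foreach (string name in regex.GetGroupNames())
        {
            if (name != "0")  // group "0" is the whole match
                tokens[name] = match.Groups[name].Value;
        }
        return tokens;
    }
}
```

For example, the rule %Artist9%%Track%.mp3 applied to Kraftwerk01.mp3 gives Kraftwerk as the artist and 01 as the track. A fixed width only helps when every file in the batch uses the same field length.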
A: 

I have written a command-line file renamer, RenameWand, that does the kind of pattern matching you are describing. It's written in Java, but I think some of the source code and usage documentation may be of interest to you. A simple example of what the program can do:

Source Pattern (user-specified):

<artist>-<album>-<track>-<title>.mp3

Target Pattern (user-specified):

<title.upper>-<3|track+10>-<album.lower>-<artist>.mp3

Original Filename:

Kraftwerk-Computer World-03-Numbers.mp3

Renamed Filename:

NUMBERS-013-computer world-Kraftwerk.mp3
Zach Scrivena
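For the renaming half in C#, a minimal sketch of filling a target rule from already extracted fields. It uses the %Label% syntax from the question, not RenameWand's own syntax, and does not attempt its modifiers such as upper or lower. RuleFormatter is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RuleFormatter
{
    // Replaces each %Label% in the target rule with the matching field value.
    // Labels with no known value are left untouched so the gap stays visible.
    public static string Apply(string targetRule, IDictionary<string, string> tokens)
    {
        return Regex.Replace(targetRule, @"%(\w+)%", m =>
            tokens.TryGetValue(m.Groups[1].Value, out string value) ? value : m.Value);
    }
}
```

Given the fields Kraftwerk, Computer World, 03 and Numbers, the rule %Title%-%Track%-%Album%-%Artist%.mp3 would produce Numbers-03-Computer World-Kraftwerk.mp3.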