I'm building an application that uses an elaborate API to fetch data from YouTube, and each data file is saved with the title of the corresponding video as its file name. However, my program is crashing because quite a lot of the videos on YouTube have characters in their titles that are illegal in file names under Windows.

Would URLEncoding the title of the video fix this problem?

If so, is this the best method to use, and what would be the best way to implement a URLEncode?

Thanks! :)

+4  A: 

Well, if you want to do URL encoding, you could use HttpUtility.UrlEncode. I'm not sure I would, though. It may escape all the characters you want it to, but it'll escape others as well.

I think I'd probably use Path.GetInvalidFileNameChars and just replace anything invalid in the name with an underscore.

That's not a reversible encoding, of course, but I think it'll produce filenames which are easier to understand. You might want to create an index file which maps from original title to filename as well.
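A minimal sketch of the underscore replacement described above, assuming a hypothetical helper class named `FileNameSanitizer` (the class and method names are illustrative, not part of .NET). It relies only on `Path.GetInvalidFileNameChars`, whose result depends on the platform the code runs on.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, named here for illustration only.
static class FileNameSanitizer
{
    // Replaces every character that Path.GetInvalidFileNameChars() reports
    // as invalid on the current platform with an underscore.
    public static string Sanitize(string title)
    {
        char[] invalid = Path.GetInvalidFileNameChars();
        var builder = new StringBuilder(title.Length);
        foreach (char c in title)
        {
            builder.Append(Array.IndexOf(invalid, c) >= 0 ? '_' : c);
        }
        return builder.ToString();
    }
}
```

On Windows, `FileNameSanitizer.Sanitize("AC/DC: Live")` would give `AC_DC_ Live`, since both `/` and `:` are in the invalid set there. Reserved device names such as `CON` or `NUL` are not covered by the character list, so they would need a separate check if they can occur.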

Jon Skeet
Thanks! I didn't know something like that existed! I will try that and report back whether it works or not. :)
Maxim Zaslavsky
+1  A: 

URL encoding should fix the problem, as it should replace any invalid character (and a few valid ones) with a '%' followed by two hex digits per encoded byte; to my knowledge that is valid in file system names.

This raises three questions though:

  1. Is being able to cleanly read a filename important for the user? If not, it might be better to use a unique file name (1.file, 2.file, 3.file) and a mapping from file name -> title

  2. What happens if two videos have the same name? Sort of an extension of the first question, I think.

  3. What if the title (when URL encoded) is longer than the max filename length? If I recall correctly, the max length for a filename is 255 characters on NTFS; if each char in a title expands to 3 chars for URL encoding, then the 255 char limit could be met with an 85 char title.
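The duplicate-name and length questions above can be sketched together. This is only an illustration under stated assumptions: `FileNameAllocator`, `MakeUnique`, the ` (2)` suffix style, and the 255-character cap taken from the figure quoted above are all choices made for this example, and the caller is assumed to pass in the set of names already used.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, named here for illustration only.
static class FileNameAllocator
{
    // Assumed limit, taken from the NTFS figure mentioned above.
    const int MaxLength = 255;

    // Returns a name that is not yet in 'used' and records it there.
    // Appends " (2)", " (3)", ... when the plain name is already taken,
    // shortening the base name so the result stays within MaxLength.
    public static string MakeUnique(string baseName, string extension, ISet<string> used)
    {
        string candidate = Truncate(baseName, MaxLength - extension.Length) + extension;
        int n = 2;
        while (!used.Add(candidate))
        {
            string suffix = " (" + n + ")";
            int room = MaxLength - extension.Length - suffix.Length;
            candidate = Truncate(baseName, room) + suffix + extension;
            n++;
        }
        return candidate;
    }

    // Cuts by UTF-16 code units; this simple version may split a surrogate pair.
    static string Truncate(string s, int max)
    {
        max = Math.Max(0, max);
        return s.Length <= max ? s : s.Substring(0, max);
    }
}
```

Since Windows file names are normally compared case-insensitively, a caller would probably want to create the set as `new HashSet<string>(StringComparer.OrdinalIgnoreCase)`. Calling `MakeUnique("clip", ".dat", used)` twice with the same set yields `clip.dat` and then `clip (2).dat`.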

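EDIT/Update: There are some characters that UrlEncode considers valid which are invalid file system chars; the one I've specifically come across is '\', and HttpUtility.UrlEncode also leaves '*' unencoded, which Windows does not allow in file names. So, no, URL encoding on its own would not be safe.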

CoderTao
Yes, I was thinking about whether to just use unique file names, but that second question is actually quite important. I need to remember to add some handling to make sure that file names don't collide. Thanks!
Maxim Zaslavsky
A: 

Instead of the video name, can you use YouTube's video ID? e.g. v=Yk6oPsKZG_w. Or do you not have access to that? Those seem to contain only simple alphanumerics (plus the occasional underscore or hyphen) and should be unique within YouTube.
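A rough sketch of the ID-as-file-name idea, assuming the ID has already been obtained from the API. The class name `VideoIdFileNames` and the allowed character set (letters, digits, `-`, `_`) are assumptions based on the example ID above, not a documented guarantee.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, named here for illustration only.
static class VideoIdFileNames
{
    // Assumption: IDs consist only of letters, digits, '-' and '_'.
    // \A and \z anchor the match to the whole string.
    static readonly Regex SafeId = new Regex(@"\A[A-Za-z0-9_-]+\z");

    // Builds a file name from a video ID, rejecting anything that does not
    // match the assumed character set instead of silently altering it.
    public static string ForVideoId(string videoId, string extension)
    {
        if (videoId == null || !SafeId.IsMatch(videoId))
        {
            throw new ArgumentException("Unexpected characters in video ID.", nameof(videoId));
        }
        return videoId + extension;
    }
}
```

One caveat worth checking: the IDs appear to be case-sensitive, while Windows file names are normally compared case-insensitively, so two IDs that differ only in letter case would map to the same file.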

I'm not sure if urlencode will help you with asterisks in the video name.

If you still want to use the video name you may want to look at using the "\\?\" prefix which tells the Win32 APIs to disable all string parsing and to send this string straight to the file system.

http://msdn.microsoft.com/en-us/library/aa365247(VS.85).aspx#path_names_and_namespaces

I'm not sure whether you can use that with the .NET API or whether you would have to use DllImport to invoke the Win32 API directly.

Tuzo
A: 

I ended up doing this with a similar problem:

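    using System;
    using System.IO;
    using System.Linq;
    using System.Text;

    // Everything the Path class reports as invalid, plus '%' itself so that
    // Unescape can reverse the encoding unambiguously. Built once, not per character.
    static readonly char[] CharsToEscape = Path.GetInvalidPathChars()
        .Concat(Path.GetInvalidFileNameChars())
        .Concat(new[] { '%' })
        .Distinct()
        .ToArray();

    static string Escape(string input)
    {
        StringBuilder builder = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if (Array.IndexOf(CharsToEscape, c) >= 0)
            {
                // Replace the character with its %XX hex form.
                builder.Append(Uri.HexEscape(c));
            }
            else
            {
                builder.Append(c);
            }
        }
        return builder.ToString();
    }

    static string Unescape(string input)
    {
        StringBuilder builder = new StringBuilder(input.Length);
        int index = 0;
        while (index < input.Length)
        {
            // HexUnescape decodes a %XX sequence at index, or returns the
            // character as-is, and advances index either way.
            builder.Append(Uri.HexUnescape(input, ref index));
        }
        return builder.ToString();
    }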

It felt a bit weird to have to write all this code, but at least I get readable file names that are safe to use with the OS.

Hallgrim