Hi, I'm currently using File::Basename's fileparse to split a path into its directory, base file name, and extension, using something like this:

my($myfile_name,$mydirectory, $file_extension) = fileparse($$rhash_params{'storage_full_path_location'},'\..{1,4}');

But I see that there's a variation where you can pass the function an array of suffixes, which would contain all the known file extensions.

So I'm trying to find a safe way to do this, as I've got some strange file names to process, e.g. file.0f1.htm.

Questions:

  1. Is there a list of commonly used extensions for Windows and Unix systems? In my case it's mainly for Windows.
  2. And is it safe to assume that all file names in Windows have an extension of exactly three characters?

And if there's an even better way to do this, please share.

Thanks.

Updates:

So obviously I must have been drunk to forget about those other extensions. :) I've now updated the regex to allow 1 to 4 characters.

In this case, how should I change my regex to match properly? Or is it a better idea to collect the commonly used extensions from Google and put them into an array to pass to the function instead? My users are usually either students or teachers.

+3  A: 

1. Is there a list of commonly used extensions for Windows and Unix systems? In my case it's mainly for Windows.

Yes, loads, all over the internet: http://www.google.com/search?q=common+file+extensions

2. And is it safe to assume that all file names in Windows have an extension of exactly three characters?

No, it's perfectly possible to use '.c', '.java', etc. in Windows.
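For the "array of suffixes" variant mentioned in the question, a minimal sketch might look like the following. The extension list and the sample path are made up for illustration, and a real list would be much longer. Each suffix is passed as a pattern, so the dots are escaped with \Q...\E and matching is made case-insensitive.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical, deliberately short list of known extensions.
# \Q...\E escapes the dot; /i makes ".HTM" match ".htm".
my @known_suffixes = map { qr/\Q$_\E/i } qw(.c .java .htm .html .mpeg .jpeg);

# fileparse returns (base name, directory, matched suffix).
my ($name, $dir, $suffix) = fileparse('docs/file.0f1.HTM', @known_suffixes);
print "name=$name dir=$dir suffix=$suffix\n";

# An extension that is not in the list is left on the base name,
# and the returned suffix is the empty string.
my ($name2, $dir2, $suffix2) = fileparse('docs/notes.xyz', @known_suffixes);
print "name=$name2 dir=$dir2 suffix=", ($suffix2 eq '' ? '(none)' : $suffix2), "\n";
```

The drawback of this approach is that any extension missing from the list silently stays part of the base name.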

Alnitak
Or ".java" or ".mpeg" or ".html" or ".C" (although that one is probably not used in a Windows world, as it's easily confused with ".c").
Joachim Sauer
good point - I was struggling to think of a four letter extension that doesn't have a common three letter abbreviated form (cf. .htm and .mpg)
Alnitak
Hehe, my bad for being temporarily insane and forgetting about all those extensions.
melaos
Alnitak: Java specifies that its source code files be named ".java"; there's no widely used three-letter version of that.
Joachim Sauer
In fact, many of the applications I've written use rather large extensions, such as ".pfmbin".
paxdiablo
+1  A: 

There are several faulty assumptions in your code:

  • files need not have extensions. For example, most binary executables on Unix/Linux/... don't have an extension at all. They are simply called "bash", "wget", "sed", "Xorg", ...
  • extensions need not be three characters long, as @Alnitak already told you: ".c", ".java", ".mpeg", ".jpeg", ".html" are all perfectly fine and rather widespread extensions
  • cutting at the last "." is probably safer, but can still fail for files with no extension or with multiple (or multi-part) extensions such as ".tar.gz" or ".tar.bz2", which occur rather often in the Unix/Linux/... world
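As a rough sketch of the "cut at the last dot" approach, the pattern qr/\.[^.]*/ (the one used in the File::Basename documentation) can be passed to fileparse instead of a fixed-length regex. The helper name split_last_extension and the sample paths are made up for illustration.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical helper: split off whatever follows the last dot,
# regardless of its length. Returns (base name, directory, extension).
sub split_last_extension {
    my ($path) = @_;
    return fileparse($path, qr/\.[^.]*/);
}

for my $path (qw(dir/file.0f1.htm dir/Main.java dir/bash dir/archive.tar.gz)) {
    my ($name, $dir, $ext) = split_last_extension($path);
    printf "%-20s name=%-12s ext=%s\n", $path, $name, $ext eq '' ? '(none)' : $ext;
}
```

With this pattern "file.0f1.htm" splits into "file.0f1" and ".htm", and "bash" comes back with an empty extension. "archive.tar.gz" yields only ".gz", so multi-part extensions would still need to be listed explicitly, and a dotfile such as ".bashrc" would be treated as all extension with an empty base name.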
Joachim Sauer
Yup, I know that now. :)
melaos