I'm writing a new document-based cross-platform chemistry application (Win, Mac, Unix), which saves files in its own format (no standard format exists for this field). I'm trying to decide on a file extension for the saved files. My questions are:

  • How important is it nowadays to stick to 3 characters?
  • Where can you check how much this file extension is already used? (Google helps, of course, but it does not tell me how popular a given app is.)
  • Do I really need to use a file-specific extension? My save format is gzip'ed XML, so I could name it .xml.gz, but I fear it would confuse beginning users (i.e. when you see it, it does not immediately "ring a bell").
  • Finally, do you have other important guidelines for choosing an extension for your own programs?

PS: I tried to keep the right balance between "giving too little information" and "being too specific to be really useful to others". I'll happily provide more information in comments if the need arises.

+4  A: 

FileInfo.com lists a lot of file extensions, along with its own estimate of how widely each one is used.

I suggest a unique extension (rather than .xml.gz) so that the OS can identify the file type to users in a file listing or elsewhere. 'Ringing a bell' is important, especially if you will have less sophisticated users.

I don't see any need to stick to 3 characters, but I wouldn't go longer than 5 (I don't suppose I have a real reason for this, other than personal preference).

Ray
Thanks for FileInfo.com, it's a bit better (i.e. better signal-to-noise ratio) than the dozens of other such sites Google led me to.
FX
+1  A: 
  1. Depends on the platform, but in general, not very important for newer operating systems. Check the documentation for the platforms you're targeting.

  2. I'm not aware of better alternatives to Google. Hopefully someone else has a better suggestion for this one.

  3. Not unless you have some reason to do so. An example would be "I want to ensure that Windows always opens this program with my app". I'm not sure your users need to be concerned with the extension anyway; the default configuration on Windows, for example, is to hide extensions for known file types. BUT if you have a compelling reason (such as letting your program easily identify the files it should be able to handle), then you could use a custom extension, or you could come up with something else.

  4. I have only once written a program where I thought I needed to come up with my own extension. I used my initials. Later I realized I didn't really need a special extension and reverted to ".xml". However, most extensions are chosen to mean something (.doc for documents, etc.), so something meaningful is a good idea if you do go this route.

David Stratton
Regarding 3: I don't strongly "want to ensure that Windows always opens this program with my app". But I think it's a very poor experience for end users if, when they double-click the shiny chemistry file they saved (or were sent), it opens in an XML editor (or some other random app that uses XML save files).
FX
I agree. That qualifies as "some reason to do so".
David Stratton
+6  A: 
  • How important is it nowadays to stick to 3 characters?

It isn't, unless you have to support older operating systems. All current OSes handle file extensions longer than 3 characters without any problems: think of .html, .config, .resx, and I'm sure there are more.

  • Where can you check how much this file extension is already used?

Check out FileExt.

  • Do I really need to use a file-specific extension? My save format is gzip'ed XML, so I could name it .xml.gz, but I fear it would confuse beginning users (i.e. when you see it, it does not immediately "ring a bell").

Remember that Windows (and Windows users) associate files with applications by extension, so using something as generic as .xml.gz may cause problems. You are probably better off coming up with something more specific to your file type or application. Users don't care whether your format is gzipped XML internally; they care about what is in the file. Think in terms of abstraction layers: your users will think of it as a file containing chemistry info, not gzipped XML, so .chem is far more appropriate than .xml.gz.
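To make the abstraction point concrete, here is a minimal Python sketch (standard library only) assuming the `.chem` extension used as an example above and a made-up namespace URI; the function names are hypothetical. On disk the file is still gzipped XML, but the user only ever sees the `.chem` name.

```python
import gzip
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical values for illustration; pick your own extension and namespace.
EXTENSION = ".chem"
NAMESPACE = "urn:example:chem-doc:1"


def ensure_extension(path: Path) -> Path:
    """Append EXTENSION unless the path already ends with it (any case)."""
    if path.suffix.lower() == EXTENSION:
        return path
    return path.with_name(path.name + EXTENSION)


def save_document(root: ET.Element, path: Path) -> Path:
    """Serialize the XML tree, gzip it, and write it under the app's extension."""
    path = ensure_extension(path)
    data = ET.tostring(root, encoding="utf-8", xml_declaration=True)
    with gzip.open(path, "wb") as fh:
        fh.write(data)
    return path


def load_document(path: Path) -> ET.Element:
    """Decompress and parse a document; the extension is not checked here."""
    with gzip.open(path, "rb") as fh:
        return ET.fromstring(fh.read())
```

Because `load_document` ignores the file name, a user who renames a file to `.xml.gz` (or anything else) can still open it, while the default name stays specific to the application.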

Some things to think about:

  1. Obviously, don't clash with anything big - Don't use .doc, .xls, .exe, etc.

  2. Don't clash with anything common in your industry domain that your users are likely to have installed. For example, if you are writing a programming tool, don't use .cs or .cpp. You probably know your domain best, so write a list of all the apps you and your users are likely to have installed, plus their competitors, and avoid their extensions.

  3. Make sure your app includes options to register and unregister the extension. Don't just do it automatically during installation; make it an option.

  4. Remember that Unix/Linux file systems are case-sensitive (and some Mac volumes can be too), so consider sticking to all lower case by default.

  5. Remember that CD/DVD file naming rules are stricter, so don't use non-alphanumeric characters.

  6. Finally, remember that most non-tech users are going to have file extensions turned off, so don't stress about it too much.
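Points 1, 2, 4 and 5 are mechanical enough to check in code. Below is a minimal Python sketch; `KNOWN_EXTENSIONS` is a tiny illustrative stand-in for the list you would build from your own domain, and the function name is hypothetical.

```python
import string
from typing import List

# Illustrative stand-in: build the real list from the big names (point 1) and
# the apps you and your users actually have installed (point 2).
KNOWN_EXTENSIONS = {"doc", "xls", "exe", "cs", "cpp", "xml", "gz"}
ALLOWED_CHARS = set(string.ascii_lowercase + string.digits)


def extension_problems(candidate: str) -> List[str]:
    """Return reasons why a candidate extension breaks the rules above."""
    ext = candidate.lstrip(".")  # accept both "chem" and ".chem"
    if not ext:
        return ["empty extension"]
    problems = []
    if ext != ext.lower():
        problems.append("contains uppercase letters")  # point 4
    if not set(ext.lower()) <= ALLOWED_CHARS:
        problems.append("contains non-alphanumeric characters")  # point 5
    if ext.lower() in KNOWN_EXTENSIONS:
        problems.append("clashes with a known extension")  # points 1 and 2
    return problems
```

For example, `extension_problems("chem")` returns an empty list, while `extension_problems("DOC")` reports both the uppercase letters and the clash. A check like this only catches clashes you already know about; it doesn't replace looking the candidate up in the extension databases mentioned in this thread.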

There is more info here.

Wikipedia has lists of file extensions (by type and alphabetical), as well as some general information.

Simon P Stevens
"Remember that windows (and windows users) associate files with applications by extension": I try to forget how such a thing is even possible in 2010, but now thanks to you the inanity of it might wake me at night!
FX
A: 
  1. Barring your needing to be compatible with a specific OS that you know still has the three-letter limitation, no need to keep it to three characters. It may be useful to have a three-character version of it if you end up supporting those platforms.

  2. The Wikipedia list of file formats is pretty good. Some MIME mapping lists include the common extensions associated with those mappings. Ray already mentioned FileInfo.com.

  3. It's a convenience thing; I'd probably go with your own but document the fact that they're just gzipped XML files conforming to a specific DTD and make it easy for users to use .xml.gz instead. Be sure that your software doesn't care about the extension, so that users could even choose their own if they wanted, although I'd tend to avoid encouraging them to by providing a reasonable default.

  4. I'd go for typeability, clarity, uniqueness, and brevity -- in that order. For instance, .config is a lot easier to type than .q2z but it falls down on uniqueness. (I'm not suggesting it for your app; it's an example.) Similarly, .q2z is just a pain. :-) So for instance, .chemstuff is easy to type and probably not in wide use elsewhere. (Again, not a suggestion, just an example.)

T.J. Crowder
+4  A: 

It certainly depends on the OSes you want to support, but people have generally moved past the 3-character extension limit these days: .html is widely used for web pages, for example.

Of course, if you go to much longer extensions, people will stop visually recognizing them as file extensions, I think...

will
A: 

Name it document_name.app_name.xml.gz, where document_name and app_name are placeholders, the latter being an easily readable and recognisable short string derived from your application's title.
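A minimal Python sketch of this naming scheme, assuming the hypothetical app name `chimp` (the example used later in this thread); the function names are illustrative. It matches on the whole `.chimp.xml.gz` suffix rather than splitting at the first dot, so a dot inside the user's document name does no harm.

```python
from typing import Optional

# Hypothetical application identifier ("chimp" is the example used in this thread).
APP_NAME = "chimp"
SUFFIX = f".{APP_NAME}.xml.gz"


def make_filename(document_name: str) -> str:
    """Build document_name.app_name.xml.gz from a user-chosen document name."""
    return document_name + SUFFIX


def document_name_of(filename: str) -> Optional[str]:
    """Return the document name, or None if the file doesn't use our suffix."""
    # Compare only the tail, case-insensitively, so dots in the name are harmless.
    tail = filename[-len(SUFFIX):]
    if len(filename) > len(SUFFIX) and tail.lower() == SUFFIX:
        return filename[: -len(SUFFIX)]
    return None
```

With this, `document_name_of("report v1.2.chimp.xml.gz")` yields `"report v1.2"`, and a plain `data.xml.gz` is not claimed by the application.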

Modern systems are quite flexible, and there is absolutely no need to drag the 3-character extensions further along in time with us.

I agree that .xml.gz would confuse users; however, keep in mind that modern systems are moving towards recognizing files not by extension but by probing their headers and even their contents. In fact, users often don't even see the extensions. For gzipped XML files, a system may first unpack the file stream in memory, find out that it is an XML file, and then take its 'xmlns' as the application identifier. However, such systems are not yet in widespread use. In any case, don't make the mistake of opening files by extension only: be smart and raise the bar by doing exactly the above to find out whether the file can be considered a document for your application.
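The probing approach described above can be sketched in Python with only the standard library. Assumptions: documents carry a root-element namespace (the URI below is made up), plain uncompressed XML is accepted too, and the function name is hypothetical. The Python documentation warns that `xml.etree.ElementTree` is not secure against maliciously constructed data, so untrusted files may warrant a hardened parser.

```python
import gzip
import zlib
import xml.etree.ElementTree as ET

# Hypothetical namespace URI that marks this application's documents.
APP_NAMESPACE = "urn:example:chem-doc:1"
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_our_document(data: bytes) -> bool:
    """Identify a document by content alone, ignoring the file name."""
    # Step 1: if the header looks like gzip, unpack the stream in memory.
    if data[:2] == GZIP_MAGIC:
        try:
            data = gzip.decompress(data)
        except (OSError, EOFError, zlib.error):
            return False  # corrupt or truncated gzip stream
    # Step 2: check that the (decompressed) bytes are well-formed XML.
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        return False
    # Step 3: use the root element's namespace as the application identifier.
    return root.tag.startswith("{" + APP_NAMESPACE + "}")
```

A caller would pass in the raw bytes, e.g. `is_our_document(path.read_bytes())` for a `pathlib.Path`, so the same check works whether the file is named `.chimp.xml.gz`, `.xml.gz` or something else entirely.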

amn
I don't really feel good about something as long as `doc_name.app_name.xml.gz`. (And if the doc_name given by the user includes a dot, it becomes even less clear.)
FX
It is not a popularity contest. Everything but the document name (i.e. everything following the first dot from the left) should be globally unique among all other files that are not documents of said application. Also, as I said, both document_name and app_name are placeholders, so it can be something like untitled-1.chimp.xml.gz. At least it serves both purposes: it helps your application uniquely identify its document files, while making those files automatically compatible with archivers and XML parsers out of the box.
amn