What characters are allowed in filenames for HTML files on ALL servers (*nix, Windows, etc.)? I'm looking for the "lowest common denominator" that will work on all servers. Use case: I'm naming a file to be served publicly (Mysite.com/My-Page.htm).

E.g., space, _, -, comma, etc.?

E.g., can I have File-Name.htm, File_Name.htm, File Name.htm?

Obviously, this needs to work with all servers and browsers. (IIRC, the name is limited by the server, not the browser, but I could be wrong.)

A: 

There isn't such a thing as an HTML filename.
Certain characters have to be encoded in HTML (e.g. if used in links), but the characters allowed in document names depend on the web server (and possibly the file system on the server).

Martin Beckett
A: 

Any file name will be URL-encoded, so you should be fine. And for the record, all three of your file names would work just fine.

Andrew Hare
A: 

If you don't want your filenames to be encoded by the server, you should avoid reserved characters: $&+,/:;=?@ and unsafe characters: space, quotation marks, <>#%{}|\^~[]`

But as the previous answers stated, web servers should cope with whatever you want to use by encoding the characters.
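A minimal Python sketch of the check described above. The two character sets are copied from this answer (they follow the older RFC 1738 wording and are not an exhaustive list), and `needs_encoding` and `encoded_path` are hypothetical helper names:

```python
from urllib.parse import quote

# Character sets as listed in the answer (RFC 1738-era wording).
RESERVED = set("$&+,/:;=?@")
UNSAFE = set(' "<>#%{}|\\^~[]`')


def needs_encoding(filename: str) -> bool:
    """True if the name contains any reserved or unsafe character."""
    return any(ch in RESERVED or ch in UNSAFE for ch in filename)


def encoded_path(filename: str) -> str:
    """Percent-encode a filename for use as one URL path segment.

    safe="" means even "/" is encoded; quote() still leaves letters,
    digits and _.-~ untouched.
    """
    return quote(filename, safe="")


print(needs_encoding("File-Name.htm"))  # False
print(needs_encoding("File Name.htm"))  # True
print(encoded_path("File Name.htm"))    # File%20Name.htm
```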

Jim Downing
+1  A: 

Be sure to eliminate

* . " / \ [ ] : ; | = ,

which are never allowed. Because file naming conventions are inconsistent, standard practice is to use a-z, 0-9 and the underscore character. Most users want spaces, but if you can avoid them you sidestep parsing issues and improve reliability; read the RFCs on MIME (Multipurpose Internet Mail Extensions) to get a taste of what is involved.
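As a rough illustration of the a-z / 0-9 / underscore convention, here is a hypothetical Python helper (`safe_filename`, not from the answer) that maps an arbitrary title onto those characters. Collapsing each run of other characters into one underscore is an assumption of this sketch:

```python
import re


def safe_filename(title: str, ext: str = ".htm") -> str:
    """Reduce an arbitrary title to lowercase a-z, 0-9 and underscores."""
    # Every run of characters outside a-z/0-9 (after lowercasing) becomes
    # one underscore; underscores at either end are then trimmed.
    stem = re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")
    if not stem:
        raise ValueError(f"no usable characters in {title!r}")
    return stem + ext


print(safe_filename("File Name"))  # file_name.htm
print(safe_filename("My-Page!"))   # my_page.htm
```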

No matter what you do, something somewhere is likely to make life difficult, so much so that I now use cryptographic methods to generate random lowercase a-z strings and use those as filenames, embedding the useful information in the file's source code.
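A sketch of the random-name approach in Python, assuming the standard `secrets` module counts as the "cryptographic method". The 16-character length and `.htm` extension are arbitrary choices, and `random_filename` is a hypothetical name:

```python
import secrets
import string


def random_filename(length: int = 16, ext: str = ".htm") -> str:
    """Random lowercase a-z name drawn from a cryptographically secure source."""
    letters = string.ascii_lowercase
    return "".join(secrets.choice(letters) for _ in range(length)) + ext


print(random_filename())  # e.g. qzkfmwoabtrexnly.htm (different every run)
```

With 16 letters there are 26^16 (about 4.4 × 10^22) possible names, so collisions are unlikely, though a real site would still check for an existing file before writing.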

Avoid the ampersand at any cost, ...

Nicholas Jordan
At face value this is an incorrect answer. With the exception of "/", all the characters you mention are valid characters for a filename on unix-like systems. They shouldn't necessarily be used, but they are valid.
Bryan Oakley
Like Jim says, anything not allowed in URLs is supposed to be encoded in transit. A prime example: a space is supposed to be %20, but what you often see is + for spaces, and + is also the character for a literal plus sign, so (in my not so humble opinion) the situation is exactly that of a cat chasing its tail when it already has hold of it.
Nicholas Jordan
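Python's standard library exposes both conventions, which makes the %20 versus + ambiguity easy to see. This sketch only shows how the two encodings differ; which one a given server or browser applies to paths is not covered here:

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

name = "File Name+v2.htm"

# Path-style encoding: space becomes %20 and a literal "+" becomes %2B.
print(quote(name))        # File%20Name%2Bv2.htm
# Form/query-style encoding: space becomes "+" and a literal "+" becomes %2B.
print(quote_plus(name))   # File+Name%2Bv2.htm

# The two decoders read "+" differently, which is the source of the confusion:
print(unquote("File+Name.htm"))       # File+Name.htm (plus kept as-is)
print(unquote_plus("File+Name.htm"))  # File Name.htm (plus read as a space)
```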
Yes, Bryan, it's short and cheap. Those are just the first characters I remove, because even if {[%%%]}.ext makes it across the server hops, it confuses char[256] in fp* (not to mention embedded nulls and so on). Ever seen the OS crash because of the dot operator in a filename?
Nicholas Jordan
Note also that Mr. Oakley's correction was posted as soon as the issue was apparent to him. He works on UNICE (plural of UNIX), where the kernel is much stronger... what you have is what you see: cross-platform issues.
Nicholas Jordan
You can perfectly well use an ampersand... you have to HTML-encode it if you're making a link to it, but then you already have to do that for all the URLs with query string parameters.
bobince
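A small Python sketch of that point, using `html.escape` from the standard library. `link_to` is a hypothetical helper, and it assumes the filename needs no percent-encoding (no spaces or other unsafe characters):

```python
from html import escape


def link_to(filename: str, text: str) -> str:
    # "&" is legal in a URL path, but inside an HTML attribute or text it
    # should be written as "&amp;"; escape() also handles <, > and quotes.
    return f'<a href="{escape(filename)}">{escape(text)}</a>'


print(link_to("Q&A.htm", "Q&A"))
# <a href="Q&amp;A.htm">Q&amp;A</a>
print(escape("page.php?a=1&b=2"))
# page.php?a=1&amp;b=2
```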
@bobince: What made me say that was something I saw in the Java comments in the URL class (or a related class) about escaping behaviour that was inconsistent with an RFC on URLs (allowed characters / escaped characters). I stand corrected.
Nicholas Jordan
+5  A: 

What characters are allowed in filenames for HTML files on servers?

That totally depends on the server. HTTP itself allows any character at all, including control characters and non-ASCII characters, as long as they are suitably %-encoded when requested in a URL.
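To make the %-encoding concrete, here is a Python sketch with `urllib.parse.quote`. It assumes the server interprets path bytes as UTF-8, which is `quote`'s default encoding for `str` input but not something HTTP itself guarantees:

```python
from urllib.parse import quote, unquote

# quote() encodes str input as UTF-8 by default; unquote() reverses it.
for name in ["café.htm", "tab\there.htm"]:
    encoded = quote(name)
    print(encoded, unquote(encoded) == name)

# caf%C3%A9.htm True
# tab%09here.htm True
```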

On a Unix server you cannot use ‘/’ or the zero byte. (If you could use them, they'd appear in the URL as ‘%2F’ and ‘%00’ respectively.) You also can't have the specific filenames ‘.’ or ‘..’, or the empty string.
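The Unix rules above are small enough to express directly. Here is a hypothetical Python check for one path component; length limits (often 255 bytes) are deliberately left out:

```python
def valid_unix_name(name: str) -> bool:
    """Check one path component against the Unix rules described above."""
    # The empty name, "." and ".." are reserved by the file system itself.
    if name in ("", ".", ".."):
        return False
    # "/" separates path components and NUL terminates C strings.
    return "/" not in name and "\0" not in name


print(valid_unix_name('*."[]:;|=,.htm'))  # True: odd, but allowed
print(valid_unix_name("a/b.htm"))         # False
```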

On a Windows server you have all the limitations of a Unix server, plus you can't use any of \/:*?"<>| or control characters 1-31, you can't end a name with a dot or a space, and you'll have difficulty using any of the legacy device filenames (CON, PRN, COM1 and many more).

This has nothing to do with HTTP; it's just how filenames work on Windows, which is complicated.
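Here is a Python sketch of the Windows rules, meant as a rough filter rather than a complete validator. The device-name list (CON, PRN, AUX, NUL, COM1-COM9, LPT1-LPT9, also when followed by an extension) is an assumption drawn from Microsoft's file-naming guidelines and may not be exhaustive, and `valid_windows_name` is a hypothetical name. Following those guidelines, it rejects names that end in a dot or a space:

```python
import re

# Characters Windows rejects in names, plus control characters 0-31 (NUL included).
FORBIDDEN = set('\\/:*?"<>|') | {chr(c) for c in range(32)}

# Legacy device names, also reserved when followed by an extension
# (e.g. "con.htm"). Assumed list; may not be exhaustive.
DEVICE = re.compile(r"(con|prn|aux|nul|com[1-9]|lpt[1-9])(\..*)?", re.IGNORECASE)


def valid_windows_name(name: str) -> bool:
    """Rough check of one path component against common Windows rules."""
    if name in ("", ".", ".."):
        return False
    if any(ch in FORBIDDEN for ch in name):
        return False
    # Names may not end in a dot or a space.
    if name.endswith((".", " ")):
        return False
    return DEVICE.fullmatch(name) is None


print(valid_windows_name("File Name.htm"))  # True
print(valid_windows_name("CON.htm"))        # False
```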

can I have File-Name.htm, File_Name.htm, File Name.htm?

Certainly. But in the last case you should link to it by URL-encoding the space:

<a href="File%20Name.htm">thingy</a>
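The same link can be generated rather than hand-written. Here is a hypothetical Python helper (`make_link`) that percent-encodes the filename with `urllib.parse.quote` and HTML-escapes the link text:

```python
from html import escape
from urllib.parse import quote


def make_link(filename: str, text: str) -> str:
    # quote() turns the space into %20 and also encodes & < > " ', so the
    # result is safe inside the double-quoted href attribute.
    return f'<a href="{quote(filename)}">{escape(text)}</a>'


print(make_link("File Name.htm", "thingy"))
# <a href="File%20Name.htm">thingy</a>
```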

Browsers will usually let you get away with leaving the space in, but it's not really valid. If you want to avoid having to think about URL-escaping, HTML-escaping and case-sensitivity issues, stick to a–z, 0–9 and underscore.
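One case-sensitivity problem is two names that differ only by case: they can coexist on a case-sensitive Unix file system but collide on a case-insensitive one, such as a default Windows setup. Here is a hypothetical Python check for that situation; `str.casefold()` only approximates how Windows compares names:

```python
from collections import defaultdict


def case_collisions(filenames):
    """Return groups of names that are equal when letter case is ignored."""
    groups = defaultdict(list)
    for name in filenames:
        groups[name.casefold()].append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


print(case_collisions(["My-Page.htm", "my-page.htm", "other.htm"]))
# [['My-Page.htm', 'my-page.htm']]
```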

bobince