Clean URLs seem pretty simple, but I have a special requirement. I would like to allow the .html suffix or no extension at all, but not any other extension:

someurl = pass
someurl/somepage = pass
someurl.html = pass
someurl/somepage.html = pass

someurl.css = fail
someurl.exe = fail
someurl.anyotherextension = fail
someurl/someother.ext = fail

Is this possible? Would I have to somehow exclude the extensions I don't want?

Edit:

None of the answers so far seem to work. The only thing that I've discovered on my own that works is:

^/([\w]*(.html)?)$
but it will not work with slashes in the url. Adding a slash inside the character class brackets makes it fail.

A: 

Try this:

(?:^|/)[^./]+(?:\.html)?$

Translation: starting from the last / if there is one (or from the beginning of the string if not) match one or more of anything except / or ., optionally terminated by .html.
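A quick way to check this pattern against the examples from the question is a small Python sketch. The `is_allowed` helper and the two list names are hypothetical, and Python's `re` module is assumed to behave like the target regex flavor for this pattern.

```python
import re

# Pattern from the answer above: the last path segment contains no dot,
# or it ends in ".html".
ALLOWED = re.compile(r"(?:^|/)[^./]+(?:\.html)?$")

def is_allowed(url: str) -> bool:
    """Hypothetical helper: True when the pattern is found in the URL."""
    return ALLOWED.search(url) is not None

# Examples taken from the question.
should_pass = ["someurl", "someurl/somepage", "someurl.html", "someurl/somepage.html"]
should_fail = ["someurl.css", "someurl.exe", "someurl.anyotherextension", "someurl/someother.ext"]

for url in should_pass + should_fail:
    print(f"{url} = {'pass' if is_allowed(url) else 'fail'}")
```

Because the pattern is not anchored at the start, it has to be used with a search rather than a full-string match.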

Alan Moore
A: 
/\.html$|(?:^|.*\/)[^\.]+$/

This matches a URL that ends with ".html", or one that has no "." between the last / (or the beginning of the URL, if there is no /) and the end. Folder names may therefore contain a ".".
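The surrounding slashes in the pattern above look like regex-literal delimiters, so they are dropped in this Python sketch, which is an assumption about the intended flavor. The `matches` helper name is hypothetical. The example focuses on the dotted-folder case.

```python
import re

# Same pattern as above, without the /.../ delimiters.
PATTERN = re.compile(r"\.html$|(?:^|.*\/)[^\.]+$")

def matches(url: str) -> bool:
    """Hypothetical helper: True when the URL is accepted by the pattern."""
    return PATTERN.search(url) is not None

# A dot in a folder name is fine; a non-.html extension on the last segment is not.
for url in ["someurl", "someurl.html", "some.dir/page", "some.dir/page.css", "someurl.css"]:
    print(f"{url} = {'pass' if matches(url) else 'fail'}")
```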

zolex
A: 

What about this?

(^[^\.]+?$)|(^.+?\.html$)

This matches either a string that doesn't contain any . or a string that ends with .html.

Or use this, if you want to use dots in your "folder" names:

(^.+?/[^\.]+?$)|(^.+?\.html$)

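Matches either a string that contains no . after the last / or a string that ends with .html. Note that the first alternative requires at least one / with a character before it, so a bare someurl matches the first regex but not this one.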
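To see how the two patterns differ, they can be run side by side in a minimal Python sketch. The names `NO_DOTS`, `DOTTED_FOLDERS`, and `check` are hypothetical, and Python's `re` is assumed to be close enough to the target flavor.

```python
import re

# First pattern: no dot anywhere, or ends in .html.
NO_DOTS = re.compile(r"(^[^\.]+?$)|(^.+?\.html$)")
# Second pattern: no dot after the last slash, or ends in .html.
DOTTED_FOLDERS = re.compile(r"(^.+?/[^\.]+?$)|(^.+?\.html$)")

def check(pattern: re.Pattern, url: str) -> bool:
    """Hypothetical helper: True when the pattern accepts the URL."""
    return pattern.search(url) is not None

urls = ["someurl", "someurl/somepage", "someurl.html", "someurl.css",
        "some.dir/page", "some.dir/page.css"]

for url in urls:
    first = "pass" if check(NO_DOTS, url) else "fail"
    second = "pass" if check(DOTTED_FOLDERS, url) else "fail"
    print(f"{url}: first={first}, second={second}")
```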

Dave
A: 

Regex option for clean URLs with .html extension option:

^/([\w\/]*(\.html)?)$

Full lighttpd.conf line:

url.rewrite = ( "^/([\w\/]*(\.html)?)$" => "index.php?page=$1" )

Quick reminder: any page served through this regex should reference its files with absolute paths or set a base href.
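The effect of the rule can be approximated outside the server with a short Python sketch. It only mimics the capture-and-substitute step and does not reproduce lighttpd itself. The `rewrite` function is a hypothetical name.

```python
import re
from typing import Optional

# Same regex as in the lighttpd.conf line above.
RULE = re.compile(r"^/([\w\/]*(\.html)?)$")

def rewrite(path: str) -> Optional[str]:
    """Hypothetical helper: return the rewritten target, or None when the rule does not apply."""
    m = RULE.match(path)
    if m is None:
        return None
    # $1 in the rewrite target corresponds to the first capture group.
    return "index.php?page=" + m.group(1)

for path in ["/someurl", "/someurl/somepage.html", "/someurl.css", "/some-url"]:
    print(path, "->", rewrite(path))
```

Note that `\w` covers only letters, digits, and the underscore, so a path such as `/some-url` is not rewritten by this rule.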

ShiGon
A: 

Instead of trying to use a regex to match the URLs you want to allow (as the other answers seem to try), use a regex to match the URLs that you want to block:

\.(?!html$)[^./]*$

This regex matches the extension of a URL, unless the extension is .html. URLs without an extension or with an .html extension are not matched. Your examples don't include URLs with queries (?param=value) or fragments (#anchor), so the regex does not account for those. I'm also assuming your regex flavor supports lookahead.
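As a minimal sketch of the blocking approach in Python (whose `re` module supports lookahead): the `is_blocked` helper is a hypothetical name, and stripping the query and fragment with `urllib.parse.urlsplit` before testing is an added assumption about how those parts might be handled, not something the regex does on its own.

```python
import re
from urllib.parse import urlsplit

# Matches a final extension that is anything other than exactly ".html".
BLOCKED = re.compile(r"\.(?!html$)[^./]*$")

def is_blocked(url: str) -> bool:
    """Hypothetical helper: True when the URL path ends in a non-.html extension."""
    # Assumption: queries and fragments are removed first, since the regex ignores them.
    path = urlsplit(url).path
    return BLOCKED.search(path) is not None

for url in ["someurl", "someurl.html", "someurl.css", "someurl/someother.ext",
            "some.dir/page", "someurl.html?param=value#anchor"]:
    print(f"{url} = {'fail' if is_blocked(url) else 'pass'}")
```

Applied to the raw string `someurl.html?param=value`, the regex would match, because the text after the dot is no longer exactly `html`; that is why the path is extracted first in this sketch.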

Jan Goyvaerts