ansaurus

Question

Answer 1

A:

Why don't you just use ^(.*)\.html$? This will match any string that ends in .html. After all, filenames can contain more than one dot.
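To illustrate Tim's suggestion, here is a minimal sketch using Python's `re` module rather than Apache itself. Two assumptions apply. First, Python writes the backreference as `\1` where mod_rewrite uses `$1`. Second, the sample filenames are made up. The point is that a greedy `(.*)` copes with names that contain several dots.

```python
import re

# Tim's pattern: capture everything before a trailing ".html"
PATTERN = re.compile(r'^(.*)\.html$')

def to_var(path: str) -> str:
    """Rewrite foo.html -> foo.var (like mod_rewrite's $1.var); other paths pass through."""
    return PATTERN.sub(r'\1.var', path)

print(to_var('index.html'))         # index.var
print(to_var('archive.2010.html'))  # archive.2010.var  (extra dots are fine)
print(to_var('style.css'))          # style.css  (no match, left unchanged)
```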

[^A-Z]+ matches index if the regex is applied case-sensitively. Perhaps that's the reason? Why [^.]+ should fail is beyond me, though.
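Tim's case-sensitivity point can be checked quickly in Python's `re`, used here only as a stand-in for mod_rewrite. In Apache, a case-insensitive match would come from a flag such as `[NC]` on the rule. The helper name `all_non_upper` is made up for the illustration.

```python
import re

def all_non_upper(name: str, ignore_case: bool = False) -> bool:
    """True if every character of name falls outside A-Z (hypothetical helper)."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.fullmatch(r'[^A-Z]+', name, flags) is not None

print(all_non_upper('index'))                    # True: no character is in A-Z
print(all_non_upper('Index'))                    # False: 'I' is in A-Z
print(all_non_upper('index', ignore_case=True))  # False: [A-Z] now also covers a-z
```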

Tim Pietzcker 2010-03-08 17:07:54

Ok, I worked out that I have been an idiot. Your answer is quite correct. Alas, I did not consider the implications of my (unconditional) RewriteRule: once I had morphed index.html into index.var, Apache's type map jumped into action and looked inside the index.var file for a resource to map. And it pulled DE/index.html out of the hat. THEN Apache subjected DE/index.html to yet another rewrite process, which ended up mangling that name to DE/index.var. And THAT file did not exist. Isn't computing wonderful :-)))

Ollie2893 2010-03-09 10:18:50
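The chain Ollie describes can be reproduced in a toy Python simulation, purely to make the loop visible. Several pieces are hypothetical stand-ins rather than Apache's real machinery: `TYPE_MAP`, `FILES` and `resolve` do not model how mod_negotiation actually works. The `rewrite_variants=False` case only illustrates what keeping the rule away from the type-map result would achieve.

```python
import re

RULE = re.compile(r'^(.*)\.html$')

# Hypothetical stand-in for the contents of index.var (a type map):
# it points the request at a language-specific variant.
TYPE_MAP = {'index.var': 'DE/index.html'}

# Hypothetical set of files that exist on disk.
FILES = {'index.var', 'DE/index.html'}

def rewrite(path: str) -> str:
    # The unconditional rule: every *.html becomes *.var
    return RULE.sub(r'\1.var', path)

def resolve(path: str, rewrite_variants: bool) -> str:
    path = rewrite(path)              # index.html -> index.var
    variant = TYPE_MAP.get(path)      # the "type map" picks DE/index.html
    if variant is None:
        return path
    if rewrite_variants:
        variant = rewrite(variant)    # DE/index.html -> DE/index.var
    return variant

for flag in (True, False):
    target = resolve('index.html', rewrite_variants=flag)
    print(flag, target, 'exists' if target in FILES else 'missing')
# True DE/index.var missing
# False DE/index.html exists
```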

Oops :) Nice detective work.

Tim Pietzcker 2010-03-09 10:41:54

Answer 2

A:

The . matches everything but newlines.
Inside of a character class, the ^ means "not".
The + means one or more of the preceding character class.

So when you write ([^.]+), that says "match one or more newlines". So unless you have a URL composed of newlines followed by ".html", this will not work.
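Python's `re` module offers a useful comparison. It agrees with PCRE, the library Apache's mod_rewrite uses, in treating `.` inside `[...]` as a literal dot. So `[^.]+` means "one or more characters that are not a dot", newlines included, rather than "one or more newlines". Escaping it as `[^\.]` is allowed but changes nothing. The test strings below are made up.

```python
import re

# Inside a character class the dot loses its special meaning:
# [^.] is "any character except a literal '.'".
NON_DOT = re.compile(r'^([^.]+)\.html$')

print(NON_DOT.match('index.html').group(1))   # index
print(NON_DOT.match('a.b.html'))              # None: the capture may not contain a dot
print(NON_DOT.match('line1\nline2.html').group(1) == 'line1\nline2')  # True

# Escaping the dot inside the class is redundant but harmless.
ESCAPED = re.compile(r'^([^\.]+)\.html$')
print(ESCAPED.match('index.html').group(1))   # index
```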

^([^A-Z]+)\.html$ works because it matches one or more characters that are not uppercase letters. If you have any uppercase letters before the ".html" in your URL, this one will fail too.

Tim Pietzcker's suggestion is correct: just use ^(.*)\.html$, keeping in mind that this won't work in the odd case that you have newlines in your URL.

If you actually have URLs with newlines in them, you can use ^([\d\D]+)\.html$, which matches digits and non-digits (i.e. everything) up to the ".html".
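The difference between `.*` and `[\d\D]+` on input that contains a newline can be seen in a short Python sketch. The URL string is invented. `re.DOTALL` is a Python flag shown for comparison only and is not a mod_rewrite option.

```python
import re

url = 'first\nsecond.html'   # made-up string containing a newline

# '.' does not match '\n', so (.*) cannot span the newline.
plain = re.match(r'^(.*)\.html$', url)
print(plain)                                 # None

# [\d\D] (any digit or non-digit) matches every character, newline included.
any_char = re.match(r'^([\d\D]+)\.html$', url)
print(any_char.group(1) == 'first\nsecond')  # True

# Python-specific alternative: DOTALL lets '.' match newlines as well.
dotall = re.match(r'^(.*)\.html$', url, re.DOTALL)
print(dotall.group(1) == 'first\nsecond')    # True
```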

Nick 2010-03-08 18:04:09

Ok ... interesting. Two things confuse me:
(1) My understanding of regex is that each expression tries to gobble the longest match. So how does the expression ^(.*)\.html$ function? It seems to me that .* should swallow ".html". To then match .html after it, wouldn't it have to retrace its steps?
(2) Are you quite sure that "." inside a character class [] retains the meaning you ascribe (which, I agree, it has outside such a class)? Even so, I also tried [^\.]+ with no more joy. Surely the \ should have escaped the regular meaning?

Ollie2893 2010-03-08 23:27:38
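On question (1): yes, the engine retraces its steps. A greedy `.*` first swallows the whole string, `.html` included. It then backtracks one character at a time until `\.html$` can match. The sketch below uses Python's `re` with a made-up filename. Because `$` pins `.html` to the end of the string, a lazy `.*?` ends up with the same capture here.

```python
import re

name = 'report.v2.html'   # made-up filename with several dots

greedy = re.match(r'^(.*)\.html$', name)
lazy = re.match(r'^(.*?)\.html$', name)

# .* first grabs all of 'report.v2.html', then gives characters back
# until '\.html$' can match the last five.
print(greedy.group(1))   # report.v2
# .*? starts empty and grows; the $ anchor forces the same result.
print(lazy.group(1))     # report.v2
```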

PS: Incidentally, "^(.*)\.html$ $1.var" also fails. Before you think I am looking at a fundamental failure, "^(index)\.html$ $1.var" works (for target index.html).

Ollie2893 2010-03-08 23:37:37
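Ollie's later explanation of the rewrite loop suggests one plausible reason the `^(index)` version seemed to work. That pattern matches `index.html` but not `DE/index.html`, so the second rewrite never fires. The Python sketch below takes both paths from the thread. It assumes mod_rewrite sees them in exactly this form, which is not confirmed here.

```python
import re

generic = re.compile(r'^(.*)\.html$')
specific = re.compile(r'^(index)\.html$')

for path in ('index.html', 'DE/index.html'):
    print(path, '->', generic.sub(r'\1.var', path), '|', specific.sub(r'\1.var', path))
# index.html -> index.var | index.var
# DE/index.html -> DE/index.var | DE/index.html
```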


[^.] causing headache in RewriteRule
