I have a lot of simpler rules working on this site, so I know mod_rewrite is working. I just can't figure out how to write a rule for this situation. I'm using Joomla CMS, and one component in particular generates awful URLs that duplicate other (pretty) URLs on the site. There is a consistent pattern, so the URLs can be rewritten, but writing the regular expression is way out of my league.

The bad URLs look like this:

/component/content/article/111-category-name/111-article-name.html?directory=2

(The "111" parts are numeric IDs that give the category and article portions of the URL unique identifiers, so the numbers differ from URL to URL. The directory=2 value changes depending on the ID of the directory being browsed.)

The good URLs (already on the site, generated by core Joomla SEF) look like this:

/category-name/article-name.html

I know I need to detect the bad URLs, then rewrite to remove those slug IDs and the trailing query string. I've tried for an embarrassing amount of time to work it out and I don't think I'm even remotely close.

My eternal gratitude to a mod_rewrite/regular expressions guru who can break this down for me. Thanks! -Kelly

+2  A: 

This assumes the rule goes in your local .htaccess:

RewriteRule ^component/content/article/\d+-(.*?)/\d+-(.*?\.html) /$1/$2 [L]

If it goes in your httpd.conf instead, I believe the only change is a leading slash in the pattern:

RewriteRule ^/component/content/article/\d+-(.*?)/\d+-(.*?\.html) /$1/$2 [L]

but I'm not 100% sure about this because I rarely do it this way.

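Because the substitution contains no ?, mod_rewrite already passes the original query string through unchanged. The QSA flag only matters if the substitution adds a query string of its own, in which case it appends the original one to it:

RewriteRule ^component/content/article/\d+-(.*?)/\d+-(.*?\.html) /$1/$2 [L,QSA]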
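A pattern like this can be checked outside Apache before it goes into .htaccess. The sketch below uses Python's re module, assuming it behaves like Apache's PCRE for these simple constructs (\d, lazy .*?, literal \.). It uses the example path from the question as RewriteRule sees it in a root .htaccess: no leading slash, and no ?directory=2, because the query string is not part of the string the pattern is matched against. It compares a second capture group written as (\.*?\.html), where \.*? matches only literal dots, with (.*?\.html), where .*? matches any characters.

```python
import re

# Example "bad" path from the question, as seen by RewriteRule in a root
# .htaccess: no leading slash, and the query string is not included.
path = "component/content/article/111-category-name/111-article-name.html"

# \.*? matches zero or more literal dots only, so "article-name" cannot match.
literal_dots = r"^component/content/article/\d+-(.*?)/\d+-(\.*?\.html)"
# .*? matches any characters (lazily) up to ".html".
any_chars = r"^component/content/article/\d+-(.*?)/\d+-(.*?\.html)"

print(re.match(literal_dots, path))   # None
m = re.match(any_chars, path)
print(m.groups())                     # ('category-name', 'article-name.html')
print("/{}/{}".format(*m.groups()))   # /category-name/article-name.html
```

The last line mirrors the /$1/$2 substitution, which is how the two captured groups become the clean URL.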
cletus
Yes, I'm using the .htaccess file in the site root. I tried adding your first rule (the one without the leading /), and it has no effect on the URLs. Any idea what the problem might be? Thanks for your help so far.
KellyRued
+1  A: 

Try this rule:

RewriteRule ^component/content/article/[0-9]+-([^/]+)/[0-9]+-([^/]+\.html)$ /$1/$2? [L,R=301]

The empty query string in the substitution (the trailing ?) removes the original query string, if present, and the R=301 flag makes this an external, permanent redirect, so the browser is sent to the clean URL.
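To see what this rule does to a full request, the sketch below mimics it in Python. It is only an approximation of mod_rewrite, assuming a root .htaccess (so the leading slash is stripped before matching) and that Python's re treats this pattern as PCRE does. The helper name redirect_target is hypothetical. The query string is split off with urllib.parse.urlsplit and then deliberately ignored, which is the effect of the trailing ?, and a (301, location) tuple stands in for the permanent redirect.

```python
import re
from urllib.parse import urlsplit

# Same pattern as the RewriteRule; in a root .htaccess the path is matched
# without its leading slash.
PATTERN = re.compile(
    r"^component/content/article/[0-9]+-([^/]+)/[0-9]+-([^/]+\.html)$"
)

def redirect_target(url):
    """Hypothetical helper: return (301, new_path) if the rule matches, else None."""
    parts = urlsplit(url)
    m = PATTERN.match(parts.path.lstrip("/"))
    if m is None:
        return None
    # "/$1/$2?": the trailing "?" drops the original query string,
    # so parts.query is intentionally not used.
    return 301, "/{}/{}".format(m.group(1), m.group(2))

print(redirect_target(
    "/component/content/article/111-category-name/111-article-name.html?directory=2"
))  # (301, '/category-name/article-name.html')
print(redirect_target("/category-name/article-name.html"))  # None
```

Because the clean URL does not start with component/, it does not match the pattern again, so the redirect cannot loop.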

Gumbo
Thanks Gumbo! This worked for me.
KellyRued