UPDATE: Thank you all for your input. Some additional information.

It's really just a small chunk of markup (20 lines) I'm working with, and I had aimed to leverage a regex to do the work.

I also do have the ability to hack up the script (an ecommerce one) to insert the classes as the navigation is built. I wanted to limit the number of hacks I have in place to keep things easier on myself when I go to update to the latest version of the software.

With that said, I'm pretty aware of my situation and the various options available to me. The first part of my regex works as expected. I posted really more or less to see if someone would say, "hey dummy, this is easy just change this....."

After coming close with a few of my efforts, it's more about the principle at this point: just to know (and learn) that a solution exists for this problem. I also hate being beaten by a piece of code.

ORIGINAL:

I'm trying to leverage regular expressions to add a CSS class to the first and last list items within an ordered list. I've tried a bunch of different ways but can't produce the results I'm looking for.

I've got a regular expression for the first list item but can't seem to figure a correct one out for the last. Here is what I'm working with:

    $patterns = array('/<ul+([^<]*)<li/m', '/<([^<]*)(?<=<li)(.*)<\/ul>/s');
    $replace = array('<ul$1<li class="first"','<li class="last"$2$3</ul>');
    $navigation = preg_replace($patterns, $replace, $navigation);

Any help would be greatly appreciated.

+3  A: 

Jamie Zawinski would have something to say about this...

Do you have a proper HTML parser? I don't know if there's anything like hpricot available for PHP, but that's the right way to deal with it. You could at least employ hpricot to do the first cleanup for you.

If you're actually generating the HTML -- do it there. It looks like you want to generate some navigation and have a .first and .last kind of thing on it. Take a step back and try that.

Dustin
Thanks for the input. Yes, I do have an HTML parser to make use of. Having read the "never parse HTML with regex" stance in various sources, I kinda anticipated being pointed in this direction. I'm just glad everyone's suggestions were made tastefully, without flaming me ;-) More info in the original post.
greaterweb
+1  A: 

You wrote:

$patterns = array('/<ul+([^<]*)<li/m','/<([^<]*)(?<=<li)(.*)<\/ul>/s');

First pattern:
ul+ => you search something like ullll...
The m modifier is useless here, since you don't use ^ nor $.

Second pattern:
Using .* along with s is "dangerous", because you might select the whole document up to the last /ul of the page...
And well, I would just drop s modifier and use: (<li\s)(.*?</li>\s*</ul>) with replace: '$1class="last" $2'

In view of above remarks, I would write the first expression: <ul.*?>\s*<li
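A quick way to try the two suggested patterns is sketched below. The sample markup is made up, and the replacement for the first pattern (`$0 class="first"`) is an assumption, since only the second replacement is given above. Note two things the second pattern relies on: `<li\s` needs whitespace right after `<li`, so it only matches items that already carry an attribute, and without the s modifier the dot cannot cross a line break, so each item has to sit on its own line.

```php
<?php
// Hypothetical sample markup: one <li> per line, each with an attribute.
$navigation = '<ul id="nav">
<li id="home">Home</li>
<li id="shop">Shop</li>
<li id="contact">Contact</li>
</ul>';

$patterns = array('/<ul.*?>\s*<li/',                 // opening <ul ...> up to the first <li
                  '/(<li\s)(.*?<\/li>\s*<\/ul>)/');  // the <li whose </li> is followed by </ul>
$replace  = array('$0 class="first"',                // assumed replacement for the first pattern
                  '$1class="last" $2');              // replacement suggested above

// With arrays, preg_replace applies the patterns one after the other.
$result = preg_replace($patterns, $replace, $navigation);
echo $result, "\n";
```

If the whole list were on a single line, the second pattern would start matching at the first `<li ` instead of the last one, which is one of the ways this kind of regex breaks on less predictable markup.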

Although I am tired of seeing the Jamie Zawinski quote each time there is a regex question, Dustin is right in pointing you to an HTML parser (or just generating the right HTML from the start!): regexes and HTML don't mix well, because HTML syntax is complex, and unless you act on well-known, machine-generated output with very predictable results, you are prone to get something breaking in some cases.

PhiLho
+2  A: 

+1 to generating the right html as the best option.

But here is a completely different approach, which may or may not be acceptable to you: you could use JavaScript.

This uses jQuery to make it easy ...

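$(document).ready(
    function() {
        // :first-child / :last-child must be applied to the <li> children, not to the <ul> itself
        $('#id-of-ul > li:first-child').addClass('first');
        $('#id-of-ul > li:last-child').addClass('last');
    }
);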

As I say, this may or may not be of any use in this case, but I think it's a valid solution to the problem in some cases.

PS: You say ordered list, then give ul in your example. ol = ordered list, ul = unordered list

benlumley
A: 

You could load the navigation in a SimpleXML object and work with that. This prevents you from breaking your markup with some crazy regex :)
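A minimal sketch of that idea follows, assuming the navigation is a single well-formed `<ul>` (SimpleXML is an XML parser, so unclosed tags or entities like `&nbsp;` would make loading fail) and that the simplexml and dom extensions are enabled. The helper names `add_class` and `mark_first_last` are made up for illustration.

```php
<?php
// Append a class name to an element, keeping any class it already has.
function add_class(SimpleXMLElement $el, string $class): void
{
    $existing = isset($el['class']) ? (string) $el['class'] : '';
    $el['class'] = trim($existing . ' ' . $class);
}

// Hypothetical helper: marks the first and last direct <li> children of the root <ul>.
function mark_first_last(string $navigation): string
{
    $ul = simplexml_load_string($navigation);
    if ($ul === false) {
        return $navigation; // not well-formed XML: leave the markup untouched
    }
    $count = count($ul->li);
    if ($count > 0) {
        add_class($ul->li[0], 'first');
        add_class($ul->li[$count - 1], 'last');
    }
    // Serialize only the <ul> node, which avoids the XML declaration.
    $node = dom_import_simplexml($ul);
    return $node->ownerDocument->saveXML($node);
}

echo mark_first_last('<ul><li>Coffee</li><li>Tea</li><li>Milk</li></ul>'), "\n";
```

Because `$ul->li` only covers direct children, nested lists are left alone, and a list with a single item ends up with one `class="first last"` attribute rather than two separate ones.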

Endlessdeath
A: 

I don't know if anyone cares any longer, but I have a solution that works in my simple test case (and I believe it should work in the general case).

First, let me point out two things: While PhiLho is right that the s modifier is "dangerous", since dots may match everything up to the end of the document, this may very well be what you want. It only becomes a problem with pages that are not well formed. Be careful with any such regex on large, manually written pages.

Second, PHP gives backslashes a special meaning even in single-quoted strings: \\ and \' are escape sequences there. Most regexes will behave the same either way, but it is safest to always double the backslashes, just in case.
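To make the backslash point concrete, here is a small check (a sketch; the variable names and the `C:\temp` test string are just for illustration). It shows that the single and doubled spellings of `\s` produce the same string, and where the doubling really matters.

```php
<?php
// In single-quoted strings PHP only rewrites \\ (to \) and \' (to ').
$single  = '/<ul.*?>\s*<li/';   // \s is left alone: backslash followed by s
$doubled = '/<ul.*?>\\s*<li/';  // \\ collapses to one backslash: the same string
var_dump($single === $doubled); // bool(true)

// Where it matters: matching one literal backslash needs \\ in the regex,
// which takes four backslashes in the PHP source.
$matches_backslash = preg_match('/\\\\/', 'C:\temp');
var_dump($matches_backslash);   // int(1)
```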

Now, here's my code:

<?php
$navigation='<ul>
<li>Coffee</li>
<li>Tea</li>
<li>Milk</li>
<li>Beer</li>
<li>Water</li>
</ul>';

$patterns = array('/<ul.*?>\\s*<li/',
                  '/<li((.(?<!<li))*?<\\/ul>)/s');
$replace = array('$0 class="first"',
                 '<li class="last"$1');
$navigation = preg_replace($patterns, $replace, $navigation);
echo $navigation;
?>

This will output

<ul>
<li class="first">Coffee</li>
<li>Tea</li>
<li>Milk</li>
<li>Beer</li>
<li class="last">Water</li>
</ul>

This assumes no line feeds inside the opening <ul...> tag. If there are any, use the s modifier on the first expression too.

The magic happens in (.(?<!<li))*?. This will match any character (the dot) that does not complete the string <li (the lookbehind is checked right after the dot has consumed the character), repeated any number of times (the *) in a non-greedy fashion (the ?).
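To see what that sub-pattern does, it helps to look at what the second expression actually matches. In this sketch (shortened, made-up list) the match can only start at the last `<li`, because from any earlier one the repeated group hits the next `<li` before it can reach `</ul>`.

```php
<?php
// Shortened sample list, one item per line.
$navigation = "<ul>\n<li>Coffee</li>\n<li>Tea</li>\n<li>Water</li>\n</ul>";

// Same second pattern as above; $m[0] is the whole match, $m[1] the part after "<li".
preg_match('/<li((.(?<!<li))*?<\\/ul>)/s', $navigation, $m);

var_dump($m[0]); // the last item plus the closing tag: "<li>Water</li>\n</ul>"
var_dump($m[1]); // what $1 puts back after '<li class="last"': ">Water</li>\n</ul>"
```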

Of course, the whole thing would have to be expanded if there is a chance the list items already have the class attribute set. Also, if there is only one list item, it will match twice, giving it two such attributes. At least for xhtml, this would break validation.
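One possible way to cover the single-item case is sketched here. It is not a general fix: the function name `add_first_last` is made up, and it still assumes the list items carry no class attribute of their own. It simply merges the duplicate attribute pair that the two replacements produce when the first item is also the last one.

```php
<?php
// Hypothetical wrapper around the two patterns from the code above.
function add_first_last(string $navigation): string
{
    $patterns = array('/<ul.*?>\\s*<li/',
                      '/<li((.(?<!<li))*?<\\/ul>)/s');
    $replace  = array('$0 class="first"',
                      '<li class="last"$1');
    $navigation = preg_replace($patterns, $replace, $navigation);

    // A lone item gets both attributes in this exact order; merge them into one.
    return str_replace('<li class="last" class="first"',
                       '<li class="first last"',
                       $navigation);
}

echo add_first_last("<ul>\n<li>Only</li>\n</ul>"), "\n";
```

Lists with two or more items never contain the duplicated pair, so for them the result is the same as with the plain preg_replace call.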

Pianosaurus
A: 

Check out http://www.regexlib.com. It has most things you've thought of and 1000s of things you haven't.

Frustrating Developments