Let's say I have this input:

I can haz a listz0rs!
# 42
# 126
I can haz another list plox?
# Hello, world!
# Welcome!

I want to split it so that each set of hash-started lines becomes a list:

I can haz a listz0rs!
<ul>
    <li>42</li>
    <li>126</li>
</ul>
I can haz another list plox?
<ul>
    <li>Hello, world!</li>
    <li>Welcome!</li>
</ul>

If I run the input against the regex "/(?:(?:(?<=^# )(.*)$)+)/m", I get the following result:

Array
(
    [0] => Array
    (
        [0] => 42
    )
    [1] => Array
    (
        [0] => 126
    )
    [2] => Array
    (
        [0] => Hello, world!
    )
    [3] => Array
    (
        [0] => Welcome!
    )
)

This is fine and dandy, but it doesn't distinguish between the two different lists. I need a way to either make the quantifier return a concatenated string of all the occurrences, or, ideally, an array of all the occurrences.
Ideally, this should be my output:

Array
(
    [0] => Array
    (
        [0] => 42
        [1] => 126
    )
    [1] => Array
    (
        [0] => Hello, world!
        [1] => Welcome!
    )
)

Is there any way of achieving this, and if not, is there a close alternative?
Thanks in advance!
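For reference, PCRE (the engine behind PHP's preg_* functions) keeps only the last iteration of a repeated capturing group. That is why a quantifier on its own can't return every occurrence. A minimal demonstration on a two-line sample input:

```php
<?php
// A quantified capture group is overwritten on each repetition,
// so only its last iteration survives in the match array.
$input = "# 42\n# 126\n";

// One match covers both hash lines (the /m flag lets ^ match after each \n).
preg_match('/(?:^# (.*)\n?)+/m', $input, $m);

echo $m[1], "\n"; // prints 126; the "42" captured by the first iteration is lost
```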

A: 

I'd say don't try to do it all in a single regex - instead, first use a regex to match sets of consecutive lines that all begin with # signs and wrap those lines with a <ul></ul> pair. Then use a second regex (or not even a regex at all - you could just split on line breaks) to match each individual line and convert it to <li></li> format.
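A minimal PHP sketch of this two-step idea, assuming the input is a string with \n line endings. One regex matches each run of consecutive # lines, including a final line with no trailing newline. The callback then splits that run on line breaks instead of using a second regex. The function name hashRunsToLists is made up for this example.

```php
<?php
// Hypothetical helper: wraps each run of "#" lines in <ul>, one <li> per line.
function hashRunsToLists(string $text): string
{
    // (?:^#.*(?:\n|\z))+ matches one or more consecutive lines starting with "#";
    // \z lets the last line of the input match even without a trailing newline.
    return preg_replace_callback('/(?:^#.*(?:\n|\z))+/m', function (array $m): string {
        $items = [];
        // Split the matched run on line breaks (no second regex needed).
        foreach (explode("\n", rtrim($m[0], "\n")) as $line) {
            // Drop the leading "#" and surrounding whitespace.
            $items[] = '    <li>' . trim(substr($line, 1)) . '</li>';
        }
        return "<ul>\n" . implode("\n", $items) . "\n</ul>\n";
    }, $text);
}

$input = "I can haz a listz0rs!\n# 42\n# 126\nI can haz another list plox?\n# Hello, world!\n# Welcome!";
echo hashRunsToLists($input);
```

Because the whole run is matched at once, the <ul> tags wrap the set rather than each individual line.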

Amber
I thought of doing that as well, but the problem is that you can't quantify the lines, so when you wrap them in ul tags, you'll be wrapping *each* line instead of the entire set.
Hussain
With a multiline regex, you could match multiple lines at once. You'd just need to match the newline characters between them too.
Amber
A: 

If it was me I would:

  1. explode("\n", $input) into an array with one element per line
  2. foreach through that array
  3. whenever you get a line that doesn't start with a #, that's when you add your closing ul tag (and an opening one before the next # line)

Add a little more to deal with unexpected input (like two non-hash lines in a row) and you're good.
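Here is one way those steps could look, as a sketch rather than a definitive implementation. It keeps a flag recording whether a <ul> is currently open and passes consecutive non-hash lines through unchanged. It also strips the leading # and closes a list that runs to the end of the input. The function name linesToHtml is hypothetical.

```php
<?php
// Hypothetical helper: explode into lines, then track whether a <ul> is open.
function linesToHtml(string $input): string
{
    $out = '';
    $inList = false;
    foreach (explode("\n", $input) as $line) {
        $line = rtrim($line, "\r");          // tolerate Windows line endings
        if (substr($line, 0, 1) === '#') {
            if (!$inList) {                  // first "#" line of a run opens a list
                $out .= "<ul>\n";
                $inList = true;
            }
            $out .= '    <li>' . trim(substr($line, 1)) . "</li>\n";
        } else {
            if ($inList) {                   // a non-hash line closes the open list
                $out .= "</ul>\n";
                $inList = false;
            }
            $out .= $line . "\n";            // plain lines pass through unchanged
        }
    }
    if ($inList) {                           // input ended inside a list
        $out .= "</ul>\n";
    }
    return $out;
}

$input = "I can haz a listz0rs!\n# 42\n# 126\nI can haz another list plox?\n# Hello, world!\n# Welcome!";
echo linesToHtml($input);
```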

Syntax Error
A: 

You could avoid regex altogether and take a simpler approach: read the input line by line (as an array of lines), collect consecutive lines that start with #, and every time you hit a line that doesn't start with #, close the current list and start a new one. Like so:

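// You can get this by using file('filename', FILE_IGNORE_NEW_LINES) or
// just doing an explode("\n", $input)
$lines = array(
    'I can haz a listz0rs!',
    '# 42',
    '# 126',
    'I can haz another list plox?',
    '# Hello, world!',
    '# Welcome!'
);

$hashline = false;   // true while we are inside a run of "#" lines
$lists = array();    // all completed lists
$curlist = array();  // the list currently being built
foreach ($lines as $line) {
    if ($line !== '' && $line[0] == '#') {
        $curlist[] = $line;
        $hashline = true;
    } elseif ($hashline) {
        // a non-hash line ends the current list
        $lists[] = $curlist;
        $curlist = array();
        $hashline = false;
    }
}
// the input may end on a hash line, so save the last list too
if ($hashline) {
    $lists[] = $curlist;
}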

A little clean-up may be in order, but hopefully it helps.

(After reading the new answers, this is basically an in-depth explanation of Syntax Error's answer.)

EDIT: You may want it to strip off the # at the beginning of each line too.
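If the goal is the nested array from the question rather than HTML, the same loop can strip the # as it collects items. The sketch below assumes that goal, and groupHashLines is a hypothetical name. It uses the non-empty $curlist itself as the "inside a list" signal instead of a separate $hashline flag.

```php
<?php
// Hypothetical helper: groups consecutive "#" lines into arrays, "#" stripped.
function groupHashLines(array $lines): array
{
    $lists = [];
    $curlist = [];
    foreach ($lines as $line) {
        if (substr($line, 0, 1) === '#') {
            // Strip the leading "#" and surrounding whitespace.
            $curlist[] = trim(substr($line, 1));
        } elseif ($curlist) {
            // A non-hash line closes the current (non-empty) list.
            $lists[] = $curlist;
            $curlist = [];
        }
    }
    if ($curlist) {
        // Flush a list that runs to the end of the input.
        $lists[] = $curlist;
    }
    return $lists;
}

$lines = explode("\n", "I can haz a listz0rs!\n# 42\n# 126\nI can haz another list plox?\n# Hello, world!\n# Welcome!");
print_r(groupHashLines($lines));
```

With the question's input this returns [['42', '126'], ['Hello, world!', 'Welcome!']], which is the structure the question asks for.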

hlissner
A: 

Looks like Syntax Error has already explained what I'm doing. But here goes the link to a working example.

codaddict
A: 

With structured content like this, I would not use a regex. How about another approach?

$your_text = <<<END
I can haz a listz0rs!
# 42
# 126
I can haz another list plox?
# Hello, world!
# Welcome!
END;

function printUnorderedList($temp) {
    if (count($temp)>0) {
        print "<ul>\n\t<li>" .implode("</li>\n\t<li>", $temp) . "</li>\n</ul>\n";
    }
}

$lines = explode("\n", $your_text);
$temp = array();
foreach($lines as $line) {
    if (substr($line, 0, 1) == '#') {
        $temp[] = trim(substr($line,1));
    } else {
        printUnorderedList($temp);
        $temp = array();
        echo $line . "\n";
    }
}
printUnorderedList($temp);
artlung
A: 

If you want to do this with regular expressions, you'll need two. Use the regex ^(#.*\r?\n)+ to match each list and add tags around it. Within each list (as matched by the first regex), search-and-replace ^#[ \t]*(.*) with <li>$1</li> to add tags around each list item and drop the leading #. Both regexes require ^ to match at line breaks (/m flag in PHP). Note that the first regex only matches lines that end with a line break, so if the input can end on a # line, append "\n" to it first.

In PHP you can use preg_replace_callback and preg_replace to achieve this in just a few lines of code.

// $subject holds the input text, ending with a line break
$result = preg_replace_callback('/^(#.*\r?\n)+/m', 'replacelist', $subject);

// Called once per list; $groups[0] is the whole run of # lines
function replacelist($groups) {
  return "<ul>\n" .
    preg_replace('/^#[ \t]*(.*)/m', '    <li>$1</li>', $groups[0])
    . "</ul>\n";
}
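The same two-regex idea can also produce the nested array the question asks for, by swapping preg_replace_callback for preg_match_all. This is a sketch that assumes \n or \r\n line endings. Unlike the pattern above, its outer regex also accepts a final # line with no trailing line break.

```php
<?php
$subject = "I can haz a listz0rs!\n# 42\n# 126\nI can haz another list plox?\n# Hello, world!\n# Welcome!";

$lists = [];
// Outer regex: each match is one run of consecutive "#" lines.
preg_match_all('/(?:^#.*(?:\r?\n|\z))+/m', $subject, $runs);
foreach ($runs[0] as $run) {
    // Inner regex: group 1 is the item text without the "#", leading
    // spaces/tabs, or a trailing \r.
    preg_match_all('/^#[ \t]*(.*?)\r?$/m', $run, $items);
    $lists[] = $items[1];
}
print_r($lists);
```

Here $lists ends up as [['42', '126'], ['Hello, world!', 'Welcome!']], which is a close alternative to a quantifier that returns every occurrence.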
Jan Goyvaerts