ansaurus

Question

string parsing help

Answer 1

+2 A:

I'd take a multi-step approach:

split into section headings/content
parse each heading/content pair into the desired array structure

Here's an example, split into multiple lines so you can track what is going on:

Note the lack of sanity checking: this assumes nice, neat heading/content groups.
The regex was written for brevity and may or may not be sufficient for your needs.

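// $subject holds the full text to parse
// Split string on a line of text wrapped in lines of only #'s
// (limit -1 means "no limit"; passing null here is deprecated as of PHP 8.1)
$parts = preg_split('/^#+$\R(.+)\R^#+$/m', $subject, -1, PREG_SPLIT_DELIM_CAPTURE|PREG_SPLIT_NO_EMPTY);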
// Tidy up leading/trailing whitespace for each heading/content-block
$parts = array_map('trim', $parts);
// Chunk into array("heading", "content")
$parts = array_chunk($parts, 2);

// Create the final array
$sections = array();
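foreach ($parts as $part) {
    // A trailing heading with no content has no $part[1]; treat it as empty
    $content = isset($part[1]) ? $part[1] : '';
    // Empty content becomes an empty array rather than array("")
    $sections[$part[0]] = ($content === '') ? array() : explode("\n", $content);
}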

// Let's take a look
var_dump($sections);
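For anyone who wants to try these steps end to end, here is a minimal, self-contained sketch of the same split/trim/chunk approach wrapped in a function. The function name `parse_sections` and the sample text are hypothetical, added only for illustration; it assumes LF line endings and one-line headings wrapped in lines made only of `#`.

```php
<?php
// Hypothetical helper wrapping the split / trim / chunk steps above.
// Assumes each heading is one line wrapped in lines made only of '#',
// and that the text uses LF ("\n") line endings.
function parse_sections(string $subject): array
{
    // Split on each heading block, keeping the captured heading text.
    $parts = preg_split('/^#+$\R(.+)\R^#+$/m', $subject, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    // Trim whitespace, then pair each heading with the content after it.
    $parts = array_chunk(array_map('trim', $parts), 2);

    $sections = array();
    foreach ($parts as $part) {
        // A trailing heading with no content produces a one-element chunk.
        $content = $part[1] ?? '';
        // Empty content becomes an empty array, not array("").
        $sections[$part[0]] = ($content === '') ? array() : explode("\n", $content);
    }
    return $sections;
}

// Illustrative sample input (not the asker's real data).
$sample = "####\nSection One\n####\nfoo.bar=baz\nfoo.qux=1\n\n"
    . "####\nEmpty\n####\n####\nLast\n####\nBlah\n";
print_r(parse_sections($sample));
```

On that sample, `Section One` maps to its two `name=value` rows, `Empty` maps to an empty array, and `Last` maps to `array('Blah')`.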

salathe 2010-05-24 23:37:29

Thanks for the help. I wound up going back and forth with @polygenelubricants...

sprugman 2010-05-25 15:22:48

Oookkkk. I'll never quite understand this place. :-/

salathe 2010-05-25 15:54:47

Answer 2

+1 A:

I was able to quickly write this up:

<?php
$text = <<<EOT
####################
Section One
####################
Data B.Thing=bar#
.##.#%#

####################
   Empty Section!
####################
####################
   Last section
####################

Blah

   Blah C# C# C#

EOT;
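// Split wherever a line starts with one or more '#' (the m modifier
// makes ^ match at the start of every line), then pair each heading
// with the block of text that follows it
$entries = array_chunk(
    preg_split("/^#+/m", $text, -1, PREG_SPLIT_NO_EMPTY),
    2
);
$sections = array();
foreach ($entries as $entry) {
    $key = trim($entry[0]);
    // A trailing heading with no content has no $entry[1]
    $block = isset($entry[1]) ? $entry[1] : '';
    // Split into lines, dropping blank ones; \R also copes with CRLF endings
    $value = preg_split('/\R/', $block, -1, PREG_SPLIT_NO_EMPTY);
    $sections[$key] = $value;
}
print_r($sections);
?>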

The output (as run on ideone.com) is:

Array
(
    [Section One] => Array
        (
            [0] => Data B.Thing=bar#
            [1] => .##.#%#
        )

    [Empty Section!] => Array
        (
        )

    [Last section] => Array
        (
            [0] => Blah
            [1] =>    Blah C# C# C#
        )

)

polygenelubricants 2010-05-25 09:09:51

That's awesome, thanks! But it doesn't quite work. :( It seems to choke on non-alpha characters in the data rows, which all of my data rows have, since they're name value pairs like "foo.bar=baz" http://ideone.com/u3xYo

sprugman 2010-05-25 14:08:07

@sprugman, well, I wasn't sure what the data pattern is, but if you can guarantee that it will never contain `#`, (e.g. no `"C# is awesome!"` or anything like that), then just use `[^#]+` instead of `[\w\s]+` http://ideone.com/zrx9n

polygenelubricants 2010-05-25 14:14:23
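The difference between the two character classes can be seen in isolation. The full regex from that earlier revision isn't reproduced here, so this sketch only tests each class against sample rows (the rows are illustrative assumptions):

```php
<?php
// A data row in the "name=value" style described in the question.
$row = "foo.bar=baz";

// [\w\s]+ only covers letters, digits, underscore and whitespace,
// so an anchored match fails as soon as it reaches the '.'.
$wordOnly = preg_match('/^[\w\s]+$/', $row);            // 0

// [^#]+ accepts anything except '#', so the whole row matches.
$notHash = preg_match('/^[^#]+$/', $row);               // 1

// The trade-off: any row that contains a '#' no longer matches.
$withHash = preg_match('/^[^#]+$/', "C# is awesome!");  // 0

var_dump($wordOnly, $notHash, $withHash);
```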

how 'bout if I guarantee that no line except the section delimiters will ever start with a #?

sprugman 2010-05-25 14:18:06
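That guarantee is what the posted version relies on: it splits on `/^#+/m`, and with the `m` modifier `^` matches at the start of every line, so only runs of `#` that begin a line act as delimiters. A small sketch (the sample strings are made up for illustration) showing the safe case and what happens when a data line does start with `#`:

```php
<?php
// '#' in the middle or at the end of a line is not at a line start,
// so it is left alone: one heading piece, one content piece.
$text = "####\nSection\n####\nfoo.bar=baz#\n.##.#%#\n";
$pieces = preg_split('/^#+/m', $text, -1, PREG_SPLIT_NO_EMPTY);
// array("\nSection\n", "\nfoo.bar=baz#\n.##.#%#\n")

// A data line that starts with '#' becomes an extra delimiter.
$bad = "####\nSection\n####\n#foo=1\nbar=2\n";
$badPieces = preg_split('/^#+/m', $bad, -1, PREG_SPLIT_NO_EMPTY);
// array("\nSection\n", "\n", "foo=1\nbar=2\n"): three pieces, so
// array_chunk($badPieces, 2) pairs the heading with "\n" and leaves
// "foo=1\nbar=2\n" looking like a heading with no content.

var_dump($pieces, $badPieces);
```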

@sprugman: check out latest revision. Tell me if there's anything else I can do.

polygenelubricants 2010-05-25 14:47:51

Thanks for all your help. I found another way to break it (unless I wasn't looking at the latest -- it seems to change the url every time you edit): http://ideone.com/2TOfp

sprugman 2010-05-25 14:58:01

ah: found your latest (http://ideone.com/59Y3V) looking...

sprugman 2010-05-25 15:00:36

I had to add a (\r?) to deal with the CRLFs of the source, but that seems to be working. Thanks!! http://ideone.com/V4X0D

sprugman 2010-05-25 15:09:47
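The comment doesn't show exactly where the `(\r?)` was added, so this is only a sketch of the general CRLF problem when splitting a content block into lines; the sample string is an assumption:

```php
<?php
// A content block saved with Windows line endings (CRLF, "\r\n").
$block = "foo.bar=baz\r\nfoo.qux=1\r\n";

// Splitting on "\n" alone leaves a trailing "\r" on every line.
$lf = preg_split('/\n/', $block, -1, PREG_SPLIT_NO_EMPTY);
// array("foo.bar=baz\r", "foo.qux=1\r")

// Making the "\r" optional handles both LF and CRLF input.
$crlf = preg_split('/\r?\n/', $block, -1, PREG_SPLIT_NO_EMPTY);
// array("foo.bar=baz", "foo.qux=1")

// PCRE's \R (any newline sequence, including "\r\n") is a similar option.
$any = preg_split('/\R/', $block, -1, PREG_SPLIT_NO_EMPTY);

var_dump($lf, $crlf, $any);
```

Note that the `#` delimiter lines are affected too: with CRLF input, a pattern like `^#+$` won't match under `/m`, because the `\r` sits between the last `#` and the `\n`.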

@sprugman: you're the first person on stackoverflow who has gone back and forth with me on ideone.com; I think this is neat!!!

polygenelubricants 2010-05-25 15:19:17

It worked pretty well, apart from having to update the url every time. (I wonder if they have a setting for that.) It might be nice to update your answer with the final version.... (Oh wait, maybe you did already. :)

sprugman 2010-05-25 15:29:10
