I've never really done much with parsing text in PHP (or any language). I've got this text:

1 (2) ,Yes,5823,"Some Name
801-555-5555",EXEC,,"Mar 16, 2009",0.00,
1 (3) ,,4821,Somebody Else,MBR,,"Mar 11, 2009",,0.00
2 (1) ,,5634,Another Guy,ASSOC,,"Mar 15, 2009",,0.00

You can see the first record is broken across two lines inside a quoted field. I need it to be:

1 (2) ,Yes,5823,"Some Name 801-555-5555",EXEC,,"Mar 16, 2009",0.00,
1 (3) ,,4821,Somebody Else,MBR,,"Mar 11, 2009",,0.00
2 (1) ,,5634,Another Guy,ASSOC,,"Mar 15, 2009",,0.00

I was thinking of using a regular expression to find \n within quotes (or after a quote, since that wouldn't create false matches) and then replacing it with nothing using PHP's preg_replace(). I'm currently researching regex since I don't know any of it, so I may figure this out on my own (that's always best), but no doubt a solution to a current problem of mine will help me get a handle on it even sooner.
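
For reference, the regex idea described above might look roughly like the sketch below. It assumes quoted fields never contain escaped or doubled quotes, and the function name joinQuotedLineBreaks is made up for illustration. It replaces the break with a single space rather than nothing, since that is what the desired output shows.

```php
<?php
// Hypothetical helper (not from any library). Assumes every quoted field is
// delimited by plain double quotes and contains no escaped/doubled quotes.
function joinQuotedLineBreaks(string $text): string
{
    // Match each double-quoted field; [^"]* also matches line breaks.
    return preg_replace_callback(
        '/"[^"]*"/',
        function (array $m): string {
            // Collapse any line break inside the field (plus surrounding
            // whitespace) into a single space.
            return preg_replace('/\s*(?:\r\n?|\n)\s*/', ' ', $m[0]);
        },
        $text
    );
}

$text = "1 (2) ,Yes,5823,\"Some Name\n801-555-5555\",EXEC,,\"Mar 16, 2009\",0.00,\n"
      . "1 (3) ,,4821,Somebody Else,MBR,,\"Mar 11, 2009\",,0.00\n";

echo joinQuotedLineBreaks($text);
```

preg_replace_callback is used instead of a single preg_replace so that the line-break replacement only ever runs on text that was matched as a quoted field.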

Thanks so much. If I could, I'd drop a bounty on this immediately.

Thanks!

+3  A: 

If the text has that fixed format, you may not need regex at all: just scan each line for double quotes and, if a line leaves a quote unclosed, keep joining lines until you find the closing one.

Problems may arise if there can be escaped quotes, single quotes used as string delimiters, etc., but as long as none of those occur, you should be fine.

I don't know PHP, so here is some pseudocode:

open = False
for line in lines do
    # an odd number of quotes means this line opens or closes a quoted field
    if line.count("\"") mod 2 == 1 then
        open = not open
    end
    if open then
        # still inside quotes: join with the next line, separated by a space
        write(line + " ")
    else
        writeln(line)
    end
end
fortran
Agreed - this is not a problem for regular expressions, as it requires balancing; it's a problem for a simple scanner.
Greg Beech
+1  A: 

Here's essentially fortran's answer in PHP:

<pre>
<?php

$data = <<<DATA
1 (2) ,Yes,5823,"Some Name
801-555-5555",EXEC,,"Mar 16, 2009",0.00,
1 (3) ,,4821,Somebody Else,MBR,,"Mar 11, 2009",,0.00
2 (1) ,,5634,Another Guy,ASSOC,,"Mar 15, 2009",,0.00
DATA;

echo $data, '<hr>';

$lines = preg_split( "/\r\n?|\n/", $data );

$filtered = "";
$open = false;
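foreach ( $lines as $line )
{
  // An odd number of quotes means this line opens or closes a quoted field.
  if ( substr_count( $line, '"' ) & 1 )
  {
    $open = !$open;
  }

  // Inside a quoted field: join to the next line with a space, not a newline.
  $filtered .= $line . ( $open ? " " : "\n" );
}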

echo $filtered;
?>
</pre>
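
If the data can contain doubled quotes ("") inside fields, which the line scanner above does not handle, one alternative is to let PHP's own CSV parser deal with the quoting and then clean up each field. The following is only a sketch under that assumption; the function name parseCsvJoiningBreaks is hypothetical, and it returns parsed rows rather than rewritten text.

```php
<?php
// Hypothetical helper (not from any library). Assumes comma-separated input
// with double-quote enclosures, as in the sample data.
function parseCsvJoiningBreaks(string $data): array
{
    // fgetcsv() needs a stream, so copy the string into an in-memory one.
    $stream = fopen('php://memory', 'r+');
    fwrite($stream, $data);
    rewind($stream);

    $rows = [];
    // fgetcsv() keeps reading past a line break while a quoted field is open.
    while (($fields = fgetcsv($stream, 0, ',', '"', '\\')) !== false) {
        if ($fields === [null]) {
            continue; // a blank line is returned as an array holding one null
        }
        // Collapse line breaks left inside any field into a single space.
        $rows[] = array_map(function ($field) {
            return preg_replace('/\s*(?:\r\n?|\n)\s*/', ' ', (string) $field);
        }, $fields);
    }
    fclose($stream);

    return $rows;
}

$rows = parseCsvJoiningBreaks(
    "1 (2) ,Yes,5823,\"Some Name\n801-555-5555\",EXEC,,\"Mar 16, 2009\",0.00,\n"
  . "1 (3) ,,4821,Somebody Else,MBR,,\"Mar 11, 2009\",,0.00\n"
);

echo $rows[0][3], "\n"; // Some Name 801-555-5555
```

The rows can be written back out with fputcsv() if a cleaned-up file is needed, although fputcsv() decides for itself which fields to quote, so the output may not match the original quoting exactly.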
Peter Bailey