How do I split the content below into separate files, without the placeholder tags? I'd also like to take the text inside the placeholder tags and put it into a new contents file.

<div class='placeholder'>The First Chapter</div>

This is some text.

<div class='placeholder'>The Second Chapter</div>

This is some more text.

<div class='placeholder'>Last Chapter</div>

The last chapter.

Thanks.

UPDATE:

I've tried a modified version of MartinodF's code, but can't get it to work.

$text=file_get_contents("t.txt");


$parts = preg_split('/\n?<div class=\'placeholder\'>(.+?)<\/div>\n/im', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
$parts_num = count($parts) / 2;

$titles = $files = array();
for($x = 0; $x < $parts_num - 1; $x++) {
    $titles[] = $parts[$x * 2 + 1];
    $files[] = $parts[$x * 2 + 1] . "\n" . $parts[$x * 2 + 2];
}


var_dump($titles);
var_dump($files);

echo $titles[1];

UPDATE 2: This version no longer relies on a separate txt file, but it still doesn't work.

$text="<div class='placeholder'>The First Chapter</div>
This is some text.
<div class='placeholder'>The Second Chapter</div>
This is some more text.
<div class='placeholder'>Last Chapter</div>
The last chapter.
";


$parts = preg_split('/\n?<div class=\'placeholder\'>(.+?)<\/div>\n/im', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
$parts_num = count($parts) / 2;

$titles = $files = array();
for($x = 0; $x < $parts_num - 1; $x++) {
    $titles[] = $parts[$x * 2 + 1];
    $files[] = $parts[$x * 2 + 1] . "\n" . $parts[$x * 2 + 2];
}


var_dump($titles);
var_dump($files);

echo $titles[1];
A: 

Use an XML/HTML parser to walk over the DOM and pull out what you need. SimpleXML and DOMDocument are built directly into PHP, or you could use something like Zend_Dom_Query or SimpleHTML.
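Assuming the input is exactly the fragment shown in the question, a minimal sketch of the parser approach with the built-in DOMDocument class might look like this. It treats each placeholder div as the start of a chapter and collects whatever follows it until the next placeholder. The $titles and $bodies names are illustrative only, and because libxml has to guess the tree for an HTML fragment, it is worth checking the result against your real input.

```php
<?php
// Sketch of the DOM-parser approach, assuming the input is the fragment from
// the question: <div class='placeholder'> titles, each followed by plain text.
$text = "<div class='placeholder'>The First Chapter</div>\nThis is some text.\n"
      . "<div class='placeholder'>The Second Chapter</div>\nThis is some more text.\n"
      . "<div class='placeholder'>Last Chapter</div>\nThe last chapter.\n";

$doc = new DOMDocument();
// loadHTML() wraps the fragment in <html><body>; silence libxml's warnings
// about the incomplete document.
libxml_use_internal_errors(true);
$doc->loadHTML($text);
libxml_clear_errors();

$titles = array(); // placeholder texts, in document order
$bodies = array(); // text that follows each placeholder
$bodyEl = $doc->getElementsByTagName('body')->item(0);
foreach ($bodyEl->childNodes as $node) {
    $isPlaceholder = $node instanceof DOMElement
        && $node->tagName === 'div'
        && $node->getAttribute('class') === 'placeholder';
    if ($isPlaceholder) {
        // Start a new chapter; textContent drops the surrounding tags.
        $titles[] = trim($node->textContent);
        $bodies[] = '';
    } elseif (count($bodies) > 0) {
        // Anything after a placeholder belongs to the current chapter.
        $bodies[count($bodies) - 1] .= $node->textContent;
    }
}
$bodies = array_map('trim', $bodies);
```

From here, $titles can be written one per line to a contents file and each entry of $bodies to its own file.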

prodigitalson
A: 

It seems to me that you can simply use regular expressions...

http://www.roscripts.com/PHP_regular_expressions_examples-136.html -- see the end of the document; there are a few regular expressions for HTML there.

...but maybe you've presented only part of your task.
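As a rough illustration of the plain-regex idea (this pattern is my own sketch, not one taken from the linked page), preg_match_all() can capture each placeholder title together with the text up to the next placeholder or the end of the string. It assumes the markup is exactly <div class='placeholder'>...</div> as in the question.

```php
<?php
// Capture each placeholder title and the text up to the next placeholder (or
// the end of the string). The s modifier lets . match newlines.
$text = "<div class='placeholder'>The First Chapter</div>\nThis is some text.\n"
      . "<div class='placeholder'>The Second Chapter</div>\nThis is some more text.\n"
      . "<div class='placeholder'>Last Chapter</div>\nThe last chapter.\n";

$pattern = "~<div class='placeholder'>(.*?)</div>(.*?)(?=<div class='placeholder'>|\\z)~s";
preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

$titles = array();
$bodies = array();
foreach ($matches as $m) {
    $titles[] = trim($m[1]);
    // trim() also strips \r, so Windows line endings don't leak into the text.
    $bodies[] = trim($m[2]);
}
```

The lookahead keeps the next placeholder out of the current match, so every chapter, including the last one, is captured in a single pass.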

MartyIX
A: 

If I understand correctly what you're doing (extracting the title and contents of each chapter from a script of some sort), MartyIX is right: you can use regular expressions:

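// Split on each placeholder div. PREG_SPLIT_DELIM_CAPTURE keeps the captured
// title in $parts, so it alternates: text before, title 1, text 1, title 2, ...
$parts = preg_split('/\n?<div class=\'placeholder\'>(.+?)<\/div>\n/im', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
$parts_num = count($parts) / 2;

$titles = $files = array();
// Odd indexes hold titles; the element after each title is that chapter's text.
for($x = 0; $x < $parts_num - 1; $x++) {
    $titles[] = $parts[$x * 2 + 1];
    $files[] = $parts[$x * 2 + 1] . "\n" . $parts[$x * 2 + 2];
}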

var_dump($titles);
var_dump($files);

$titles will be an array containing all the titles; write them one per line and you have your contents file (which works like an index).

$files, on the other hand, will contain each chapter (the title without the tags around it, a newline, and then the text). Write each one out to a separate file and your text is split into chapters.
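One plausible reason both arrays come back empty on some machines (an assumption, not confirmed in this thread) is Windows CRLF line endings: the pattern requires \n directly after </div>, so a \r in between prevents any match. The sketch below normalizes line endings first, then writes the contents file and one file per chapter. The output directory and the contents.txt / chapterN.txt names are hypothetical, and the sample string deliberately uses \r\n to exercise the normalization.

```php
<?php
// Sample input with Windows line endings (\r\n), as a saved file might have.
$text = "<div class='placeholder'>The First Chapter</div>\r\nThis is some text.\r\n"
      . "<div class='placeholder'>The Second Chapter</div>\r\nThis is some more text.\r\n"
      . "<div class='placeholder'>Last Chapter</div>\r\nThe last chapter.\r\n";

// Normalize \r\n and lone \r to \n so the pattern's \n can match.
$text = str_replace(array("\r\n", "\r"), "\n", $text);

$parts = preg_split('/\n?<div class=\'placeholder\'>(.+?)<\/div>\n/i', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

// $parts = [text before first title, title 1, text 1, title 2, text 2, ...]
$titles = $files = array();
for ($i = 1; $i < count($parts); $i += 2) {
    $titles[] = $parts[$i];
    $files[] = $parts[$i] . "\n" . $parts[$i + 1];
}

$outDir = sys_get_temp_dir() . '/chapters'; // hypothetical output directory
if (!is_dir($outDir)) {
    mkdir($outDir, 0777, true);
}

// One title per line forms the contents (index) file.
file_put_contents($outDir . '/contents.txt', implode("\n", $titles) . "\n");

// Each chapter goes to its own numbered file: chapter1.txt, chapter2.txt, ...
foreach ($files as $n => $chapter) {
    file_put_contents($outDir . '/chapter' . ($n + 1) . '.txt', $chapter);
}
```

Looping over the odd indexes gives the same chapters as the $parts_num arithmetic, and the m modifier is dropped because the pattern uses neither ^ nor $.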

MartinodF
I like the format, but it doesn't seem to work. I used file_get_contents to put the text into the $text variable, and both arrays seem to be empty.
usertest
You can see it working at http://matrix87.com/so.php, which also outputs its own source (via echo(htmlentities(file_get_contents(__FILE__)));) so you can check how it works.
MartinodF
Hi, I've tried your code as I've shown in my updated post but can't get it to work. Can you see what I could be missing? Thanks.
usertest
I would need the t.txt file. Can you upload it somewhere?
MartinodF
Hi again, the txt file consists of exactly the content in my original post.
usertest
I've added another version on my post that doesn't need an external txt file, so we can rule that out. Does my version of the code work on your PC? Thanks.
usertest