Hello everyone,

I'm trying to use the DOM in PHP to do a pretty specific job and I've had no luck so far. The objective is to take a string of HTML from a WordPress blog post (from the DB; this is a WordPress plugin) and, within that HTML, replace <div id="do_not_edit">old content</div> with <div id="do_not_edit">new content</div>, keeping everything above and below that div intact.

Then save the HTML back into the DB. It should be simple really; I have read that a regex wouldn't be the right way to go here, so I've turned to the DOM instead.

The problem is I just can't get it to work; I can't extract the div or anything.

Help me!!

UPDATE

The HTML coming out of the WordPress table looks like:

Congratulations on finding us here on the world wide web, we are on a  mission to create a website that will show off your culinary skills  better than any other website does.

<div id="do_not_edit">blah blah</div>
We want this website to be fun and  easy to use, we strive for simple elegance and incredible functionality.We aim to provide a 'complete package'. By this we want to create a  website where people can meet, share ideas and help each other out.

After several different (incorrect) attempts, all I've got is:

$content = ($wpdb->get_var( "SELECT `post_content` FROM $wpdb->posts WHERE ID = {$article[post_id]}" ));        

$doc = new DOMDocument();
$doc->validateOnParse = true; 
$doc->loadHTMLFile($content);
$element = $doc->getElementById('do_not_edit');
echo $element;
+1  A: 

Your HTML is not a complete HTML document, which is what DOMDocument expects. One option would be to wrap your HTML so it's a complete document:

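// The array key must be quoted inside the {$...} syntax.
$content = $wpdb->get_var( "SELECT `post_content` FROM $wpdb->posts WHERE ID = {$article['post_id']}" );

// Wrap the fragment so DOMDocument sees a complete document.
$content = '<html><head><title></title></head><body>'.$content.'</body></html>';

$doc = new DOMDocument();
$doc->validateOnParse = false;
$doc->loadHTML($content);
$element = $doc->getElementById('do_not_edit');
// A DOMElement cannot be echoed directly; serialize it first.
echo $doc->saveHTML($element);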

It's a bit hacky, but might easily solve the problem.

Matt S
I still just get a NULL object.
bluedaniel
See my edits: You want loadHTML(), not loadHTMLFile(). Also, I'd suggest not validating.
Matt S
Still null. I've tried validation true and false, and tried both loadHTML() and loadHTMLFile(). I can't understand why it wouldn't pick up a simple div with a unique id.
bluedaniel
@bluedaniel: You must supply a full, valid HTML file including a DTD which defines an attribute to be of type ID for `getElementById` to work.
Josh
Yeah, I've read that, but I can't work out what a DTD is or how to code it.
bluedaniel
A DTD is what you get with a full HTML document, for example, `<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">`
Josh
+2  A: 

If you are sure that the HTML from WordPress contains only one div, the following should work:

$doc = new DOMDocument();
$doc->validateOnParse = false; 
$doc->loadHTML($content);
$divs = $doc->getElementsByTagName('div');
echo $divs->item(0)->textContent;

If not, try:

$doc = new DOMDocument();
$doc->validateOnParse = false; 
$doc->loadHTML($content);
$divs = $doc->getElementsByTagName('div');

for($i=0; $i<$divs->length; $i++)
{
  $id = $divs->item($i)->attributes->getNamedItem('id');
  if($id && $id->value == 'do_not_edit')
  {
    $node = $divs->item($i);

    // Remove every existing child, not just the first one.
    while($node->firstChild)
    {
      $node->removeChild($node->firstChild);
    }

    $newText = new DOMText("This is some new content");
    $node->appendChild($newText);
    break;
  }
}

$html = $doc->saveHTML();
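
Calling saveHTML() with no argument returns a full HTML document rather than the original fragment. One way around that is sketched below, under a few assumptions: the helper name `replace_div_content` is hypothetical, the input is assumed to be UTF-8 (the meta tag in the wrapper is there only to tell the parser so), and the new content is inserted as plain text, as in the loop above. It looks the div up with XPath instead of getElementById(), then serializes only the children of <body>.

```php
<?php
// Hypothetical helper (name and signature are assumptions for illustration).
// Replaces the content of <div id="..."> with plain text and returns the
// fragment only, without the <html>/<body> wrapper.
function replace_div_content(string $html, string $newText, string $id = 'do_not_edit'): string
{
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of printing them;
    // real-world post content is rarely perfectly valid HTML.
    $previous = libxml_use_internal_errors(true);

    // Assumption: the content is UTF-8. The meta tag hints that to the parser.
    $doc->loadHTML(
        '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>'
        . $html
        . '</body></html>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // XPath lookup by attribute value. Only safe for simple ids that
    // contain no quote characters.
    $xpath = new DOMXPath($doc);
    $node = $xpath->query('//div[@id="' . $id . '"]')->item(0);
    if ($node === null) {
        return $html; // div not found: hand back the input unchanged
    }

    // Drop all existing children, then add the new text node.
    while ($node->firstChild) {
        $node->removeChild($node->firstChild);
    }
    $node->appendChild($doc->createTextNode($newText));

    // Serialize only what sits inside <body>.
    $body = $doc->getElementsByTagName('body')->item(0);
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

// Usage with a made-up fragment:
$result = replace_div_content(
    'Intro text <div id="do_not_edit">old</div> tail',
    'new'
);
echo $result;
```

The returned string can then be written back to the database in place of the original post content. If the replacement needs to contain markup rather than plain text, the text node would have to be swapped for imported nodes, which this sketch does not attempt.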
Josh
I can't confirm it's the only div.
bluedaniel
@bluedaniel: Try my updated answer
Josh
Your second answer still picks up the first div in the text, `if($id ` is not being called correctly.
bluedaniel
Josh
Josh, thank you. Now on to my second problem: how can I replace the contents?
bluedaniel
@bluedaniel: See my edited answer using $doc->saveHTML(). The only issue is, this will convert your code to a full HTML document, an issue I am not sure how to resolve...
Josh
That code does not save the "This is some new content" text in place.
bluedaniel
We are so close I can almost smell the success!
bluedaniel
It doesn't? It does for me...
Josh
@bluedaniel: I thought my code should do the trick. If you're still having difficulty please let me know and post as many specific details as possible. Thanks!
Josh
Sorry I'm coming back to this late, but I had other projects on the go. The problem was basically an encoding one: your code did work perfectly, but the content coming out of the WordPress DB needed sorting out first. Thanks so much for your help!
bluedaniel
Hey Josh, it seems there is something in the WordPress content breaking your code. This is for a really high-profile project for a news company; can you email me at [email protected] so we can sort this out? I'm on a tight deadline and I can't figure out what is breaking it. Thanks, Josh.
bluedaniel