I'm trying to parse HTML pages and extract the paragraphs (<p>) using getElementsByTagName('p').
The problem is that when I use $element->nodeValue, it returns weird characters. The document is first fetched into $html using curl and then loaded into a DomDocument.
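For reference, here is a minimal sketch of roughly what the setup looks like. The function names fetch_html and paragraph_texts are made up for illustration, the libxml_use_internal_errors() call is only there to silence warnings on sloppy markup, and the paragraph lookup uses DOMDocument's getElementsByTagName method. No charset handling is done anywhere, which is probably where things go wrong.

```php
<?php
// Hypothetical helper: fetch a page body with curl.
// The URL passed in is a placeholder; the real one is not part of this question.
function fetch_html(string $url): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    $html = curl_exec($ch);
    if ($html === false) {
        throw new RuntimeException('curl failed: ' . curl_error($ch));
    }
    return $html;
}

// Hypothetical helper: load the markup into a DOMDocument and
// collect the nodeValue of every <p> element.
function paragraph_texts(string $html): array
{
    libxml_use_internal_errors(true); // assumption: suppress warnings on malformed HTML
    $doc = new DOMDocument();
    $doc->loadHTML($html);            // no encoding hint is given here

    $texts = [];
    foreach ($doc->getElementsByTagName('p') as $element) {
        $texts[] = $element->nodeValue;
    }
    return $texts;
}
```

Calling paragraph_texts(fetch_html($url)) and echoing each entry is what produces the garbled output.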
I'm sure it has to do with charsets.
Here's an example of a response: "aujourd’hui".
Thanks in advance.