Using PHP's DOMDocument->LoadHTMLFile('test.html'); keeps returning an error that points to a problem in the content at line 36. Deleting character after character, I found that the culprit was an apparently empty space. Copying and pasting that sentence into another editor (Editra) revealed a strange RS character.

What is it and, more importantly, how can I keep it from happening again?

+5  A: 

It's a record separator (RS), an ASCII control character.

Can be used as delimiters to mark fields of data structures. If used for hierarchical levels, US is the lowest level (dividing plain-text data items), while RS, GS, and FS are of increasing level to divide groups made up of items of the level beneath it.

SEQ: ^^ - Dec: 30 - Hex: 1E - Acronym: RS

What you can do is use strtr() to strip out non-visible characters. An example by Joel Degan on PHP.net should get you on your way.
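As a rough sketch of the stripping idea, the snippet below removes control characters from the markup before handing it to DOMDocument. It uses preg_replace() with a character class rather than the strtr() approach mentioned above, and strip_control_chars() is a hypothetical helper name, not something from PHP itself. It assumes you can read the file into a string first (for example with file_get_contents()) and then call loadHTML() instead of loadHTMLFile().

```php
<?php
// Hypothetical helper: remove ASCII control characters (including RS, 0x1E)
// while keeping tab (0x09), line feed (0x0A), and carriage return (0x0D).
function strip_control_chars(string $html): string
{
    return preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/', '', $html);
}

// Sample markup with an RS character embedded in the text.
$raw = "<p>before\x1Eafter</p>";
$clean = strip_control_chars($raw);

// Parse the cleaned string instead of the original file contents.
$doc = new DOMDocument();
$doc->loadHTML($clean);
echo $doc->getElementsByTagName('p')->item(0)->nodeValue, "\n";
```

For a real file, $raw would come from something like file_get_contents('test.html'). Whether silently dropping these characters is acceptable depends on the data, so logging what was removed may be worth considering.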

Ólafur Waage
A: 

As I recall, PHP throws a non-fatal error in this case. It will complain about a lot of things that you can't do anything about if you didn't create the file. What you can do is resort to a bad programming practice and suppress the errors by putting @ before the call.

$doc = new DOMDocument();
@$doc->loadHTMLFile('test.html');

It should still load the file, but you will be "ignoring" the errors. Ignorance is bliss?
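If the @ operator feels too blunt, one alternative (not part of the suggestion above) is libxml_use_internal_errors(), which collects parser errors so you can inspect them instead of having them emitted as warnings. This is only a sketch: the sample markup is made up, and like @ it merely hides or records the errors; it does not change how the parser treats the offending input.

```php
<?php
// Made-up sample markup that is deliberately sloppy.
$html = "<p>unclosed <b>markup</p><unknown>tag</unknown>";

// Collect libxml errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$loaded = $doc->loadHTML($html);

// Fetch whatever the parser complained about, then reset the error buffer.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors(false);

// Report each collected error with the line it was found on.
foreach ($errors as $error) {
    printf("line %d: %s\n", $error->line, trim($error->message));
}
```

Which errors are reported, if any, depends on the libxml2 version PHP was built against.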

Brent Baisley
That's exactly the problem: the file isn't mine, and the loaded nodeValue is cut off at the location of the RS character... I guess I'd better show a big ERROR sign to the user in that case...
pixeline
Right, you don't control the file, which is why you would suppress the errors. If you load just about any web site into DOMDocument, it will report errors. But if you suppress the errors with @, you should be able to get the document loaded. It's worth a try since it's so easy to do.
Brent Baisley
It does suppress the warning, but the problem is that the HTML is not parsed past that error character.
pixeline