+1  A: 

If you have the correct encoding, you don't need to escape Ø (&Oslash;). Try using Unicode to be sure.

If there is no way to change that behavior, try unescaping the HTML entities before parsing; check the PHP manual.
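
A minimal sketch of that unescaping step, assuming the raw XML is available as a string before it is parsed. The helper name `decodeHtmlEntitiesForXml` and the sample string are made up for illustration. It deliberately leaves the entities that XML itself predefines (`&amp;`, `&lt;` and so on) alone, since decoding those would break the markup:

```php
<?php
// Hypothetical helper: replace named HTML entities with UTF-8 characters,
// but keep the entities that XML predefines so the markup stays well-formed.
function decodeHtmlEntitiesForXml(string $xml): string
{
    // Table maps character => entity, e.g. 'Ø' => '&Oslash;'.
    $table = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES | ENT_HTML401, 'UTF-8');
    $map = array_flip($table); // entity => character

    // Do not touch the XML-level escapes.
    foreach (['&amp;', '&lt;', '&gt;', '&quot;', '&#039;', '&apos;'] as $keep) {
        unset($map[$keep]);
    }

    return strtr($xml, $map);
}

// Made-up sample input, not taken from the real feed.
$raw = '<forenames>NIELS B&Oslash;IE &amp; CO</forenames>';
$clean = decodeHtmlEntitiesForXml($raw);

$element = simplexml_load_string($clean);
echo (string) $element, "\n";
```

This only helps when the entities really are plain HTML entities; if the DTD defines them as something else, replacing them up front changes what the document means.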

Cem Kalyoncu
+1: unicode FTW
Jonathan Fingland
I get the feeling this doc comes from elsewhere.
spender
yeah, I'm parsing it, not writing it.
Maarten
Ow, it's too bad they don't know how to produce XML. IMO the only method left is to replace the HTML entities before parsing. PHP has a function for it.
Cem Kalyoncu
as I commented above, I think it is valid to define entities in the DTD, and in the mod http://www.blackwellpublishing.com/xml/dtds/4-0/bpg4-0entities.mod the entity *is* defined. The character is just not being used somehow
Maarten
If the data is from elsewhere, surely the DTD is defined by someone else, and IMO it would not be logical to duplicate and change it.
Cem Kalyoncu
+2  A: 
Williham Totland
One of the mods defined in the DTD seems to have it: http://www.blackwellpublishing.com/xml/dtds/4-0/bpg4-0entities.mod, and since PHP was throwing an entity-not-found error before I added the correct URL for the DTD, I think PHP is probably parsing it?
Maarten
it is in the dtd!
nickf
Well, that's what I get for not fully resolving the DTD mentally... Regardless, that DTD has a rather perverse manner of defining that entity; it'd probably be a WAY better idea to use the Ø verbatim in any case.
Williham Totland
A: 

OK, got a bit further. If I use var_dump instead of echo, I get this:

object(SimpleXMLElement)[22]
  public 'symbol' => 
  object(SimpleXMLElement)[21]
  public '@attributes' => 
    array
      'name' => string 'Oslash' (length=6)
      'unicode' => string '00D8' (length=4)
      'type' => string 'html' (length=4)
      'glyph' => string '@Oslash;' (length=8)
      'description' => string 'capital O, slash' (length=16)
      'ascii' => string 'O' (length=1)
  string ' ' (length=1)

I wonder how I can use that to build a complete string together with the contents of forenames.

Maarten
+2  A: 

Looking at the DTD, it says this (but without line breaks):

<!ENTITY Oslash 
    "<symbol name='Oslash' unicode='00D8'
     type='html' glyph='@Oslash;' description='capital O, slash' 
     ascii='O' > </symbol>"
>

To any XML reader using this DTD, this means "Whenever you see this exact combination of letters in the source: &Oslash;, replace it with this text: <symbol name='Oslash' unicode... > </symbol>".

This means that the XML data actually reads like this:

<forenames>NIELS B<symbol name='Oslash' unicode='00D8'
     type='html' glyph='@Oslash;' description='capital O, slash' 
     ascii='O' > </symbol>IE</forenames>

...which explains why it's not showing up in your browser. The way around it would be to search your XML document for all <symbol> elements, read the unicode attribute of each, and replace the element with that character.
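
As a rough sketch of that replacement using PHP's DOM extension: the fragment below is the expanded XML from above, hard-coded for illustration, and the variable names are my own. It assumes the unicode attribute always holds a hexadecimal code point.

```php
<?php
// Sample fragment as it looks once the parser has expanded &Oslash;.
$xml = "<forenames>NIELS B<symbol name='Oslash' unicode='00D8' type='html'"
     . " glyph='@Oslash;' description='capital O, slash' ascii='O'> </symbol>IE</forenames>";

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Copy the matches into an array so that replacing nodes cannot disturb the loop.
foreach (iterator_to_array($xpath->query('//symbol')) as $symbol) {
    // Assumption: 'unicode' is a hex code point such as '00D8'.
    $hex = $symbol->getAttribute('unicode');
    // Decode a numeric character reference (&#x00D8;) into a UTF-8 character.
    $char = html_entity_decode('&#x' . $hex . ';', ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Swap the whole <symbol> element for a plain text node.
    $symbol->parentNode->replaceChild($doc->createTextNode($char), $symbol);
}

echo $doc->documentElement->textContent, "\n";
```

After the loop, the text content of forenames is one continuous string, which is also a way to get the name out in one piece.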


Looking further at it, the comments at the top of the DTD show they've considered people in your situation! The glyph attribute on the <symbol> tag is the standard HTML entity to use for that symbol, but with the ampersand replaced with an @.

10 read xml document
20 search for any <symbol> element
30 read the "glyph" attribute
40 remove the <symbol> element
50 replace the @ with an & in glyph
60 write that in the place of <symbol>
70 goto 20
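
The steps above might look roughly like this in PHP. This is a sketch, not a tested drop-in: the function name `replaceSymbolsWithGlyphs` is made up, and it departs from step 60 in one respect. Instead of writing the entity text back, it decodes the entity to the actual character, because a text node containing `&Oslash;` would be serialized with the ampersand escaped.

```php
<?php
// Hypothetical helper implementing steps 20-70 on an already loaded document.
function replaceSymbolsWithGlyphs(DOMDocument $doc): void
{
    $xpath = new DOMXPath($doc);

    // Step 20: find every <symbol> element (copied to an array before modifying the tree).
    foreach (iterator_to_array($xpath->query('//symbol')) as $symbol) {
        // Steps 30 and 50: read glyph and turn '@Oslash;' into '&Oslash;'.
        $entity = '&' . ltrim($symbol->getAttribute('glyph'), '@');
        // Decode the HTML entity into a UTF-8 character.
        $char = html_entity_decode($entity, ENT_QUOTES | ENT_HTML5, 'UTF-8');
        // Steps 40 and 60: put the character where the <symbol> element was.
        $symbol->parentNode->replaceChild($doc->createTextNode($char), $symbol);
    }
}

// Step 10: read the XML document (a hard-coded sample here).
$doc = new DOMDocument();
$doc->loadXML(
    "<forenames>NIELS B<symbol name='Oslash' unicode='00D8' type='html'"
    . " glyph='@Oslash;' description='capital O, slash' ascii='O'> </symbol>IE</forenames>"
);

replaceSymbolsWithGlyphs($doc);
echo $doc->saveXML($doc->documentElement), "\n";
```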
nickf
I'd still expect my browser to show the symbol if I view the source, but it might be that because it's within the element I have to parse it differently for it to show up. I'll experiment.
Maarten
But there is no Ø symbol... `&Oslash;` is being replaced with `<symbol ...>` when your XML reader parses it.
nickf
Yeah, I get that. I was just thinking the <symbol..> part would show up in view source, but as it is an element within the forenames one, with the way I was looking at the XML I think it is not shown. I need to look at nested symbol tags too...
Maarten
Nope, it's the same as when you look at the source of an HTML document and still see `&eacute;`: the actual source code isn't changed, the replacement only happens when it's parsed into a DOM tree. Also, the symbols will never be nested within each other, and an XPath query of `"//symbol"` will find them all.
nickf
OK, getting closer I think. The XPath is finding the symbols; now I need to figure out how to replace a node with some other content...
Maarten
`$textNode = $symbol->ownerDocument->createTextNode($unicodeChar); $symbol->parentNode->insertBefore($textNode, $symbol); $symbol->parentNode->removeChild($symbol);`
nickf