I have an XML document that references a namespace that is not available:

<microplateDoc xmlns="http://moleculardevices.com/microplateML">
...my data is here...
</microplateDoc>

I have a script that reads it fine, but only when I delete the two tags above; otherwise it reads it all screwed up. Is it OK just to ignore the namespace? I'm thinking of writing another script to go through all of my input files and delete these two lines, but I suspect there is a better way.

If I did go through all my data files and delete these two lines, what is the best way to do it with a script? I presume I would just open each file, search for those tags, delete them, and save the file. Can you think of a better way? Thanks.

+1  A: 

Regarding removing lines from a file, this exact question was asked earlier today. (Just add -d to the sed options to delete the matching line.)
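
For anyone who would rather stay in Perl, the line-deletion idea can be sketched roughly as follows. This is only an illustration: strip_microplate_tags is a hypothetical helper name, the sample document is made up, and the code assumes each of the two tags sits on its own line, as in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: return $text without any line that contains an
# opening or closing <microplateDoc> tag. All other lines are kept as-is.
sub strip_microplate_tags {
    my ($text) = @_;
    # split /^/m breaks the text into lines and keeps the newlines
    my @kept = grep { $_ !~ m{</?microplateDoc\b} } split /^/m, $text;
    return join '', @kept;
}

# Made-up sample standing in for the contents of one input file.
my $sample = <<'XML';
<microplateDoc xmlns="http://moleculardevices.com/microplateML">
<data>42</data>
</microplateDoc>
XML

print strip_microplate_tags($sample);
```

For real files, a one-liner along the lines of perl -i.bak -ne 'print unless m{</?microplateDoc\b}' *.xml should do the same thing in place while keeping .bak backups; the *.xml glob is an assumption about how the input files are named.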

Ether
Thanks for the link, Ether. I'm still keen to hear from others whether there is a way to tell the parser to ignore it; that would be even easier than removing it.
John
My xml-fu is poor at the moment, so you're at the mercy of everyone else here I'm afraid :)
Ether
+3  A: 

I have an XML document that references a namespace that is not available:

I suspect you're confused about what an XML namespace is. A namespace name is a Uniform Resource Identifier, which is to say a string of characters that conforms to RFC 3986. It's not (necessarily) a Uniform Resource Locator, though it can be, as URLs are all URIs.

The important thing is: Just because an XML namespace begins with http:// doesn't mean that the XML parser is going to look it up. It won't (unless the person who wrote it doesn't understand what namespaces are, in which case you're going to have a lot more problems than this).
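
A small sketch may make this concrete. It assumes XML::LibXML (which appears to be what the asker's script uses) and parses a made-up document from an in-memory string; the species element and its content are invented sample data. The namespace simply comes back as the string that was written in the xmlns attribute, and the child element inherits it from the default namespace declaration.

```perl
use strict;
use warnings;
use XML::LibXML;

# Made-up sample document; only the namespace URI comes from the question.
my $xml = <<'XML';
<microplateDoc xmlns="http://moleculardevices.com/microplateML">
  <species>camelid</species>
</microplateDoc>
XML

my $doc  = XML::LibXML->load_xml(string => $xml);
my $root = $doc->documentElement;

# The namespace is reported as a plain string attached to the element.
print $root->localname,    "\n";
print $root->namespaceURI, "\n";

# Child elements without a prefix fall into the same default namespace.
my ($child) = $root->getChildrenByLocalName('species');
print $child->namespaceURI, "\n";
```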

It's impossible to tell what you mean when you say that the script reading this XML document "reads it all screwed up." Is it OK to ignore it? It may very well be. Part of the purpose of namespaces, after all, is to make it possible to embed information in an XML document that some consumers of that document can ignore.

On the other hand, if you're not the only one who uses those files, you could be making big trouble for yourself by deleting data that someone else needs.

Robert Rossney
+1  A: 

I don't think there's anything wrong with your namespace there, and I wouldn't go messing with the input files unless you're confident there won't be any unwelcome side-effects. What I think is happening is a common beginner XML-processing mistake: namespaces need to be registered (i.e. bound to a prefix) in your code before you can access the nodes in that namespace.

http://perl-xml.sourceforge.net/faq/#namespaces_xpath looks like a useful example. I don't generally work with Perl, but I've seen this happen in a bunch of other languages.
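
As a rough sketch of what registering the namespace looks like with XML::LibXML (assumed here, since the question does not say which parser is in use): the sample document and the species element are made up, and the mp prefix is an arbitrary choice that only has meaning inside this script. Without a registered prefix, the unprefixed XPath name only matches elements that are in no namespace, so it finds nothing.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Made-up sample document; only the namespace URI comes from the question.
my $xml = <<'XML';
<microplateDoc xmlns="http://moleculardevices.com/microplateML">
  <species>camelid</species>
</microplateDoc>
XML

my $doc = XML::LibXML->load_xml(string => $xml);

# Unprefixed name: XPath looks for <species> in no namespace.
my @unqualified = $doc->findnodes('//species');

# Bind a prefix of our own choosing to the document's namespace URI,
# then use that prefix in the XPath expression.
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs(mp => 'http://moleculardevices.com/microplateML');
my @qualified = $xpc->findnodes('//mp:species');

printf "unprefixed: %d, prefixed: %d\n", scalar @unqualified, scalar @qualified;
print $_->textContent, "\n" for @qualified;
```

The prefix in the XPath expression does not have to match anything in the file; what matters is that it is bound to the same namespace URI.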

hcayless
Many thanks, you were correct: I had failed to register the namespace. I had incorrectly thought that namespaces were lists of variable names located at a particular address. I added these two lines from the website you pointed to: my $xpc = XML::LibXML::XPathContext->new($tree); $xpc->registerNs(microplateML => 'http://moleculardevices.com/microplateML'); Now I can access elements with something like this: foreach my $camelid ($xpc->findnodes('//microplateML:species')) { It's still not working 100%, but this was the problem, so there is no need to delete the lines after all. Thanks again.
John