The xml file has this snippet:
<?xml version="1.0"?>
<PC-AssayContainer
xmlns="http://www.ncbi.nlm.nih.gov"
xmlns:xs="http://www.w3.org/2001/XMLSchema-instance"
xs:schemaLocation="http://www.ncbi.nlm.nih.gov ftp://ftp.ncbi.nlm.nih.gov/pubchem/specifications/pubchem.xsd"
>
....
<PC-AnnotatedXRef>
<PC-AnnotatedXRef_xref>
<PC-XRefData>
<PC-XRefData_pmid>17959251</PC-XRefData_pmid>
</PC-XRefData>
</PC-AnnotatedXRef_xref>
</PC-AnnotatedXRef>
I tried to parse it using xpath's global search and also tried with some namespacing:
library('XML')
doc = xmlInternalTreeParse('http://s3.amazonaws.com/tommy_chheng/pubmed/485270.descr.xml')
>pathApply(doc, "//PC-XRefData_pmid")
list()
attr(,"class")
[1] "XMLNodeSet"
> getNodeSet(doc, "//PC-XRefData_pmid")
list()
attr(,"class")
[1] "XMLNodeSet"
> xpathApply(doc, "//xs:PC-XRefData_pmid", ns="xs")
list()
> xpathApply(doc, "//xs:PC-XRefData_pmid", ns= c(xs = "http://www.w3.org/2001/XMLSchema-instance"))
list()
Shouldn't the xpath match:
<PC-XRefData_pmid>17959251</PC-XRefData_pmid>