Hi all,

I have an XML file as follows:

<?xml version="1.0"?>
<!DOCTYPE PubmedArticleSet PUBLIC "-//NLM//DTD PubMedArticle, 1st January 2010//EN" "http://www.ncbi.nlm.nih.gov/corehtml/query/DTD/pubmed_100101.dtd">
<PubmedArticleSet>
<PubmedArticle>
    <MedlineCitation Owner="NLM" Status="Publisher">
        <PMID>20555148</PMID>
        <DateCreated>
            <Year>2010</Year>
            <Month>6</Month>
            <Day>17</Day>
        </DateCreated>
        <Article PubModel="Print-Electronic">
        <Journal>
            <ISSN IssnType="Electronic">1875-8908</ISSN>
            <JournalIssue CitedMedium="Internet">
                <PubDate>
                    <Year>2010</Year>
                    <Month>Jun</Month>
                    <Day>16</Day>
                </PubDate>
            </JournalIssue>
            <Title>Journal of Alzheimer's disease : JAD</Title>
        </Journal>
        <ArticleTitle>CSF Neurofilament Proteins Levels are Elevated in Sporadic Creutzfeldt-Jakob Disease.</ArticleTitle>
        <Pagination>
            <MedlinePgn/>
        </Pagination>
        <Abstract>
            <AbstractText>In this study we investigated the cerebrospinal fluid (CSF) levels of neurofilament light (NFL) and heavy chain (NFHp35), total tau (t-tau), and glial fibrillary acidic protein (GFAP) to detect disease specific profiles in sporadic Creutzfeldt Jakob disease (sCJD) patients and Alzheimer's disease (AD) patients. CSF levels of NFL, NFHp35, t-tau, and GFAP of 23 sCJD patients and 55 AD patients were analyzed and compared to non-demented controls. Median NFL, NFHp35, GFAP, and t-tau levels were significantly increased in sCJD patients and AD patients versus controls (p &lt; 0.0001 in all). NFL, NFHp35, and t-tau levels were significantly increased in sCJD patients versus AD patients (p &lt; 0.005), but GFAP concentrations did not differ between sCJD and AD. The results suggest that neuroaxonal damage, reflected by higher CSF levels of NFL, NFHp35, and t-tau, is more pronounced in the pathophysiology of sCJD than in AD. The comparable CSF GFAP concentrations suggest that astroglial damage or astrocytosis is equally pronounced in the pathophysiology of AD and sCJD. Prospective studies are needed to determine whether NFL and NFHp35 may be additional tools in the differential diagnosis of rapidly progressive dementias.</AbstractText>
        </Abstract>
        <Affiliation>Department of Neurology, Radboud University Nijmegen Medical Centre, Donders Institute for Brain, Cognition and Behaviour, Alzheimer Centre Nijmegen, The Netherlands.</Affiliation>
        <AuthorList>
            <Author>
                <LastName>van Eijk</LastName>
                <ForeName>Jeroen J J</ForeName>
                <Initials>JJ</Initials>
            </Author>
            <Author>
                <LastName>van Everbroeck</LastName>
                <ForeName>Bart</ForeName>
                <Initials>B</Initials>
            </Author>
            <Author>
                <LastName>Abdo</LastName>
                <ForeName>W Farid</ForeName>
                <Initials>WF</Initials>
            </Author>
            <Author>
                <LastName>Kremer</LastName>
                <ForeName>Berry P H</ForeName>
                <Initials>BP</Initials>
            </Author>
            <Author>
                <LastName>Verbeek</LastName>
                <ForeName>Marcel M</ForeName>
                <Initials>MM</Initials>
            </Author>
        </AuthorList>
        <Language>ENG</Language>
        <PublicationTypeList>
            <PublicationType>JOURNAL ARTICLE</PublicationType>
        </PublicationTypeList>
        <ArticleDate DateType="Electronic">
            <Year>2010</Year>
            <Month>6</Month>
            <Day>16</Day>
        </ArticleDate>
    </Article>
    <MedlineJournalInfo>
        <MedlineTA>J Alzheimers Dis</MedlineTA>
        <NlmUniqueID>9814863</NlmUniqueID>
        <ISSNLinking>1387-2877</ISSNLinking>
    </MedlineJournalInfo>
</MedlineCitation>
<PubmedData>
    <History>
        <PubMedPubDate PubStatus="entrez">
            <Year>2010</Year>
            <Month>6</Month>
            <Day>18</Day>
            <Hour>6</Hour>
            <Minute>0</Minute>
        </PubMedPubDate>
        <PubMedPubDate PubStatus="pubmed">
            <Year>2010</Year>
            <Month>6</Month>
            <Day>18</Day>
            <Hour>6</Hour>
            <Minute>0</Minute>
        </PubMedPubDate>
        <PubMedPubDate PubStatus="medline">
            <Year>2010</Year>
            <Month>6</Month>
            <Day>18</Day>
            <Hour>6</Hour>
            <Minute>0</Minute>
        </PubMedPubDate>
    </History>
    <PublicationStatus>aheadofprint</PublicationStatus>
    <ArticleIdList>
        <ArticleId IdType="pii">720R60380216K661</ArticleId>
        <ArticleId IdType="doi">10.3233/JAD-2010-090649</ArticleId>
        <ArticleId IdType="pubmed">20555148</ArticleId>
    </ArticleIdList>
</PubmedData>

How do I extract the AbstractText using Perl? Thx.

+2  A: 

Use an XML parser library. For small files, you can use XML::Simple. For very big files, use XML::Twig or XML::Parser.

Example using XML::Simple:

use XML::Simple;
my $xml = XMLin("$ENV{HOME}/junk/a.xml");  # Perl does not expand "~" in file names
my $AbstractText = $xml->{PubmedArticle}->{MedlineCitation}->{Article}->{Abstract}->{AbstractText};
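The hash path above only holds when the file contains exactly one `<PubmedArticle>`. With several, XML::Simple turns that key into an array reference, which may be one reason similar code fails on real data. The following is a minimal sketch of a more defensive variant, not a definitive implementation. It assumes a trimmed-down inline sample in which the second article, its PMID and both abstract texts are made up for illustration. `ForceArray` and `KeyAttr` are standard XML::Simple options.

```perl
use strict;
use warnings;
use XML::Simple;

# Hypothetical, trimmed-down sample inlined so the script needs no external file.
# The second article (PMID and text) is invented purely for illustration.
my $sample = <<'XML';
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation Owner="NLM" Status="Publisher">
      <PMID>20555148</PMID>
      <Article PubModel="Print-Electronic">
        <Abstract>
          <AbstractText>First abstract.</AbstractText>
        </Abstract>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation Owner="NLM" Status="Publisher">
      <PMID>11111111</PMID>
      <Article PubModel="Print">
        <Abstract>
          <AbstractText>Second abstract.</AbstractText>
        </Abstract>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>
XML

# ForceArray makes PubmedArticle an array reference even if only one is present,
# so the same loop works for one article or many.
# KeyAttr => [] switches off XML::Simple's folding on name/key/id attributes.
my $xml = XMLin($sample, ForceArray => ['PubmedArticle'], KeyAttr => []);

# Walk every article and pull out its abstract text.
my @abstracts = map { $_->{MedlineCitation}{Article}{Abstract}{AbstractText} }
                @{ $xml->{PubmedArticle} };

print "$_\n" for @abstracts;
```

To read from a file instead, pass the file name to `XMLin` in place of `$sample`.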
DVK
Unfortunately, I used similar code but it did not work.
thchew
@thchew: The closing nodes `</PubmedArticle>` and `</PubmedArticleSet>` (root node) are missing in your example. This gave me warnings with XML::Twig and perhaps was more fatal for XML::Simple?
draegtun
@thchew - I actually ran the code above using `perl -e` and it worked. Please provide the exact code, the exact XML, and the exact error.
DVK
OK, my bad. I used the exact same code as above but it did not work, with or without the "-e" switch. But when I change the 3rd line to `my $AbstractText = $xml->{'PubmedArticle'}->{'MedlineCitation'}->{'Article'}->{'Abstract'}`, it works.
thchew
+3  A: 

Here is a quick and dirty example using XML::Twig.

use 5.012;
use warnings;
use XML::Twig;

XML::Twig->new(
    twig_handlers => {
        AbstractText => sub { say $_->text },
    },
)->parsefile( 'your_data.xml' );

/I3az/

draegtun
Thanks, this helps me a lot. Now I have a rough idea of how to process my XML file.
thchew
You're welcome. It's always best to show a man how to fish :) If you actually needed a nested select on that AbstractText, then the line in XML::Twig would have been `'PubmedArticle/MedlineCitation/Article/Abstract/AbstractText' => sub { say $_->text },`
draegtun
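For reference, the path-based handler from the last comment could be used roughly as below. This is a sketch under stated assumptions rather than the answerer's own code: the XML is a cut-down inline version of the question's record, the abstract text is a placeholder, and the `@abstracts` array and the `purge` handler are additions for illustration. `purge` is an XML::Twig method that frees already-processed parts of the tree, which may help on large PubMed dumps.

```perl
use strict;
use warnings;
use XML::Twig;

# Cut-down inline sample; the abstract text is a placeholder, not the real abstract.
my $sample = <<'XML';
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation Owner="NLM" Status="Publisher">
      <PMID>20555148</PMID>
      <Article PubModel="Print-Electronic">
        <Abstract>
          <AbstractText>Placeholder abstract text.</AbstractText>
        </Abstract>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>
XML

my @abstracts;    # hypothetical collector, not part of the original answer

my $twig = XML::Twig->new(
    twig_handlers => {
        # Fires only for an AbstractText reached through this chain of parents.
        'PubmedArticle/MedlineCitation/Article/Abstract/AbstractText' => sub {
            my ( $t, $elt ) = @_;
            push @abstracts, $elt->text;
        },
        # Once a whole article has been handled, release its memory.
        'PubmedArticle' => sub { $_[0]->purge },
    },
);

# parse() takes a string; use parsefile('your_data.xml') for a file on disk.
$twig->parse($sample);

print "$_\n" for @abstracts;
```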