Hi,

I am having to use the following XML returned from a web service:

<?xml version="1.0" encoding="utf-8" ?>
<root>
  <staticPage>
    <liStaticPageID>6165</liStaticPageID>
    <sTitle>Ethylene</sTitle>
    <sPageURL>Ethylene.htm</sPageURL>
    <sBody>
      <P>Ethylene is a colourless, odourless, extremely flammable compressed gas. It is slightly soluble in water and soluble in liquid hydrocarbons. It reacts with strong oxidants causing fire and explosion hazard. It may polymerise to form aromatic compounds under the influence of temperatures above 600°C.</P>
      <BR>
        <P>Around 59% of the world’s ethylene demand is consumed in polyethylene production. Other major derivatives are ethylene oxide/glycol (13%), ethylene dichloride/vinyl chloride monomer (13%) and ethyl benzene/styrene (6%), with other uses such as acetaldehyde, alpha-olefins, ethylene-propylene elastomers and vinyl acetate representing around 9% of demand. </P>
        <BR>
          <P>Although ethylene gas poses no risk to skin or eyes, the ethylene liquid can cause frostbite. Ethylene is a dangerous fire and explosion hazard. Exposure to ethylene occurs through inhalation, from leaks, spills, accidents, and cigarette smoke. While ethylene gas is invaluable due to its ability to initiate the ripening process in several fruits, it can also be very harmful to many fruits, vegetables, flowers and plants by accelerating the ageing process and decreasing the product quality and shelf life.</P>
          <BR>
            <P>ICIS pricing quotes ethylene in Europe, Asia-Pacific and the US Gulf. </P>
            <BR>
              <P>Frequency:</P>
              <BR>
                <P>Published weekly on Fridays and an Ethylene Daily (Asia) report is published Mondays-Fridays.</P>
                <P>Real time Price Alert Service (PAS) delivering market news and trends throughout the day. </P>
                <BR>
                  <P>Ethylene (EUROPE)</P>
                  <BR>
                    <P>Weekly Price Assessments:</P>
                    <BR>
                      <P>Ethylene Contract Prices</P>
                      <BR>
                        <P>FD NWE quarterly (EUR/MT &amp; conversion to US CTS/LB) </P>
                        <P>FD NWE monthly (EUR/MT &amp; conversion to US CTS/LB)</P>
                        <BR>
                          <P>Ethylene Spot Prices</P>
                          <BR>
                            <P>FD NWE PIPELINE (EUR/MT &amp; conversion to US CTS/LB) </P>
                            <P>CIF NWE (EUR/MT &amp; conversion to US CTS/LB) </P>
                            <P>CIF MED (EUR/MT &amp; conversion to US CTS/LB)</P>
                            <BR>
                              <P>Feedstock – Naphtha Spot Prices </P>
                              <BR>
                                <P>CIF NWE (USD/MT)</P>
                                <BR>
                                  <P>Ethylene (ASIA-PACIFIC)</P>
                                  <BR>
                                    <BR>
                                      <BR>
                                        <P>Daily and Weekly Price Assessments:</P>
                                        <BR>
                                          <P>Ethylene Daily Spot Prices</P>
                                          <BR>
                                            <P>CFR N.E.Asia (USD/MT &amp; conversion to US CTS/LB</P>
                                            <P>CFR S.E.Asia (USD/MT &amp; conversion to US CTS/LB)</P>
                                            <BR>
                                              <P>Weekly Price Assessments:</P>
                                              <BR>
                                                <P>FOB KOREA (USD/MT &amp; conversion to US CTS/LB) </P>
                                                <P>CFR N.E.ASIA (USD/MT &amp; conversion to US CTS/LB) </P>
                                                <P>CFF S.E.ASIA (USD/MT &amp; conversion to US CTS/LB)</P>
                                                <P>Feedstock – Naphtha Spot Prices </P>
                                                <BR>
                                                  <P>CFR Japan (USD/MT)</P>
                                                  <BR>
                                                    <P>Ethylene (US GULF)</P>
                                                    <BR>
                                                      <P>Weekly Price Assessments:</P>
                                                      <BR>
                                                        <P>Ethylene Net Contract Prices (FD):</P>
                                                        <BR>
                                                          <P>Pipeline monthly (US CTS/LB &amp; conversion to USD/MT)</P>
                                                          <BR>
                                                            <P>Ethylene Spot Prices (FD)</P>
                                                            <BR>
                                                              <P>Pipeline weekly (US CTS/LB &amp; conversion to USD/MT)</P>
                                                              <BR>
                                                                <P>Feedstock – Naphtha Spot Prices </P>
                                                                <BR>
                                                                  <P>DEL USG PARAFFINIC (USD/MT)</P>
                                                                  <BR>
                                                                    <P>General Information:</P>
                                                                    <BR>
                                                                      <P>Assessment window: Price assessments are based on information supplied by market participants through the week up to close of business on Fridays at 1800 hours in London, Singapore and Houston.</P>
                                                                      <BR>
                                                                        <P>Daily assessments are based on information gathered throughout the day up to the close of business at 1730 hours in Singapore.</P>
                                                                        <BR>
                                                                          <P>Specifications: Price quotes are provided on the basis of product of 99.9% purity. The European FD PIPELINE quote is for ARG specification. </P>
                                                                          <BR>
                                                                            <P>Timing: In Asia and Europe, business is usually concluded within a six week forward delivery window from date of publication. However, given arbitrage movements, a maximum forward delivery window of 60 days applies for the quotations. In the US, contract prices are tied to the delivery month referenced next to the price. US spot prices are quoted for one-to-two weeks out.</P>
                                                                            <BR>
                                                                              <P>Terms: 30-90 days after bill of lading date.</P>
                                                                              <BR>
                                                                                <P>Standard cargo size: Typical cargo sizes in Asia range from 2,300 to 3,000 tonnes while product from the Middle East ranges from 4,000-5,000 tonnes. Typical European cargo sizes range between 2,000 and 5,000 tonnes. US domestic deliveries are typically sold in 5-10 million lb parcels via pipeline. Imported cargo sizes can be up to 4,000-5,000 tonnes.</P>
                                                                                <BR>
                                                                                  <P>Assessment basis: ICIS pricing ethylene price assessments are based on information gathered throughout the week from producers, traders, end-consumers and the shipping market. The assessment takes into consideration: confirmed deals, reported deals, firm offers and bids, buy and sell indications, and rumoured deals. </P>
                                                                                  <BR>
                                                                                    <P>All efforts are made to confirm pricing levels with the respective buyer and seller before price assessments are adjusted. In the absence of confirmation and/or trades, price ranges may be adjusted at the discretion of the editor on a notional basis to better reflect levels at which trading activity could take place. Consideration is also given to all factors potentially influencing the price of ethylene at any given time, including supply/demand information; feedstock prices and derivative market prices.</P>
                                                                                    <BR>
                                                                                      <P>In Europe, contract prices are fixed on both a quarterly and a monthly basis. Monthly quotes were first introduced in January 2009. The bi-monthly contract quote was discontinued in Q4 2008. Contracts are negotiated between producers and consumers. </P>
                                                                                      <BR>
                                                                                        <P>It is understood that ICIS pricing price assessments are often used as a benchmark for spot/contract ethylene trades, on an ICIS pricing average +/- alpha basis. As a result special emphasis is given to ensuring that the ICIS pricing average spot price is a number that can be readily agreed upon by as wide a cross-section of the market place as possible. </P>
                                                                                        <BR>
                                                                                          <P>Netback calculations (i.e CFR prices derived from FOB numbers + freight) are not usually considered sufficient to warrant an automatic adjustment of CFR assessments on the basis of open market freights. The use of COA vessels in Asia-Pacific and the need for employment can lead to below-market freight components to apply. ICIS pricing prefers to adjust assessments on a like-for-like basis, CFR for CFR, FOB for FOB. Similarly southeast Asian price assessments are not adjusted on northeast Asian prices + freight component, or vice versa. </P>
                                                                                          <BR>
                                                                                            <P>
                                                                                              The Asia-Pacific report focuses on the regional spot market, however information on domestic contract pricing and prevailing formulae is carried in the text, where details are available. Northeast Asia comprises <?xml:namespace prefix = st1 /><st1:country-region w:st="on">Japan</st1:country-region>, <st1:country-region w:st="on">Korea</st1:country-region>, <st1:country-region w:st="on">Taiwan</st1:country-region> and <st1:country-region w:st="on">China</st1:country-region>, while southeast Asia comprises the <st1:country-region w:st="on">Philippines</st1:country-region>, <st1:country-region w:st="on">Thailand</st1:country-region>, <st1:country-region w:st="on">Malaysia</st1:country-region>, <st1:country-region w:st="on">Singapore</st1:country-region> and <st1:country-region w:st="on"><st1:place w:st="on">Indonesia</st1:place></st1:country-region>. </P><BR> <P>The Ethylene Daily (Asia) report covers spot deals on a CFR N.E.Asia and CFR S.E.Asia basis. The assessment takes into account deals, bids and offers and price ideas heard throughout the day. It also includes cracker production updates.</P><BR> <P>In the <st1:country-region w:st="on"><st1:place w:st="on">US</st1:place></st1:country-region>, the net contract price usually settles at the end of the month listed. US ethylene spot prices are on a free-delivered (FD) basis and represent confirmed business, bid/offer levels or general sentiment.</P><BR><BR></sBody> 
  <liNavigationItemID>1</liNavigationItemID> 
  <uiEnteredByID /> 
  <sEnteredBy /> 
  <dtEntered>07/03/2007 08:51:13</dtEntered> 
  <uiLastModifiedByID>641d1389-710f-42c6-8c10-38a2105f5149</uiLastModifiedByID> 
  <sLastModifiedBy>Barbara Ortner</sLastModifiedBy> 
  <dtLastModified>21/07/2009 16:06:17</dtLastModified> 
  <dtApproved>21/07/2009 16:06:20</dtApproved> 
  <uiApprovedByID>641d1389-710f-42c6-8c10-38a2105f5149</uiApprovedByID> 
  <sApprovedBy>Barbara Ortner</sApprovedBy> 
  <bLive>1</bLive> 
  <liVersionNo>11</liVersionNo> 
  <sMetaDescription /> 
  <sMetaKeywords /> 
  <sPageTitle>Ethylene Methodology ICIS pricing</sPageTitle> 
  </staticPage>
  </root>

However the jQuery AJAX call fails because the XML document is not well formed.

Being new to XML, I don't know how to process the XML document before making the AJAX call so that it is well formed. I have manually edited it and managed to retrieve data to the page, but obviously this needs to be automated.

+6  A: 

You should contact the creators of the web service and tell them that they serve something that is not well-formed XML, even though it is labelled as XML.

kazanaki
+4  A: 

The only way you could even remotely get this to work could be this:

  • grab the output as a string
  • using parsing or regexes, extract everything between <sBody> and </sBody>
  • put that blob of text back between the <sBody> tags, wrapped in a <![CDATA[........]]> section

That way, you can at least parse that mess as valid XML. You can't do much with the contents inside the CDATA, I'm afraid.
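A minimal sketch of the steps above, assuming the clean-up runs in a server-side Python filter. The function name wrap_body_in_cdata is hypothetical and not part of the web service or jQuery. It wraps whatever sits between <sBody> and </sBody> in a CDATA section so that the rest of the document parses; any "]]>" already present in the body is split across two CDATA sections so that it cannot end the section early.

```python
import re
import xml.etree.ElementTree as ET


def wrap_body_in_cdata(raw: str, tag: str = "sBody") -> str:
    """Wrap the content of every <tag>...</tag> pair in a CDATA section."""
    # Non-greedy match so each <sBody>...</sBody> pair is handled on its own.
    pattern = re.compile(
        r"(<{0}>)(.*?)(</{0}>)".format(re.escape(tag)), re.DOTALL
    )

    def _wrap(match):
        # "]]>" would terminate the CDATA section, so split it in two.
        body = match.group(2).replace("]]>", "]]]]><![CDATA[>")
        return match.group(1) + "<![CDATA[" + body + "]]>" + match.group(3)

    return pattern.sub(_wrap, raw)


if __name__ == "__main__":
    sample = (
        "<root><staticPage><sTitle>Ethylene</sTitle>"
        "<sBody><P>One</P><BR><P>Two</P><BR></sBody>"
        "</staticPage></root>"
    )
    root = ET.fromstring(wrap_body_in_cdata(sample))
    # The body comes back as one opaque string of legacy HTML.
    print(root.findtext("staticPage/sBody"))
```

The body is then available only as a string of markup, which matches the caveat above: the surrounding fields become usable, but the HTML inside <sBody> is not parsed.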

marc_s
You're overly pessimistic. Quite a few tools will turn almost-XML into XML.
reinierpost
+2  A: 

I saved your sample to the file /tmp/nwf.xml and ran

xmllint /tmp/nwf.xml

This returns a nice list of errors. One possible approach is to filter their output through a couple of regexp-based substitutions until the result is valid XML (checked again by running it through xmllint), and then proceed to process that with regular XML processing.

I also ran this:

xmllint -html /tmp/nwf.xml

It accepts the input, returning a valid XML document. So the second approach is to filter the text through xmllint -html instead. (You don't necessarily need to call the xmllint command-line tool for this; it is based on libxml2, which has bindings for many programming languages. I doubt that JavaScript is among them, but you can write your own server-side filter to do this and call it with AJAX.)

BTW the other repliers are right: it really shouldn't be your job to fix this.

reinierpost
A: 

Just want to add: I use Solaris and Linux, but xmllint is only available (to me at least) on Linux. It's a neat tool though, and you can simply run it as a pre-run check or to prove to the provider that the XML is badly formed.

xmllint filename --noout

This returns only the errors, which makes things easier. For example, I got errors like this:

101: warning: xmlParsePITarget: invalid name prefix 'xml' the text, where details are available. Northeast Asia comprises < ?xml:namespace ...
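Where xmllint is not available, a similar pre-run check can be sketched with Python's standard library. This is an illustration under assumptions, not a replacement for xmllint: the function name well_formedness_error is hypothetical, the parser here is not namespace-aware, and it stops at the first error instead of listing them all.

```python
import xml.parsers.expat


def well_formedness_error(raw: str):
    """Return (line, column, message) for the first error, or None if OK."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        # True marks this as the final chunk, so unclosed tags are reported.
        parser.Parse(raw, True)
    except xml.parsers.expat.ExpatError as err:
        message = xml.parsers.expat.ErrorString(err.code)
        return (err.lineno, err.offset, message)
    return None


if __name__ == "__main__":
    # <BR> is never closed, so the parser complains at </root>.
    print(well_formedness_error("<root><BR><P>x</P></root>"))
    print(well_formedness_error("<root><BR/><P>x</P></root>"))
```

The line and column in the result can be passed on to the provider as evidence, in the same way as the xmllint output.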

DeltaRogue
I use xmllint on Cygwin as well. It's not difficult to compile, so it should be easy to do so on Solaris. I usually add the -format flag which makes it easier to spot strange things in the content.
reinierpost
I wish I had that freedom but we are not *allowed* due to firm restrictions. :)
DeltaRogue
When people prevent you from doing your job, Stack Overflow is not the place to turn to.
reinierpost
A: 

As a kluge, you could read the file as a text document and use a string library routine to replace <BR> with <BR/>. The problem is that you are receiving legacy HTML from the provider, and some of the old legacy tags, like <BR>, are not valid in XML, since tags must either be paired, as in <BR></BR>, or written in the short form <BR/>.
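A minimal sketch of that substitution, assuming it runs in Python before the text is handed to an XML parser. The function name self_close_br is hypothetical. It only addresses the unclosed <BR> tags; the sample in the question also contains a <?xml:namespace prefix = st1 /> fragment and st1: prefixed elements, which this does not touch.

```python
import re
import xml.etree.ElementTree as ET

# Matches <BR>, <br> or <BR > but not the already closed <BR/>.
UNCLOSED_BR = re.compile(r"<(br)\s*>", re.IGNORECASE)


def self_close_br(raw: str) -> str:
    """Rewrite every unclosed <BR> as <BR/>, keeping the original case."""
    return UNCLOSED_BR.sub(r"<\1/>", raw)


if __name__ == "__main__":
    fragment = "<sBody><P>a</P><BR><P>b</P><BR></sBody>"
    fixed = self_close_br(fragment)
    # Parsing succeeds only after the substitution.
    body = ET.fromstring(fixed)
    print(fixed, len(body.findall("P")))
```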

BWilliams