Given:

  • The URL http://www.contoso.com/search.php?q={param} returns:

    -html-
    --body-
    {...}
    ---div id='foo'-
    ----div id='page1'/-
    ----div id='page2'/-
    ----div id='page3'/-
    ----div id='pageN'/-
    ---/div-
    {...}
    --/body-
    -/html-

Wanted:

  • The innerHTML of div id='foo' must be fetched by the client (i.e. JavaScript).
    • It will be split into discrete items (i.e. div id='page1' to div id='pageN').
  • API throttling prevents server-side code from pre-fetching the data, so the parsing and manipulation burden must be placed on the client.

Question:

  • Could Yahoo Pipes help format the data for easier consumption?
    • The lack of a DOM parser gives me pause.
  • Are there any existing pipes that could serve as an example?
+1  A: 

Yes, it's doable with Y! Pipes. You only need two modules from the 'Operators' section:

First, use "Sub Element" to get only the content.

Then use the "Regex" module to extract the div content, and fetch the pipe's output as JSON from your site:

Search:

^.*?<div id="foo">(.*?)</div>.*?$

Replace:

$1
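
To see what this search/replace pair does, here is a minimal sketch that applies the same pattern in plain JavaScript rather than inside Pipes. The sample markup, the function names extractFoo and splitPages, and the use of the "s" flag are assumptions for illustration. Note that the pattern expects id="foo" in double quotes, and that the lazy (.*?) stops at the first </div> it meets, so it only captures everything when the page divs are self-closing as in the question.

```javascript
// Hypothetical sample response; the real page is not shown in the question.
const sampleHtml =
  '<html><body><p>intro</p>' +
  '<div id="foo"><div id="page1"/><div id="page2"/><div id="page3"/></div>' +
  '<p>outro</p></body></html>';

// Same search pattern as above; the "s" flag lets "." match newlines too.
const search = /^.*?<div id="foo">(.*?)<\/div>.*?$/s;

// Replace the whole document with capture group 1 (the "$1" replacement).
function extractFoo(html) {
  return html.replace(search, '$1');
}

// Collect the ids of the self-closing page divs in the extracted markup.
function splitPages(inner) {
  return Array.from(inner.matchAll(/<div id="(page[^"]*)"\s*\/>/g), (m) => m[1]);
}

const inner = extractFoo(sampleHtml);
const pages = splitPages(inner);
console.log(inner);
console.log(pages);
```

If the real page divs have their own closing tags, the capture would end at the first inner </div>, so the pattern would need adjusting for that markup.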

scribu
+2  A: 

You can use the YQL module, which allows you to fetch arbitrary URLs and then parse them with XPath. A sample YQL query:

select * from html where url="http://finance.yahoo.com/q?s=yhoo" and
  xpath='//div[@id="yfi_headlines"]/div[2]/ul/li/a'
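
Adapted to the question, the same idea might look like the sketch below, which only builds the query text and a request URL on the client. The endpoint URL, the format=json parameter, the XPath //div[@id="foo"]/div, and the helper names buildYqlQuery and buildYqlUrl are assumptions, not something stated in this thread.

```javascript
// Assumed public YQL endpoint; it is not given in the answer above.
const YQL_ENDPOINT = 'https://query.yahooapis.com/v1/public/yql';

// Build a YQL statement selecting the child divs of div id="foo".
// The search term is URL-encoded so it cannot break out of the quoted URL.
function buildYqlQuery(searchTerm) {
  const pageUrl =
    'http://www.contoso.com/search.php?q=' + encodeURIComponent(searchTerm);
  return `select * from html where url="${pageUrl}" and xpath='//div[@id="foo"]/div'`;
}

// Wrap the statement in a request URL that asks for JSON output (assumed parameter).
function buildYqlUrl(searchTerm) {
  const params = new URLSearchParams({
    q: buildYqlQuery(searchTerm),
    format: 'json',
  });
  return `${YQL_ENDPOINT}?${params}`;
}

console.log(buildYqlQuery('widgets'));
console.log(buildYqlUrl('widgets'));
```

The client would then request that URL and read the page divs from the response.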
Mauricio Scheffer