ansaurus

Question

Is there a way to get YQL to return HTML?

Answer 1

A:

YQL converts the page into XML, then does your XPath on it, then takes the DOMNodeList and serializes that back to XML for your output (and then converts to JSON if needed). You can't access the original data.
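
To illustrate why the original markup is gone by the time you see the result, here is a rough Python sketch of the same parse, select, re-serialize sequence. It uses the standard library's xml.etree.ElementTree as a stand-in (an assumption for illustration; it is not what YQL runs internally), and the sample page and the helper name select_and_serialize are made up.

```python
import xml.etree.ElementTree as ET

def select_and_serialize(page, path):
    """Parse the page into a tree, select nodes, and serialize each node again."""
    root = ET.fromstring(page)   # the source text is not kept once the tree is built
    nodes = root.findall(path)   # ElementTree supports only a limited XPath subset
    return [ET.tostring(node, encoding="unicode") for node in nodes]

# Hypothetical, well-formed sample page.
page = "<html><body><div id='a'><p>Hi<br/>there</p></div></body></html>"
fragments = select_and_serialize(page, ".//div[@id='a']/p")
print(fragments)
```

Each fragment is regenerated from the tree, so details such as how empty elements or attribute quotes were written in the source are not preserved.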

Why can't you deal with XML instead of HTML?

Paul Tarjan 2010-04-04 01:50:12

I'm using this in the context of Yahoo Pipes, so I want to insert the HTML into an RSS feed to be rendered by a feed reader/browser. Inserting the XML might work, but the Pipes YQL module seems to just insert the DOM elements into the document; I don't see a way to get the XML source either.

Joe Shaw 2010-04-04 12:12:59

Answer 2

+1 A:

I had this exact same problem. The only way I have gotten around it is to avoid YQL and just use regular expressions to match the start and end tags :/. Not the best solution, but if the HTML is relatively unchanging and the pattern just runs from, say, <div class='name'> to <div class='just_after'>, then you can get away with that. Then you can get the HTML between.
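
A minimal sketch of that marker-based approach in Python, assuming the start and end tags appear literally in the page. The helper name html_between and the sample page are hypothetical, and this is only one way to do it.

```python
import re

def html_between(page, start_marker, end_marker):
    """Return the raw HTML between two literal markers, or None if no match."""
    # re.escape treats the markers as plain text; the non-greedy group stops
    # at the first end marker; DOTALL lets the match span line breaks.
    pattern = re.escape(start_marker) + r"(.*?)" + re.escape(end_marker)
    match = re.search(pattern, page, re.DOTALL)
    return match.group(1) if match else None

# Hypothetical sample page using the two tags mentioned above.
page = "<div class='name'><b>Joe</b> Shaw</div><div class='just_after'>next</div>"
fragment = html_between(page, "<div class='name'>", "<div class='just_after'>")
print(fragment)
```

Everything between the markers comes back untouched, including the closing tag of the first div, so you may need to trim that yourself.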

viatropos 2010-05-04 20:40:25

Yeah, this is what I ended up doing too. Unfortunately the structure of the page changes depending on what type of entry it is, so I end up having to split the feed several times to handle all the different types and merge/sort them back together. A real pain, but it works.

Joe Shaw 2010-05-05 13:22:31

Answer 3

+3 A:

You could write a little Open Data Table to send out a normal YQL html table query and stringify the result. Something like the following:

<?xml version="1.0" encoding="UTF-8" ?>
<table xmlns="http://query.yahooapis.com/v1/schema/table.xsd">
  <meta>
    <sampleQuery>select * from {table} where url="http://finance.yahoo.com/q?s=yhoo" and xpath='//div[@id="yfi_headlines"]/div[2]/ul/li/a'</sampleQuery>
    <description>Retrieve HTML document fragments</description>
    <author>Peter Cowburn</author>
  </meta>
  <bindings>
    <select itemPath="result.html" produces="JSON">
      <inputs>
        <key id="url" type="xs:string" paramType="variable" required="true"/>
        <key id="xpath" type="xs:string" paramType="variable" required="true"/>
      </inputs>
      <execute><![CDATA[
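// Run the stock html table query; .results.* (E4X) selects every child node of the result.
var results = y.query("select * from html where url=@url and xpath=@xpath", {url:url, xpath:xpath}).results.*;
var html_strings = [];
// Serialize each matched node back into a markup string.
for each (var item in results) html_strings.push(item.toXMLString());
// Hand the strings back as the table's response.
response.object = {html: html_strings};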
]]></execute>
    </select>
  </bindings>
</table>

You could then query against that custom table with a YQL query like:

use "http://url.to/your/datatable.xml" as html.tostring;
select * from html.tostring where 
  url="http://finance.yahoo.com/q?s=yhoo" 
  and xpath='//div[@id="yfi_headlines"]/div[2]/ul/li'
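
If you want to send that statement from a script rather than the console, a rough Python sketch of building the request URL follows. The endpoint and the q/format parameters are assumptions based on the public YQL REST interface as documented at the time, the data table URL is the same placeholder as above, and yql_request_url is a made-up helper name.

```python
from urllib.parse import urlencode

# Assumption: the public YQL REST endpoint as documented at the time.
YQL_ENDPOINT = "http://query.yahooapis.com/v1/public/yql"

def yql_request_url(statement, fmt="json"):
    """URL-encode a full YQL statement into a GET request URL."""
    return YQL_ENDPOINT + "?" + urlencode({"q": statement, "format": fmt})

statement = (
    'use "http://url.to/your/datatable.xml" as html.tostring; '
    "select * from html.tostring where "
    'url="http://finance.yahoo.com/q?s=yhoo" '
    "and xpath='//div[@id=\"yfi_headlines\"]/div[2]/ul/li'"
)
request_url = yql_request_url(statement)
print(request_url)
```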

Edit: Just realised this is a pretty old question that was bumped; at least an answer is here, eventually, for anyone stumbling on the question. :)

salathe 2010-05-04 21:48:09

Beautiful! Thank you. The only issue I have now is how to get a Yahoo Pipes variable into the YQL expression. For example, select * from html.tostring where url=item.link and xpath='//div[@id="foo"]' gives back the error "Invalid identfier item.link. me is the only supported identifier in this context." Any ideas how I do that? (Sorry for the butchered code snippet, looks like comments don't allow much in the way of formatting.)

Joe Shaw 2010-05-05 13:53:10

Figured out the answer to this: create a separate pipe which takes a URL input, inserts that into a string builder which builds the YQL query, and attach that as the query to the YQL widget. Then in your main pipe, use this new pipe and pass in the URL as the input to it. I think I'll probably open a new question for this specifically so people don't have to hunt it down in the comments of this one.
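
The string builder step amounts to plain string concatenation. Outside of Pipes, the same idea might look like this in Python; build_yql_query is a hypothetical name, and refusing values that contain the quote character is a conservative assumption on my part rather than documented YQL escaping.

```python
def build_yql_query(url, xpath, table="html.tostring"):
    """Splice a URL and an XPath into a YQL select statement as string literals."""
    # Assumption: reject the delimiting quote character instead of guessing
    # how YQL expects it to be escaped inside a literal.
    if '"' in url:
        raise ValueError("url must not contain a double quote")
    if "'" in xpath:
        raise ValueError("xpath must not contain a single quote")
    return f"select * from {table} where url=\"{url}\" and xpath='{xpath}'"

# Hypothetical item link, standing in for the value Pipes would pass in.
query = build_yql_query("http://example.com/entry/1", '//div[@id="foo"]')
print(query)
```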

Joe Shaw 2010-05-22 19:00:02

Opened: http://stackoverflow.com/questions/2889406/how-do-i-pass-a-yahoo-pipes-item-into-a-yql-query

Joe Shaw 2010-05-22 19:38:44
