The title is more complicated than it has to be; here's the problem query:

SELECT * 
FROM query.multi 
WHERE queries="
    SELECT * 
        FROM html 
        WHERE url='http://www.stumbleupon.com/url/http://www.guildwars2.com' 
        AND xpath='//li[@class=\"listLi\"]/div[@class=\"views\"]/a/span';
    SELECT * 
        FROM xml 
        WHERE url='http://services.digg.com/1.0/endpoint?method=story.getAll&link=http://www.guildwars2.com';
    SELECT * 
        FROM json 
        WHERE url='http://api.tweetmeme.com/url_info.json?url=http://www.guildwars2.com';
    SELECT * 
        FROM xml 
        WHERE url='http://api.facebook.com/restserver.php?method=links.getStats&urls=http://www.guildwars2.com';
    SELECT * 
        FROM json 
        WHERE url='http://www.reddit.com/button_info.json?url=http://www.guildwars2.com'"

Specifically, this line:

xpath='//li[@class=\"listLi\"]/div[@class=\"views\"]/a/span'

It's problematic because of the quoting: I have to nest quotes three levels deep, and I've run out of quote characters to use. I've tried the following variations:

//no attribute quoting
xpath='//li[@class=listLi]/div[@class=views]/a/span' 

//try to quote attribute w/ backslash & single quote
xpath='//li[@class=\'listLi\']/div[@class=\'views\']/a/span'

//try to quote attribute w/ backslash & double quote
xpath='//li[@class=\"listLi\"]/div[@class=\"views\"]/a/span'

//try to quote attribute with double single quotes, like SQL
xpath='//li[@class=''listLi'']/div[@class=''views'']/a/span'

//try to quote attribute with double double quotes, like SQL
xpath='//li[@class=""listLi""]/div[@class=""views""]/a/span'

//try to quote attribute with quote entities
xpath='//li[@class=&quot;listLi&quot;]/div[@class=&quot;views&quot;]/a/span'

//try to surround XPath with backslash & double quote
xpath=\"//li[@class='listLi']/div[@class='views']/a/span\"

//try to surround XPath with double double quote
xpath=""//li[@class='listLi']/div[@class='views']/a/span""

All without success.

I don't see much out there about escaping XPath strings, and everything I've found is a variation on either concat() (which won't help, because neither ' nor " is available) or HTML entities. Leaving the attribute values unquoted doesn't throw an error, but it fails because it's no longer the XPath expression I need.

I don't see anything in the YQL docs about how to handle escaping. I'm aware of how edge-casey this is but was hoping they'd have some sort of escaping guide.

A: 

I've come up with a solution that doesn't really answer my original question but does solve the problem.

The data.html.cssselect table will take a CSS selector & parse it into an XPath, avoiding the nasty escaping issues.

SELECT *
FROM query.multi 
    WHERE queries="
        SELECT * 
            FROM data.html.cssselect 
            WHERE url='http://www.stumbleupon.com/url/http://www.guildwars2.com' 
            AND css='li.listLi div.views a span';
        SELECT * 
            FROM xml 
            WHERE url='http://services.digg.com/1.0/endpoint?method=story.getAll&link=http://www.guildwars2.com';
        SELECT * 
            FROM json 
            WHERE url='http://api.tweetmeme.com/url_info.json?url=http://www.guildwars2.com';
        SELECT * 
            FROM xml 
            WHERE url='http://api.facebook.com/restserver.php?method=links.getStats&urls=http://www.guildwars2.com';
        SELECT * 
            FROM json 
            WHERE url='http://www.reddit.com/button_info.json?url=http://www.guildwars2.com'"
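To make "parse it into an XPath" concrete, here is a minimal JavaScript sketch of that kind of translation. It is not the code behind data.html.cssselect; simpleCssToXPath is a hypothetical helper that only handles tag names, .class parts, and descendant (space) combinators, which is all the selector above uses.

```javascript
// Hypothetical helper, NOT the implementation used by data.html.cssselect.
// Supports only "tag", "tag.class", ".class" parts separated by spaces.
function simpleCssToXPath(selector) {
  return selector.trim().split(/\s+/).map(function (part) {
    var pieces = part.split(".");
    var tag = pieces[0] || "*"; // ".foo" has no tag, so match any element
    var classTests = pieces.slice(1).map(function (cls) {
      // Standard idiom for matching one token in a space-separated class list
      return "[contains(concat(' ', normalize-space(@class), ' '), ' " + cls + " ')]";
    });
    // A space in CSS means "any descendant", which is // in XPath
    return "//" + tag + classTests.join("");
  }).join("");
}

console.log(simpleCssToXPath("li.listLi div.views a span"));
```

Note that the result is looser than the hand-written XPath in the question: it matches descendants rather than direct children, and it matches a class token rather than the exact attribute value. The quotes it produces are generated on the server side of the table, which is why none of them have to be escaped in the query.multi string.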
Tivac
A: 

You need to escape whatever character is delimiting your XPath query with a double backslash... in other words:

SELECT * FROM query.multi 
WHERE queries="
    SELECT * 
        FROM html 
        WHERE url='http://www.stumbleupon.com/url/http://www.guildwars2.com' 
        AND xpath='//li[@class=\\'listLi\\']/div[@class=\\'views\\']/a/span';
    SELECT * 
        FROM xml 
        WHERE url='http://services.digg.com/1.0/endpoint?method=story.getAll&link=http://www.guildwars2.com';
    SELECT * 
        FROM json 
        WHERE url='http://api.tweetmeme.com/url_info.json?url=http://www.guildwars2.com';
    SELECT * 
        FROM xml 
        WHERE url='http://api.facebook.com/restserver.php?method=links.getStats&urls=http://www.guildwars2.com';
    SELECT * 
        FROM json 
        WHERE url='http://www.reddit.com/button_info.json?url=http://www.guildwars2.com'"

(try this in the YQL console)

salathe
By default JS is eating the \\ when I try to use this in my page. Had to do this nonsense to make it work:

"SELECT * FROM html WHERE url='stumbleupon.com/url/%url%' AND xpath='//li[@class=\\" + "\\'listLi\\" + "\\']/div[@class=\\" + "\\'views\\" + "\\']/a/span'"

Ah, got it. Pretty janky though.

Weirdly, it looks like data.html.cssselect is faster than select from html using xpath, even though data.html.cssselect just transforms into a select from html w/ xpath. Odd.
Tivac
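
The backslash-eating in that comment is ordinary JavaScript string-literal behaviour: "\\" in source is a single backslash at runtime, so getting the \\' sequence from the answer into the query takes four backslashes in a normal literal. A minimal sketch of two tidier ways to write it follows; yqlQuote and buildStumbleUponQuery are hypothetical names, and the claim that YQL wants \\' is taken from the answer above rather than verified independently.

```javascript
// Hypothetical helper: wraps a value in the \\' delimiters that the
// answer above says YQL expects inside a query.multi string.
function yqlQuote(value) {
  // "\\\\'" in JavaScript source is two backslashes plus a quote at runtime
  var escapedQuote = "\\\\'";
  return escapedQuote + value + escapedQuote;
}

// Hypothetical builder for the StumbleUpon sub-query from the question.
function buildStumbleUponQuery(pageUrl) {
  var xpath = "//li[@class=" + yqlQuote("listLi") + "]" +
              "/div[@class=" + yqlQuote("views") + "]/a/span";
  return "SELECT * FROM html WHERE url='http://www.stumbleupon.com/url/" +
         pageUrl + "' AND xpath='" + xpath + "'";
}

// Alternative (ES2015+): String.raw leaves backslashes exactly as typed,
// so the XPath can be pasted in the same form as in the YQL console.
var rawXpath = String.raw`//li[@class=\\'listLi\\']/div[@class=\\'views\\']/a/span`;

console.log(buildStumbleUponQuery("http://www.guildwars2.com"));
console.log(rawXpath);
```

Both forms produce the same runtime characters as the split-string concatenation in the comment; they just keep the doubling in one place.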