ansaurus

Question

SQL for the web

Answer 1

A:

I'm not sure whether this is exactly what you're looking for, but Freebase is an open database of information with a programmatic query interface.

Greg Hewgill 2008-10-18 22:49:11

Answer 2

+1 A:

You are probably looking for SPARQL. It doesn't parse HTML pages (it queries RDF data), but it is designed to solve the same problem: getting data out of a site, or more broadly out of the cloud. It's a W3C standard, but Microsoft apparently does not support it yet, unfortunately.
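To make the SPARQL suggestion a bit more concrete: under the SPARQL 1.1 Protocol a client can send a query to an endpoint over HTTP, for example URL-encoded in the `query` parameter of a GET request, and ask for results in the W3C SPARQL JSON results format. The sketch below only builds such a request URL and reads the variable bindings out of a JSON response; the endpoint URL, the query, and the sample response are hypothetical placeholders, and no network call is made.

```ruby
require 'json'
require 'uri'

# Hypothetical endpoint; substitute a real SPARQL endpoint.
ENDPOINT = "https://example.org/sparql"

# A simple SELECT query: up to 5 resources that have an rdfs:label.
QUERY = <<~SPARQL
  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
  SELECT ?item ?label WHERE {
    ?item rdfs:label ?label .
  } LIMIT 5
SPARQL

# Build a SPARQL Protocol GET URL: the query travels URL-encoded in the
# "query" parameter. A real client would also send the header
# "Accept: application/sparql-results+json".
def sparql_get_url(endpoint, query)
  uri = URI(endpoint)
  uri.query = URI.encode_www_form(query: query)
  uri.to_s
end

# Turn a SPARQL JSON results document into an array of hashes mapping each
# variable name to its value. Variables left unbound in a solution are
# absent from that binding, so they are simply missing from the row.
def bindings_to_rows(json_text)
  doc = JSON.parse(json_text)
  doc["results"]["bindings"].map do |binding|
    binding.transform_values { |term| term["value"] }
  end
end

# Hypothetical response in the W3C SPARQL 1.1 JSON results format.
SAMPLE_RESPONSE = <<~JSON
  {
    "head": { "vars": ["item", "label"] },
    "results": {
      "bindings": [
        { "item":  { "type": "uri", "value": "http://example.org/thing/1" },
          "label": { "type": "literal", "value": "First thing" } }
      ]
    }
  }
JSON

puts sparql_get_url(ENDPOINT, QUERY)
bindings_to_rows(SAMPLE_RESPONSE).each { |row| puts "#{row['item']} -> #{row['label']}" }
```

Keeping URL building and result parsing separate from the HTTP call means the same helpers work with whichever HTTP client you prefer.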

Sklivvz 2008-10-18 23:24:29

Answer 3

+3 A:

See Hpricot (a Ruby HTML parser).

require 'hpricot'
require 'open-uri'

# load the RedHanded home page (URI.open comes from open-uri)
doc = Hpricot(URI.open("http://redhanded.hobix.com/index.html"))
# change the CSS class on links
(doc/"span.entryPermalink").set("class", "newLinks")
# remove the sidebar
(doc/"#sidebar").remove
# print the altered HTML
puts doc

It supports querying with CSS or XPath selectors.
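Hpricot is a third-party gem, so as a rough, dependency-light illustration of the XPath side of this kind of querying, the sketch below swaps in REXML, the XML library distributed with Ruby, and repeats the steps of the Hpricot example above (change the class on matching spans, remove the sidebar, print the result). Assumptions: the markup is a hypothetical, well-formed XHTML snippet, because REXML rejects the messy real-world HTML that Hpricot tolerates, and REXML offers XPath but not CSS selectors.

```ruby
require 'rexml/document'

# Hypothetical, well-formed XHTML snippet standing in for a fetched page.
HTML = <<~XHTML
  <html>
    <body>
      <div id="content">
        <span class="entryPermalink"><a href="/post/1">permalink</a></span>
        <span class="entryPermalink"><a href="/post/2">permalink</a></span>
      </div>
      <div id="sidebar">links, ads, etc.</div>
    </body>
  </html>
XHTML

doc = REXML::Document.new(HTML)

# Change the class on matching spans. This XPath matches only when the class
# attribute is exactly "entryPermalink"; the CSS selector span.entryPermalink
# would also match elements that carry several classes.
REXML::XPath.each(doc, "//span[@class='entryPermalink']") do |span|
  span.attributes["class"] = "newLinks"
end

# Remove the element with id="sidebar" (the CSS selector #sidebar).
sidebar = REXML::XPath.first(doc, "//*[@id='sidebar']")
sidebar.remove if sidebar

# Print the altered markup.
puts doc.to_s
```

For real pages, a tolerant parser such as Hpricot is the better fit; REXML is shown only because it needs no extra gem.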

Pistos 2008-10-19 00:40:44

Answer 4

+2 A:

Beautiful Soup and Hpricot are the canonical HTML-parsing libraries for Python and Ruby, respectively.

For C#, I have used and appreciated HTML Agility Pack. It does an excellent job of turning messy, invalid HTML into queryable goodness.

There is also this C# HTML parser, which looks good, but I haven't tried it.

Colin Pickard 2008-10-20 13:14:43
