I'd love to do this:

UPDATE table SET blobCol = HTTPGET(urlCol) WHERE whatever LIMIT n;

Is there code available to do this? I know this should be possible, as the MySQL docs include an example of adding a function that does a DNS lookup.

MySQL / Windows / preferably without having to compile stuff, but I can.

(If you haven't heard of anything like this, but would expect to have if it existed, a "proly not" would be nice.)

EDIT: I know this would open a whole can o' worms re security; however, in my case, the only access to the DB is via the mysql console app. It is not a world-accessible system. It is not a web back end. It is only a local data-logging system.

+1  A: 

I don't know of any function like that as part of MySQL. Are you just trying to retrieve HTML data from many URLs?

An alternative solution might be to use Google spreadsheet's importHtml function.

Google Spreadsheets Lets You Import Online Data

andyuk
Yes I am. However, the data is binary, so for that reason (and others) Google won't work.
BCS
For what it does, however, that looks like a cool feature idea.
BCS
+4  A: 

No, thank goodness — it would be a security horror. Every SQL injection hole in an application could be leveraged to start spamming connections to attack other sites.

You could, I suppose, write it in C and compile it as a UDF. But I don't think it really gets you anything in comparison to just SELECTing in your application layer and looping over the results doing HTTP GETs and UPDATEing. If we're talking about making HTTP connections, the extra efficiency of doing it in the database layer will be completely dwarfed by the network delays anyway.
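The application-layer select/loop/update pattern described above could be sketched as follows. This is a minimal illustration, not anyone's actual code: sqlite3 stands in for MySQL so the sketch is self-contained, the table and column names (`t`, `url_col`, `blob_col`) are hypothetical, and `data:` URLs stand in for real web servers. With MySQL you would use a connector library (e.g. MySQLdb) and real URLs instead.

```python
import sqlite3
import urllib.request

def backfill_blobs(conn, limit=10):
    """Fetch each unfilled row's URL over HTTP and store the body in blob_col."""
    rows = conn.execute(
        "SELECT id, url_col FROM t WHERE blob_col IS NULL LIMIT ?",
        (limit,)).fetchall()
    for row_id, url in rows:
        body = urllib.request.urlopen(url).read()   # the HTTP GET
        conn.execute("UPDATE t SET blob_col = ? WHERE id = ?", (body, row_id))
    conn.commit()
    return len(rows)

# Demo: an in-memory DB and a data: URL standing in for a real server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, url_col TEXT, blob_col BLOB)")
conn.execute("INSERT INTO t (url_col) VALUES ('data:text/plain,hello')")
backfill_blobs(conn)
print(conn.execute("SELECT blob_col FROM t").fetchone()[0])  # b'hello'
```

As noted, the network round-trips dominate the cost here, so doing this in the application layer costs essentially nothing versus a hypothetical in-database function.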

bobince
The issue is that the select/loop/HTTP-get app you propose is the only extra-DB code I have in the project (BTW, that is what I'm doing right now). Without it, the project would be down to a single component: the DB itself.
BCS
I see... and the custom UDF wouldn't really be an improvement either, as it would itself be an extra component, and one considerably more annoying to deploy than whatever language/tool you're using now.
bobince
As for other DBMSs that might be able to do it, Oracle has UTL_HTTP and you could probably do it from SQL Server via a WinHttp object. Again though, installing either of those is not entirely convenient!
bobince
+1  A: 

Proly not. Best practice in a web environment is to have database servers isolated from the outside, both ways, meaning that the DB server wouldn't be allowed to fetch stuff from the internet.

jishi
A good point; however, it's n/a in my case, as this is no more a web environment than the average desktop computer is (see edit).
BCS
A: 

Proly not.

If you're absolutely determined to get web content from within an SQL environ, there are as far as I know two possibilities:

  1. Write a custom MySQL UDF in C (as bobince mentioned). This could potentially be a huge job, depending on your experience with C, how much security you want, and how complete you want the UDF to be: e.g. just GET requests? How about POST? HEAD? etc.

  2. Use a different database which can do this. If you're happy with SQL you could probably do this with PostgreSQL and one of the snap-in languages such as Python or PHP.
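For option 2, the fetching logic itself is only a few lines of Python; the rest is PostgreSQL's wrapping machinery. A plain-Python sketch of the body one might wrap in a PL/Python function (the function name `http_get` and the SQL signature below are hypothetical):

```python
# Hypothetical wrapper this body would live inside:
#   CREATE FUNCTION http_get(url text) RETURNS bytea
#   AS $$ ... $$ LANGUAGE plpython3u;
import urllib.request

def http_get(url):
    """Return the raw body of an HTTP GET -- that's the whole trick."""
    return urllib.request.urlopen(url).read()

# A data: URL is used here so the sketch runs without network access.
print(http_get("data:text/plain,hello"))  # b'hello'
```

Once wrapped, something very close to the `UPDATE ... SET blobCol = http_get(urlCol)` in the question would work, with all the security caveats raised in the other answers.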

If you're not too fussed about sticking with SQL, you could use something like eXist. You can do this type of thing relatively easily with XQuery, and you would benefit from being able to easily modify the results to fit your schema (rather than just lumping them into a blob field) or store the page "as is" as an XHTML doc in the DB.

Then you can run queries very quickly across all documents to, for instance, get all the links or quotes or whatever. You could even apply XSL to such a result with very little extra work. Great if you're storing the pages for reference and want to adapt the results into a personal "intranet"-style app.

Also, since eXist is document-centric, it has lots of great methods for fuzzy-text searching and near-word searching, and it has a great full-text index (much better than MySQL's). Perfect if you're after doing some data mining on the content, e.g.: find me all documents where a word like "burger" appears within 50 words of "hotdog" and the word isn't in a UL list. Try doing that natively in MySQL!

As an aside, and with no malice intended: I often wonder why eXist is overlooked when people build CMSs. It's a database that can store content in its native format (XML, or its subset, (X)HTML), query it with ease in its native format, and translate it from its native format with a powerful templating language which looks and acts like its native format. Sometimes SQL is just plain wrong for the job!

Sorry. Didn't mean to waffle! :-$

digitala