ansaurus

Question

Can I scrape flash?

Answer 1

+1 A:

As a very crude first step you could use Google to get a text snippet out of the SWF, provided that the SWF has been indexed by Google and that you know its URL, e.g.:

http://www.google.com/search?q=site%3Awww.michaelgraves.com%2Fmga.swf

cherouvim 2010-02-08 17:51:10

Answer 2

+2 A:

Yanking "external links" out of a Flash file can be as simple as, for instance:

curl -s http://hostname/path/to/file.swf | strings | grep http

Of course, this'll fail if the author has made any attempt to hide the URL.

YMMV a lot. Good luck!

MikeyB 2010-02-08 17:54:08

curl's output just looks like a bunch of random characters, nothing as coherent as http. I used `curl www.michaelgraves.com/mga.swf -o test.txt`. Does strings do something to convert to readable text?

Mike Pateras 2010-02-08 18:18:33

The `strings` program yanks what may be human-readable strings out of a binary data stream. The `grep` then pulls out any strings containing the word `http`. You can also try modifying the strings command options to give you more useful output (`strings -10`: only output strings of at least 10 characters).
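
For anyone without `strings` and `grep` at hand, here is a minimal Python sketch of roughly what that pipeline does. It is an approximation, not a reimplementation of the real tool: the function names are made up for illustration, and "printable" is assumed to mean plain ASCII from space to tilde.

```python
import re


def printable_strings(data, min_len=4):
    """Return runs of printable ASCII at least min_len bytes long.

    Roughly what `strings` does; min_len=10 mimics `strings -10`.
    """
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.decode("ascii") for match in pattern.findall(data)]


def strings_containing(data, needle="http", min_len=4):
    """Keep only the extracted strings that contain needle (the `grep` step)."""
    return [s for s in printable_strings(data, min_len) if needle in s]
```

Calling `strings_containing(open("test.txt", "rb").read())` on the downloaded file should give about the same result as the shell pipeline.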

MikeyB 2010-02-08 18:28:59

So if the file doesn't contain an "http" string, strings isn't going to give it to me, right?

Mike Pateras 2010-02-08 18:37:59

@Mike: That's right, exactly.

MikeyB 2010-02-08 19:41:03

So what are my options if that output is entirely garbage? Is that just a reality for some sites?

Mike Pateras 2010-02-08 23:30:48

I would say that your next step would be to find some application that actually understands the .swf format to parse it. A quick Google search (parsing .swf) leads me to http://flashpanoramas.com/blog/2007/07/02/swf-parser-air-application/ which looks promising.
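
One thing worth checking before reaching for a full parser: many SWF files are compressed, which by itself would make `strings` return garbage. A file whose first three bytes are `CWS` has a zlib-compressed body after its 8-byte header, while `FWS` means uncompressed. The sketch below only handles those two cases (LZMA-compressed `ZWS` files are not handled), and the function name is hypothetical.

```python
import zlib


def inflate_swf(data):
    """Return the SWF bytes with the body decompressed, if it was zlib-compressed.

    Handles only the 'FWS' (uncompressed) and 'CWS' (zlib) signatures.
    """
    signature = data[:3]
    if signature == b"FWS":
        return data
    if signature == b"CWS":
        # Bytes 3..7 hold the version and the uncompressed length; keep them as-is.
        return b"FWS" + data[3:8] + zlib.decompress(data[8:])
    raise ValueError("unsupported SWF signature: %r" % signature)
```

Writing the result back to disk and running `strings` on it again may turn up text that was hidden by the compression.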

MikeyB 2010-02-09 00:11:57

Answer 3

+3 A:

Decompiling the Flash source would let you see the ActionScript part of the Flash file, which I've found to often contain info like links.

A free decompiler is Flare. It's command-line only, and works fine. It won't decode some of the info in newer Flash formats (newer than CS3, I think). It dumps all the ActionScript into one file.

Sothink SWF Decompiler is a more sophisticated commercial program. It has worked fine with every Flash file I've tried, and the results are quite thorough and well organized. It's GUI-based and I don't know if it is easily automated.

With Flare, since it's a command line tool, one could easily write a script to obtain the SWF, decompile it, grep for 'http://', and log the results.
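
A rough sketch of such a script, in Python. It rests on assumptions that should be checked against your Flare install: that the executable is on the PATH as `flare`, and that running it on `movie.swf` writes the decompiled ActionScript to `movie.flr` next to it. The function names, the `movie.swf` file name, and the log format are all made up for illustration.

```python
import re
import subprocess
import urllib.request
from pathlib import Path

# Stop at whitespace, quotes, angle brackets, or a closing parenthesis.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>)]+")


def find_urls(actionscript):
    """Return the distinct URLs found in decompiled ActionScript, in order."""
    urls = []
    for url in URL_PATTERN.findall(actionscript):
        if url not in urls:
            urls.append(url)
    return urls


def scrape_swf(swf_url, workdir, log_path):
    """Download a SWF, decompile it with Flare, and log any URLs found."""
    swf_path = Path(workdir) / "movie.swf"
    urllib.request.urlretrieve(swf_url, str(swf_path))
    # Assumption: `flare movie.swf` writes movie.flr alongside the input file.
    subprocess.run(["flare", str(swf_path)], check=True)
    flr_path = swf_path.with_suffix(".flr")
    urls = find_urls(flr_path.read_text(errors="replace"))
    with open(log_path, "a") as log:
        for url in urls:
            log.write("%s\t%s\n" % (swf_url, url))
    return urls
```

For example, `scrape_swf("http://hostname/path/to/file.swf", ".", "links.log")` would append one line per link found to `links.log`.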

Alex JL 2010-02-14 04:47:54
