Hi All,

I have a small application in Java which searches images using Bing image search. The problem I am facing is that it only gets the first 20 images. Maybe that's because when we search on bing.com, it populates the first 20 images first and then switches to an infinite scrolling feature.

Is there any way to search more than 20 images using bing?

Cheers :)

A: 

I'm guessing this is because this site uses ajax to populate the "infinite" scrolling list, as you call it.

You probably send an HTTP request and get the initial page (by the way, in my browser I got 6 images across by 4 down, i.e. 24, not 20; thinking about it, maybe my client also got only 20 at first and fetched the last 4 with ajax...), and you'd need to do the paging through ajax requests.

At a glance, the XHTML and associated JavaScript of the page are very dense and somewhat obfuscated; it would take a while to get oriented. An alternative to analyzing this page is to use a packet sniffer (such as Wireshark) instead and capture the requests which take place when you scroll down.

Essentially, this will likely expose some form of ajax request, which you can then easily emulate with Java. Typically the ajax response is easy to parse, whatever its nature (XML, JSON, gzip...).
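A minimal sketch of that emulation, assuming Java 11+ (`java.net.http`): once the sniffer shows you the request, rebuild it with the same URL and headers. The header values below are placeholders (assumptions, not what Bing is known to send), and `AjaxReplay`, `replayRequest` and `gunzip` are hypothetical names. Since `HttpClient` does not decompress responses for you, a small helper decodes a gzip-encoded body (assumed to be UTF-8 text).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

class AjaxReplay {
    // Rebuilds a captured ajax GET request. The header values are placeholders:
    // copy the real ones from the request your sniffer recorded.
    static HttpRequest replayRequest(String capturedUrl) {
        return HttpRequest.newBuilder()
                .uri(URI.create(capturedUrl))
                .header("User-Agent", "Mozilla/5.0")          // assumption: mimic the sniffed browser
                .header("X-Requested-With", "XMLHttpRequest") // sent by many ajax clients; may not be required
                .header("Accept-Encoding", "gzip")            // ask for a compressed body
                .GET()
                .build();
    }

    // HttpClient does not decompress automatically, so decode a gzip body by hand.
    // Assumes the decompressed body is UTF-8 text.
    static String gunzip(byte[] body) {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(body))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

To send it, `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofByteArray())` gives you the raw bytes, which you pass to `gunzip` when the response's `Content-Encoding` header is `gzip`.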

A possible snag in this well-laid-out plan is if the data returned in the ajax response is encrypted, for example if the extra images are bundled in some sort of envelope whose format you'll then need to discover.

Depending on the actual task at hand, you may try alternatives such as automation scripts in Greasemonkey (on Firefox) or similar tools.

What about the Bing API?
Note that all the above approaches are akin to screen-scraping and hence quite sensitive to even minute changes in the Bing application, and, depending on actual usage and context, this could put the project in a legal grey area... A better approach may be to register and obtain a proper application ID with MS/Bing and to use the Bing API.

mjv
Hi, thanks for the reply. Yeah, I am sending an HTTP request and reading from the page. Can you please suggest something for paging?
Zinx
@Zinx, see my edits. I suggest a few approaches. I'm busy at the moment, so I'll let you "take it from there". This issue of automating browsing and collecting content off the web is very common, and the novel issues associated with newer site structures (such as ajax) are probably starting to be well understood in the developer community. Look on SO; you may find readily available libraries or tools that address this very problem. Do not focus on "Bing" when searching for such tools or info, as this keyword makes the search too restrictive.
mjv
What is SO? I didn't get what you are referring to.
akjain
@akjain SO = StackOverflow.com, i.e. this site.
mjv
@mjv ya, I realized after posting comment. Sorry for asking a silly Q.
akjain
@akjain NP (No Problem ;-) ) Don't we ALL ask silly Qs every once in a while... Go ahead and erase these [if you want], I'll do the same of my replies.
mjv
A: 

You are simulating a browser? Doesn't the Bing engine have an entry point for programs instead, such as a web service, which would make your task much easier?


EDIT: SDK appears to be here: http://msdn.microsoft.com/en-us/library/cc980922.aspx

Thorbjørn Ravn Andersen
@Thor, you probably mean the `Bing API` (the SDK link you provide is to the Bing _MAP_ service), but yes, you got the right idea! (Although I pay relatively little heed to the "SO rep race", I sometimes get into the game and respond too directly to the OP's quest.)
mjv
A: 

Just wanted to post a direct answer to the question: Bing uses Ajax (of course) for the infinite scroll. Each "tick" is based on a simple ajax GET request, which acquires new images.

For instance, this URL returns 30 results (121-151) in an "htmlraw" format based on the query "max payne": http://www.bing.com/images/async?q=max+payne&format=htmlraw&first=121
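As a sketch, the async URL above can be built programmatically. This assumes the `q`, `format` and `first` parameters keep the meaning observed here, which Bing may change at any time; `BingAsyncUrl` and `asyncUrl` are hypothetical names, and `URLEncoder.encode(String, Charset)` needs Java 10+.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class BingAsyncUrl {
    // Endpoint as observed in this answer; undocumented and subject to change.
    private static final String ASYNC_ENDPOINT = "http://www.bing.com/images/async";

    // Builds the URL for one "tick" of the infinite scroll, starting at result `first`.
    static String asyncUrl(String query, int first) {
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8); // spaces become '+'
        return ASYNC_ENDPOINT + "?q=" + q + "&format=htmlraw&first=" + first;
    }

    public static void main(String[] args) {
        // Reproduces the example URL for the query "max payne" starting at 121.
        System.out.println(asyncUrl("max payne", 121));
    }
}
```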

Edit: It works with the original URL too; just add &first=NUMBER to the query string. Example: www.bing.com/images/search?q=payne&go=&form=QBLH&scope=images&filt=all&first=10
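A minimal paging sketch: step `first` forward and append it to the search URL. The page size of 30 is an assumption carried over from the async example, and whether Bing counts `first` from 0 or 1 is not confirmed here; `BingPaging`, `withFirst` and `pageUrls` are hypothetical helpers.

```java
import java.util.ArrayList;
import java.util.List;

class BingPaging {
    // Assumption: each request yields about 30 results, as observed for the async endpoint.
    static final int PAGE_SIZE = 30;

    // Appends the `first` parameter; assumes the URL does not already contain one.
    static String withFirst(String searchUrl, int first) {
        String sep = searchUrl.contains("?") ? "&" : "?";
        return searchUrl + sep + "first=" + first;
    }

    // Returns the URLs needed to cover `total` results beginning at offset `start`.
    static List<String> pageUrls(String searchUrl, int start, int total) {
        List<String> urls = new ArrayList<>();
        for (int first = start; first < start + total; first += PAGE_SIZE) {
            urls.add(withFirst(searchUrl, first));
        }
        return urls;
    }

    public static void main(String[] args) {
        for (String url : pageUrls("http://www.bing.com/images/search?q=payne", 1, 90)) {
            System.out.println(url);
        }
    }
}
```

Here `pageUrls("http://www.bing.com/images/search?q=payne", 1, 90)` yields three URLs, with `first=1`, `first=31` and `first=61`; you would then fetch and parse each one in turn.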

I am building my own bulk image collector (for a "learning project" for myself) and I found out that it is paginated like this.

FYI, Google and Bing are easy; Yahoo and AltaVista (redundant, since their results come from Yahoo) are far more problematic, because they don't post the direct link to the original image.
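For the "easy" engines, pulling the direct links out of the returned HTML can be as simple as a regular expression. This is a heuristic sketch, assuming the image links appear as absolute http(s) URLs ending in .jpg, .jpeg, .png or .gif; the actual markup of the `htmlraw` response is not documented here, so inspect it and adjust the pattern. `DirectLinks.extract` is a hypothetical name.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DirectLinks {
    // Heuristic: an absolute http(s) URL ending in a common image extension, which
    // must be followed by whitespace, a quote, '<', '>', '?', '#', '&' or end of input
    // (so "www.png-site.com" is not mistaken for a .png file).
    private static final Pattern IMAGE_URL = Pattern.compile(
            "https?://[^\\s\"'<>]+?\\.(?:jpe?g|png|gif)(?=[\\s\"'<>?#&]|$)",
            Pattern.CASE_INSENSITIVE);

    static Set<String> extract(String html) {
        Set<String> links = new LinkedHashSet<>(); // keeps first-seen order, drops duplicates
        Matcher m = IMAGE_URL.matcher(html);
        while (m.find()) {
            links.add(m.group());
        }
        return links;
    }
}
```

A regex is brittle against markup changes; if the HTML gets complicated, a real HTML parser such as jsoup is the more robust choice.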

Have fun! :)

DavidMB