views:

128

answers:

1

I'm programming a site in PHP/MySQL that gets search results for products via API from an external site. This site also will have it's own products and the owners of the site want the search results to be inter-connected.

If someone searches for VIDEO, ordered by date then the results should be all in order regardless of the source it came from.

eg.

July 31 - Video A - our database
July 30 - Video B - via API
July 29 - Video C - via API
July 28 - Video D - our database
...

The problem I'm having is figuring out a way to do this effectively especially regarding viewing multiple pages of results. If someone clicks to the 2nd page of results then I need to figure out the last item on the first page of results (and the last item from the API), then only get the items from the API starting after the last API item viewed on the previous page and then do the same for our database results and re-combine them again.

In order to avoid this complex algorithm, another idea I had was to limit the results to a large amount - like 500 results and grab them all at once and order them. Then if the user goes forward a few pages, I do not have to re-grab all the data.

Does anyone have suggestions on good algorithms to use to combine two search results?

+2  A: 

Whether you use it for caching or not, you will need to grab at least a page worth of results from both sources, in case all the next results will come from that source.

Grabbing a lot of results and caching them (in the session) is one solution you could use.

If for some reason you don't want to cache all the results (if the operation is expensive and you need this optimized), you could store a simple array in the session that contains the location of the results, and then you would know the starting number for the next page.

For example (pseudo code)

**Request 1**
Get 10 results from API
Get 10 results form Database
Merge the results
Display first 10 and save the order to an array
   (A for API, D for Database, ex: A,A,A,D,A,D,D,A,D,A)

User clicks page 2

**Request 2** (Page 2)
Get 10 results from API starting at 5
Get 10 results from Database starting at 7
Repeat merge and display above.

You could also optionally cache what you have needed to retrieve so far (and you will have 10 extra results). This would make the first request longer, but could possibly make the second request much faster.

If the user jumps forward several pages, you would need to get the largest number of results that could have been displayed in the preceeding unknown pages from each source.

If you are not too worried about performance from either source, I would retrieve up to a large number like you said and cache all results temporarily. As soon as a new search is executed, dump the old results.

Renesis