Hi guys,

We have a database of 2 million listings, each assigned to one or more of a possible 2,500 categories (1:n).

Over time, some listings have been added to many irrelevant categories (some have as many as 50, whereas we'd like to keep it below 5).

What I'd love to do is audit each listing's categories like this:

  • Given a specific listing, get all listings that share one of its categories
  • Then, for each of its other categories, calculate a density percentage based on those listings (e.g., Listing1 is in Category1, and we find 100,000 other listings in Category1. Listing1 is also in Category2, but only 2% of those 100,000 listings are also in Category2, so Category2 is almost certainly irrelevant for Listing1.)
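A minimal in-memory sketch of the two steps above, assuming the data has been loaded into a Python dict mapping each listing ID to its set of categories. The function names `build_category_index` and `category_density` and the sample data are hypothetical, for illustration only. "Density" here is taken to mean: of the *other* listings in the anchor category, the fraction that are also in the category being checked.

```python
from collections import defaultdict


def build_category_index(listing_categories):
    """Invert listing -> set of categories into category -> set of listings."""
    category_listings = defaultdict(set)
    for listing, cats in listing_categories.items():
        for cat in cats:
            category_listings[cat].add(listing)
    return category_listings


def category_density(listing, anchor, listing_categories, category_listings):
    """For each of the listing's other categories, return the fraction of the
    other listings in `anchor` that are also in that category."""
    # Step 1: all other listings that share the anchor category.
    peers = category_listings[anchor] - {listing}
    if not peers:
        return {}
    # Step 2: density of each remaining category among those peers.
    # Categories are sorted only so the output order is deterministic.
    return {
        cat: len(peers & category_listings[cat]) / len(peers)
        for cat in sorted(listing_categories[listing])
        if cat != anchor
    }


# Tiny hypothetical data set: listing ID -> categories.
listings = {
    "L1": {"Cat1", "Cat2", "Cat3"},
    "L2": {"Cat1", "Cat3"},
    "L3": {"Cat1", "Cat3"},
    "L4": {"Cat1"},
    "L5": {"Cat1", "Cat2"},
}
index = build_category_index(listings)
print(category_density("L1", "Cat1", listings, index))
# prints {'Cat2': 0.25, 'Cat3': 0.5}
```

Using set intersection keeps each density check proportional to the size of the sets involved, but rebuilding `peers` for every listing still means touching up to 100,000 IDs per audit, which is where precomputed counts may help.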

Is there an app that already does this? I don't even really know what to search for. I'm happy to write my own, but I worry that anything I write to process a potential 50,000,000 relationships will be too slow to be usable.
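One possible way to keep this tractable, assuming the density definition is "fraction of the other listings in the anchor category that are also in category C": the density depends only on category-level counts, so they can be computed in a single pass and each listing audit becomes a few dictionary lookups. With 2,500 categories there are at most 2,500 × 2,499 / 2 = 3,123,750 distinct category pairs. The cost of the pass grows with the square of each listing's category count (a listing with 50 categories contributes 1,225 pairs). This is a sketch only; `cooccurrence_counts` and `audit_listing` are hypothetical names, and the data is made up.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(listing_categories):
    """One pass over all listings: per-category totals and per-pair counts."""
    totals = Counter()
    pairs = Counter()
    for cats in listing_categories.values():
        cats = sorted(set(cats))  # sorted so each pair is stored as (a, b) with a < b
        totals.update(cats)
        for a, b in combinations(cats, 2):
            pairs[(a, b)] += 1
    return totals, pairs


def pair_count(pairs, a, b):
    """Number of listings in both a and b (Counter returns 0 for missing keys)."""
    return pairs[(a, b) if a < b else (b, a)]


def audit_listing(listing, anchor, listing_categories, totals, pairs):
    """Densities for the listing's other categories, using only the counts.
    Assumes `listing` is itself in `anchor`, so it is subtracted from both
    the anchor total and each pair count."""
    peers = totals[anchor] - 1  # other listings in the anchor category
    if peers <= 0:
        return {}
    return {
        cat: (pair_count(pairs, anchor, cat) - 1) / peers
        for cat in sorted(listing_categories[listing])
        if cat != anchor
    }


# Tiny hypothetical data set: listing ID -> categories.
listings = {
    "L1": {"Cat1", "Cat2", "Cat3"},
    "L2": {"Cat1", "Cat3"},
    "L3": {"Cat1", "Cat3"},
    "L4": {"Cat1"},
    "L5": {"Cat1", "Cat2"},
}
totals, pairs = cooccurrence_counts(listings)
print(audit_listing("L1", "Cat1", listings, totals, pairs))
# prints {'Cat2': 0.25, 'Cat3': 0.5}
```

At 2 million listings the pair-counting pass would probably be better done outside pure Python, for example as a self-join on the listing/category table grouped by category pair, or as a sparse matrix product; the per-listing audit logic stays the same.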

I'd love to hear any suggestions.

Thanks in advance.
