views:

42

answers:

1

I would like to implement something similar, but I'm running into some problems. I would like to know what my options are to approach this problem, and what are the common techniques used in this scenario. (See bottom of this question if you are unfamiliar with Netflix)

Current Approach Create a "controls" table that houses information about a customer's status and cross references it to a service plans table.

controls(member_id, movies_rented_this_month, movies_at_home)
plans(movies_per_month_limit, movies_at_home_limit)

When an item is returned check the controls table to see if the customer qualifies to receive another order.

if controls.movies_at_home < plan.movies_at_home_limit
and if controls.movies_this_month < plan.movies_this_month_limit

For anyone who has no previous order (new customer), or who has nothing in their movie queue at the time of closing an order, we create a scheduled event to create the orders (polling).

Problem We need to account for how many orders each customer can have based on their plan. The aforementioned logic fails in some scenarios:

plans.movies_this_month_limit = 4, controls.movies_this_month = 3
plans.movies_at_home_limit = 2 , controls.movies_at_home = 0

In the above scenario a customer who qualifies for one order will receive two. Reversing the criteria reverses the problem.

Simplified Schema

members(id, plan_id)
movies(id, title)
plans(id, movies_at_home_limit, movies_per_month_limit)
controls(member_id, movies_at_home, movies_this_month)
movie_queue(member_id, movies_id)

Netflix An online movie rental service that allows a member to keep a wishlist of movies. Customers receive movies incrementally from their wish list (by mail), based on their type of plan.

+1  A: 

I suspect it doesn't use the CONTROLS table, but instead checks its internal shipping / receiving history to determine on the fly how many movies any customer has own when determining whether or not to ship a customer a movie.

Given the high selectivity of a single customer ID within a shipping / receiving history table, it's not expensive to probe the hypothetical indexes on (MEMBER_ID, SHIP_DATE, DISC_ID) and (MEMBER_ID, RECEIVED_DATE, DISC_ID) to answer the questions the CONTROLS table seeks to.


CONTROLS isn't a table; it's a view / scalar subquery. Given a member ID and a date, and assuming that SHIPPED_DISK and RECEIVED_DISK are separate tables for clarity:

movies_rented_this_month = (SELECT COUNT(*) FROM SHIPPED_DISC 
                             WHERE MEMBER_ID = :member_id
                               AND SHIP_DATE >= FIRST_DAY_OF_MONTH (:date)
                               AND SHIP_DATE <  FIRST_DAY_OF_MONTH (:date) + 1 month)

movies_at_home = (SELECT COUNT(*) FROM SHIPPED_DISC shp
                   WHERE MEMBER_ID = :member_id
                     AND NOT EXISTS (SELECT NULL FROM RECEIVED_DISC rcv
                                      WHERE shp.member_id = rcv.member_id
                                        AND shp.disc_id   = rcv.disc_id)

or perhaps

       (SELECT ship_count - receive_count 
          FROM (SELECT COUNT(*) ship_count FROM SHIPPED_DISC shp
                 WHERE MEMBER_ID = :member_id
                UNION ALL
                SELECT COUNT(*) receive_count FROM RECEIVED_DISC rcv
                 WHERE MEMBER_ID = :member_id) dummy
        )

Alternatively, you could maintain a MEMBER_HAS_DISC table (again, I'm assuming discs, but with a Netflix, there's streaming as well, so it may need to be abstracted to MEMBER_HAS_PRODUCT), where the shipping / receiving logs insert into and delete from that table, and it would be very easy to check. I'd assume it's more useful to know what a customer has than how many of their tokens they've used.

Adam Musch
Adam, what you are saying is that shipping/receiving tables can handle the logic of generating orders instead of using a control's table? Could you please give me a brief example of how this could function? I don't mean code, but rather an overview of the logic involved...
Mel
see edited answer, as it was too involved for a comment.
Adam Musch
Thanks Adam. I see how it works. But would it not be simpler if we added a shipment_returned_on column to the shipments table and avoided having a returns table all together?
Mel
Perhaps. It depends on the data volume. I can say that in Oracle and DB2, updating a table is significantly more expensive than inserting a record. If they're independent events, perhaps they should be modeled as such.
Adam Musch