For example, suppose we are doing analytics, recording the fields page_type, item_id, date, pageviews, and timeOnPage.

It seems that there are several ways to avoid duplicate records. Is there an automatic way?

  1. create an index on the fields that uniquely identify the record, for example [page_type, item_id, date], and make the index unique, so that adding the same record again will be rejected (see the first sketch after this list).

  2. or, make the above combination the primary key, which is unique, if the DB or framework supports it. In Rails, though, the auto-incrementing id (1, 2, 3, ...) is usually the primary key.

  3. or, query for the record using [page_type, item_id, date], and update that record if it already exists (or do nothing if pageviews and timeOnPage already have the same values); if the record doesn't exist, insert a new one with this data (see the second sketch after this list). But if we need to query the record this way, it looks like we need an index on these 3 fields anyway.

  4. Insert new records all the time, but when querying for the values, use something like

    select * from analytics  where ...  order by created_at desc limit 1
    

that is, get the most recently created record and ignore the rest. But this only works for fetching a single record; it is not feasible when summing up values (doing aggregates), such as select sum(pageviews) or select count(*).
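
For option 1, here is a minimal sketch of how such a unique compound index could be declared in Mongoid (the Analytic model and its field types are assumptions based on the question; the hash-style index syntax is for Mongoid 3 and later):

    class Analytic
      include Mongoid::Document

      field :page_type,  type: String
      field :item_id,    type: String
      field :date,       type: Date
      field :pageviews,  type: Integer
      field :timeOnPage, type: Integer

      # Unique compound index: a second document with the same
      # [page_type, item_id, date] combination is rejected by MongoDB.
      index({ page_type: 1, item_id: 1, date: 1 }, { unique: true })
    end

Note that declaring the index in the model is not enough; it still has to be created on the server (e.g. via Mongoid's rake db:mongoid:create_indexes task) before the constraint is enforced.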
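For option 3, the query-then-update step could look like the following (find_or_initialize_by exists in both Mongoid and ActiveRecord; the concrete values here are made up):

    # Find the record for this [page_type, item_id, date] combination,
    # or build a new, unsaved one if none exists yet.
    record = Analytic.find_or_initialize_by(
      page_type: "product",
      item_id:   "42",
      date:      Date.today
    )

    # Set the metrics, and only write if something actually changed.
    record.pageviews  = 94
    record.timeOnPage = 120
    record.save! if record.changed?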

So, is there some automatic solution besides the methods above?

+1  A: 

I can't speak for Mongoid/MongoDB, but if you wish to enforce a uniqueness constraint in a relational database, you should create a uniqueness constraint. That's what they're there for! In MySQL, that is equivalent to a unique index; you could specify it as CONSTRAINT ... UNIQUE (col1, col2), but this will just create a unique index anyway.
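
In Rails, the migration equivalent could look like this sketch (Rails 3.1+ change-style migration; the analytics table name is assumed from the question):

    class AddUniqueIndexToAnalytics < ActiveRecord::Migration
      def change
        # MySQL enforces this as a unique index, which is what
        # CONSTRAINT ... UNIQUE (col1, col2) creates under the hood anyway.
        add_index :analytics, [:page_type, :item_id, :date], unique: true
      end
    end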

wuputah
+1  A: 

Jian,

Your first option seems viable to me, and it is the simplest way. Mongo supports this feature by default.

On insert, it will check for the unique combination; if one already exists, it will ignore the insert and write an "E11000 duplicate key error index" message to the server log. Otherwise it will proceed with the normal insertion.
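
Here is a sketch of that behavior using the mongo Ruby driver (2.x syntax, where the rejection surfaces as an exception on the client; with the fire-and-forget writes that were the default earlier, the error only showed up in the server log):

    require "mongo"

    client = Mongo::Client.new(["127.0.0.1:27017"], database: "analytics_dev")
    coll   = client[:analytics]

    # Create the unique compound index (a no-op if it already exists).
    coll.indexes.create_one({ page_type: 1, item_id: 1, date: 1 }, unique: true)

    doc = { page_type: "product", item_id: "42", date: "2010-08-01",
            pageviews: 94, timeOnPage: 120 }

    coll.insert_one(doc)          # first insert succeeds

    begin
      coll.insert_one(doc)        # same combination again
    rescue Mongo::Error::OperationFailure => e
      # "E11000 duplicate key error ...": the unique index rejected it.
      warn "duplicate ignored: #{e.message}"
    end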

But it seems this will not work in the case of a bulk insert: if any duplicate is there, the entire batch will fail. A quick Google search turns up an existing MongoDB JIRA ticket reporting this bug; it is still open.

Cheers

Ramesh Vel