views: 181

answers: 4
I periodically fetch the latest tweets with a certain hashtag and save them locally. To prevent saving duplicates, I use the method below. Unfortunately, it does not seem to be working... so what's wrong with this code:

    def remove_duplicates
      before = @tweets.size
      @tweets.delete_if { |tweet| !Tweet.all(:conditions => { :twitter_id => tweet.twitter_id }).empty? }
      duplicates = before - @tweets.size
      puts "#{duplicates} duplicates found"
    end

Where @tweets is an array of Tweet objects fetched from Twitter. I'd appreciate any solution that works, especially one that might be more elegant...

+2  A: 

You can use validates_uniqueness_of :twitter_id in the Tweet model (where this code should live). This will cause duplicates to fail validation and not be saved.

Ben Hughes
validates_uniqueness_of :twitter_id is not a good solution on its own. Between the time it checks for the existence of the record and the time it creates the new record, another process might create a duplicate. You should always use this method in conjunction with a unique database index.
Simone Carletti
@weppos: Since I have only one sequential job writing tweets, this is not a problem. This seems to be the most DRY solution. It worked well on SQLite3, but in production with MySQL it does not seem to notice duplicates... looking into it now.
effkay
For actual safety, you should put uniqueness constraints on the database and be ready to handle any exceptions that are thrown.
Ben Hughes
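
A minimal sketch of the two suggestions combined (the migration name is assumed; the exception classes are those of Rails 2.x-era ActiveRecord, matching the question's syntax, and may differ in newer versions):

    # app/models/tweet.rb
    class Tweet < ActiveRecord::Base
      # Application-level check; not race-safe on its own.
      validates_uniqueness_of :twitter_id
    end

    # A migration adding the backing database constraint.
    class AddUniqueIndexToTweets < ActiveRecord::Migration
      def self.up
        add_index :tweets, :twitter_id, :unique => true
      end

      def self.down
        remove_index :tweets, :twitter_id
      end
    end

    # Saving then becomes a matter of letting duplicates fail:
    @tweets.each do |tweet|
      begin
        tweet.save!  # raises RecordInvalid on a validation failure
      rescue ActiveRecord::RecordInvalid, ActiveRecord::StatementInvalid
        # StatementInvalid covers a race that slips past the validation
        # and hits the unique index; skip the duplicate either way.
      end
    end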
A: 

    array.uniq!

Removes duplicate elements from self. Returns nil if no changes are made (that is, no duplicates are found).

won't help for duplicates in the database.
Ben Hughes
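
Worth noting: a plain uniq! compares whole objects, so two distinct Tweet instances carrying the same twitter_id would not be removed. For the in-memory duplicates, a keyed variant is closer to what the question needs (a sketch, assuming Ruby 1.9+, where uniq! accepts a block):

    # Dedupe the in-memory array by twitter_id; does nothing about
    # duplicates already stored in the database.
    @tweets.uniq! { |tweet| tweet.twitter_id }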
+1  A: 

Since it sounds like you're using the Twitter search API, a better solution is to use the since_id parameter. Keep track of the last Twitter status id you got from your previous query and use it as the since_id parameter on your next query.

More information is available at Twitter Search API Method: search

Ryan McGeary
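
A sketch of since_id tracking against that API (the endpoint and JSON field names reflect the old, unauthenticated Twitter Search API and may differ in current versions; fetch_new_tweets and @last_seen_id are illustrative names):

    require 'net/http'
    require 'json'
    require 'cgi'

    def fetch_new_tweets(query, since_id = nil)
      url = "http://search.twitter.com/search.json?q=#{CGI.escape(query)}"
      url << "&since_id=#{since_id}" if since_id
      JSON.parse(Net::HTTP.get(URI.parse(url)))["results"]
    end

    results = fetch_new_tweets("#rails", @last_seen_id)
    # Remember the highest status id seen, so the next poll only
    # returns tweets newer than it.
    @last_seen_id = results.map { |r| r["id"] }.max unless results.empty?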
A: 

Ok, turns out the problem was of a different nature: when I looked closer, I found that multiple Tweets were saved with the twitter_id 2147483647... which is the upper limit of a signed 32-bit integer field :)

Changing the field to bigint solved the problem. It took me a long time to figure out because MySQL failed silently, clamping the value to the maximum for as long as it could (until I added the unique index). I quickly tried it with Postgres, which returned a nice "integer out of range" error that pointed me to the real cause of the problem.
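
A sketch of the fix as a migration (the migration name is assumed; on MySQL, Rails maps :limit => 8 on an integer column to BIGINT):

    class WidenTwitterIdOnTweets < ActiveRecord::Migration
      def self.up
        # :limit => 8 produces a BIGINT column on MySQL.
        change_column :tweets, :twitter_id, :integer, :limit => 8
      end

      def self.down
        change_column :tweets, :twitter_id, :integer
      end
    end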

Thanks Ben for the validation and indexing tips, as they led to much cleaner code!

effkay