views:

140

answers:

2

I'm creating a Twitter application, and every time user updates the page it reloads the newest messages from Twitter and saves them to local database, unless they have already been created before. This works well in development environment (database: sqlite3), but in production environment (mysql) it always creates messages again, even though they already have been created.

Message creation is checked by twitter_id, that each message has:

msg = Message.find_by_twitter_id(message_hash['id'].to_i)
if msg.nil?
  # creates new message from message_hash (and possibly new user too)
end
msg.save

Apparently, in production environment it's unable to find the messages by twitter id for some reason (when I look at the database it has saved all the attributes correctly before).

With this long introduction, I guess my main question is how do I debug this? (unless you already have an answer to the main problem, of course :) When I look in the production.log, it only shows something like:

Processing MainPageController#feeds (for 91.154.7.200 at 2010-01-16 14:35:36) [GET]
Rendering template within layouts/application
Rendering main_page/feeds
Completed in 9774ms (View: 164, DB: 874) | 200 OK [http://www.tweets.vidious.net/]

...but not the database requests, logger.debug texts, or anything that could help me find the problem.

+3  A: 

You can change the log level in production by setting the log level in config/environment/production.rb

config.log_level = :debug

That will log the sql and everything else you are used to seeing in dev - it will slow down the app a bit, and your logs will be large, so use judiciously.

But as to the actual problem behind the question...

Could it be because of multiple connections accessing mysql?

If the twitter entries have not yet been committed, then a query for them from another connection will not return them, so if your query for them is called before the commit, then you won't find them, and will instead insert the same entries again. This is much more likely to happen in a production environment with many users than with you alone testing on sqlite.

Since you are using mysql, you could use a unique key on the twitter id to prevent dupes, then catch the ActiveRecord exception if you try to insert a dupe. But this means handling an error, which is not a pretty way to handle this (though I recommend doing it as a back up means of prevent dupes - mysql is good for this, use it).

You should also prevent the attempt to insert the dupes. One way is to use a lock on a common record, say the User record which all the tweets are related to, so that another process cannot try to add tweets for the user until it can get that lock (which you will only free once the transaction is done), and so prevent simultaneous commits of the same info.

Andrew Kuklewicz
Thanks for the debug code. When i was looking into the log, i found the problem: for some reason in the production/mysql database it always saves the same twitter_id to messages (2147483647), which isn't even valid id for any message. Dunno why i didn't notice that before..Actually, in the production.log it show the correct id's on creation messages, such as:[4;36;1mMessage Create (0.4ms)[0m [0;1mINSERT INTO `messages` (`retweets`, `twitter_id`, ... ) VALUES(0, 7852958107, ...)...but in the database the all of the twitter_id:s have changed to 2147483647.
JussiR
The twitter ID is clearly too big for the column - MySQL by default will accept invalid values for columns and truncate them to the nearest value. Turn on SQL_MODE to something like 'TRADITIONAL' to prevent this misbehaviour. To fix the problem, change the column to a bigger int type or something else. Oh yes - and always test your app with the same database server (and all other software) as production - it's not worth doing anything else.
MarkR
Thanks! That solved the problem. Didn't find how to change the SQL_MODE (in myPHPAdmin). Though probably there's an SQL command to do that do? Anyways, problem solved, i'm happy. :)
JussiR
+1  A: 

I ran into a similar issue while saving emails to a database, I agree with Andrew, set the log level to debug for more information on what exactly is happening.

As for the actual problem, you can try adding a unique index to the database that will prevent two items from being saved with the same parameters. This is like the validates_uniqueness but at the database level, and is very effective: http://stackoverflow.com/questions/1954299/mysql-constraign-database-entries-in-rails.

For example if you wanted no message objects in your database that had a duplicate body of text, and a duplicate twitter id (which would mean the same person tweeted the same text). Then you can add this to your migration:

add_index( :message, [:twitter_id, :body] , :unique => true)

It takes a small amount of time after you tell an object in Rails to save, before it actually gets in the database, thats maybe why the query for the id doesn't find anything yet.

ThinkBohemian
Thanks, i have to look into that. The weird thing is that even when i did the normal "validates_uniqueness :twitter_id", it still created the messages and gave them all the same twitter_id (as described in the comment to Andrews answer).ps. sorry if my explanation has been confusing, but twitter_id refers to the messages own id in twitter database (so it's not user id).
JussiR
I know the problem is solved, and I am glad, but I feel I should mention that validates_uniqueness is terrible - it does not work, as the lag between its check and the actual commit allows dupes to happen pretty easily.
Andrew Kuklewicz
Good to know. I'll need to start using the method shown above.
JussiR