Hi guys, I am familiar with memcached and eager loading, but neither seems to solve the problem I am facing.

My main performance lag comes from hundreds of data-retrieval calls to the database. The tricky thing is that I do not know which set of users I need to retrieve until after several steps of computation.

I can refactor my code, but I was wondering how you experts handle this situation? It seems like it should be fairly common:

def newsfeed

  - find out which users i need
  - retrieve those users via DB

  - find out which events happened for these users
  - for each of those events
        - retrieve new set of users

  - find out which groups are relevant
  - for each of those groups
        - retrieve new set of users 

  - etc, etc 

end
+2  A: 

Denormalization is the magic word for your situation.

There are several ways to do this: for example, store the IDs of the last 10 users on the event and on the group.

Or create a new model NewsFeedItem (belongs_to :parent, :polymorphic => true). When a user attends an event, create a NewsFeedItem with denormalized information like the user's name, their profile pic, etc. That saves you the second queries to user_events and users.
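A minimal sketch of what that could look like; the Attendance join model, the denormalized column names, and the callback are my own illustration, not something prescribed by the answer:

class NewsFeedItem < ActiveRecord::Base
  belongs_to :parent, :polymorphic => true  # Event, Group, ...
end

class Attendance < ActiveRecord::Base
  belongs_to :user
  belongs_to :event

  # Copy the user details into the feed item at write time,
  # so rendering the feed never touches the users table.
  # (user_name / user_pic_url are hypothetical columns.)
  after_create :create_news_feed_item

  private

  def create_news_feed_item
    NewsFeedItem.create!(
      :parent       => event,
      :user_name    => user.name,
      :user_pic_url => user.profile_pic_url)
  end
end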

Thomas R. Koll
I don't have a better answer, and I've never done this before, but wouldn't this technique potentially create quite a bit of duplicate data? Maybe I'm missing something, or maybe the benefits outweigh the costs.
ThinkBohemian
Denormalization usually (always?) means you're no longer storing the data in an optimal fashion as regards space and time to update, but instead in a way that makes your queries faster. If queries dominate updates, then this can be a net win.
pdbartlett
This is cool, thanks! You taught us a new term -> denormalization. ;) Question: why do you need the :polymorphic => true for this use case?
ming yeow
You need the polymorphic association as the parent can be a group or an event or whatever. In the views you can then use a different partial for each possible type of news item.
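For example, the view could dispatch on the stored parent type (this snippet is illustrative, assuming an @news_feed_items collection):

<%# render a different partial per parent type %>
<% @news_feed_items.each do |item| %>
  <%= render :partial => "news_feed/#{item.parent_type.underscore}",
             :locals  => { :item => item } %>
<% end %>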
Thomas R. Koll
+1  A: 

You should be able to do this with only one query per Event / Group loop. What you'll want to do is: inside your loop, add the user ids to a Set; then, after the loop, retrieve all the User records with those ids. Rinse and repeat. Here is an example:

def newsfeed

  user_ids = Set.new
  # find out which users I need
  # ... add ids to user_ids
  # retrieve those users via DB
  users = User.find(user_ids.to_a)

  # find out which events happened for these users
  # you might want to add a condition
  # that limits the events returned to only recent ones
  events = Event.find_all_by_user_id(user_ids.to_a)

  user_ids = Set.new
  events.each do |event|
    # merge, not <<, so we add the ids rather than the array itself
    user_ids.merge(discover_user_ids_for_event(event))
  end

  # retrieve new set of users
  users = User.find(user_ids.to_a)

  # ... and so on

end

I'm not sure what your method is supposed to return, but you can likely figure out how to use the idea of grouping finds together by working with collections of IDs to minimize DB queries.

Daniel Beardsley
Thanks! This is what I was hoping to avoid, since it means rewriting quite a few functions. But I think I will have to do both this and the denormalization from the first reply.
ming yeow
+1  A: 

Hi Ming,

Do you need to show all the details at once? I mean, when the page is loading, do you really want to load all of that information? If not, what you can do is load them on demand,

as follows:

def newsfeed

  - find out which users i need
  - retrieve those users via DB

  - find out which events happened for these users

    once you show the events, give the user a button or link to
    drill down into the other details (on demand), then load them
    using AJAX so that the page will not refresh - see the sketch
    after this block

    use this technique repeatedly whenever users want to go deeper
    into the details

end
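A rough sketch of the drill-down using Rails' link_to_remote; the controller action, partial, and DOM ids are made up for illustration:

<%# app/views/newsfeed/_event.html.erb -- names are illustrative %>
<%= link_to_remote "Show details",
      :url    => { :controller => "events", :action => "details",
                   :id => event.id },
      :update => "event_details_#{event.id}" %>
<div id="event_details_<%= event.id %>"></div>

# app/controllers/events_controller.rb
def details
  @event = Event.find(params[:id])
  # the expensive queries only run when the user asks for them
  render :partial => "event_details", :locals => { :event => @event }
end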

By doing this, you will save lots of processing power and fetch only the details the user needs.

I don't know if this is applicable to your situation.

If not, then you will have to find a more optimized way of loading the details.

cheers, sameera

sameera207
Thanks Sameera! That could certainly be done - I would have to evaluate the performance of these two approaches. Thanks for the suggestions!
ming yeow
+1  A: 

I understand that you are running some kind of algorithm over your data to make recommendations or something similar.

I have two suggestions:

1) Re-evaluate your algorithm / design on the basis of what you actually want to achieve. For instance, in an application where users can potentially have lots of posts, and the app performs some algorithm based on the number of posts, it would be quite expensive to count the posts every time. To optimise this, a post_count column can be added to the user model and incremented whenever a user successfully creates a post (see the sketch below). Similarly, if you can establish this kind of relationship between your users, events, groups, etc., think along those lines.
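For what it's worth, Rails' built-in counter cache can maintain such a column automatically; by convention it expects the column to be named posts_count. A minimal sketch, assuming a standard Post belongs_to User setup:

class Post < ActiveRecord::Base
  # keeps users.posts_count up to date on create/destroy
  belongs_to :user, :counter_cache => true
end

class AddPostsCountToUsers < ActiveRecord::Migration
  def self.up
    add_column :users, :posts_count, :integer, :default => 0
  end

  def self.down
    remove_column :users, :posts_count
  end
end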

2) If the first solution is not feasible, then you must avoid running multiple queries and crunching the data in Ruby, which is obviously very expensive and never advisable with a large data set. What you need here is to make one SQL query using a join and get all the data in one go. Also, select only the fields you actually need from the database; it really helps with large data sets. For instance, if you need the user id and event id from the users and events tables and nothing else, do something like this:

User.find(:all,
      :select     => 'users.id, events.id AS event_id',
      :joins      => 'JOIN events ON users.id = events.user_id',
      :conditions => ['users.id IN (?)', user_ids])

I hope this will point you in the right direction.

nas
Good points! Moving forward, storing a "score" per event would certainly be a potentially powerful approach. I do not have enough confirmed product specs yet, but when I do, that would be a great solution.
ming yeow