I am building a data-intensive web application that I am trying to optimize. I've heard of forking and threading, but I have no idea whether they apply to what I am trying to do and, if so, how to implement them. My code looks like this:

  def search
    @amazon_data = Hash.from_xml(item.retrieve_amazon(params[:sku]))
    unless @amazon_data['results'] == nil
      @amazon_data['results']['item'].size.times do |i|
        @all_books << { :vendor => 'Amazon.com',
                        :price => @amazon_data['results']['item'][i]['price'].to_f,
                        :shipping => @amazon_data['results']['item'][i]['ship'].to_f,
                        :condition => @amazon_data['results']['item'][i]['condition'],
                        :total => @amazon_data['results']['item'][i]['price'].to_f + @amazon_data['results']['item'][i]['ship'].to_f,
                        :availability => 'In Stock',
                        :link_text => 'Go to Amazon.com',
                        :link_url => "http://www.amazon.com/gp/offer-listing/#{params[:isbn]}"
        }
      end
    end
    @ebay_data = Hash.from_xml(Book.retrieve_ebay(params[:sku]))
    unless @ebay_data['results'] == nil
      @ebay_data['results']['item'].size.times do |i|
        @all_books << { :vendor => 'eBay',
                        :price => @ebay_data['results']['item'][i]['price'].to_f,
                        :shipping => @ebay_data['results']['item'][i]['ship'].to_f,
                        :condition => 'Used',
                        :total => @ebay_data['results']['item'][i]['price'].to_f + @ebay_data['results']['item'][i]['ship'].to_f,
                        :availability => 'In Stock',
                        :link_text => 'Go to eBay',
                        :link_url => "http://www.amazon.com/gp/offer-listing/#{params[:sku]}"
        }
      end
    end
  end

So, basically what I have are two actions that retrieve data from eBay and Amazon and parse it here. How might I make both of these actions run at once? Do fork or thread have anything to do with what I am trying to accomplish?

A: 

It's hard to say without more info, but my suspicion is that waiting for the API responses is where the majority of time is spent.

Try a different approach, where the API requests and the processing of their responses are handled in a separate process from the web server process. The front-end code will likely have to poll periodically for results and inject them into the page. But the win is that the whole request doesn't get backed up waiting for Amazon and eBay to do their thing.

There are several plugins that can help; delayed_job is a good place to start.
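
The shape of that approach can be sketched without any plugin. The example below is not delayed_job's API; it is a toy in-memory job store (`jobs`, `enqueue_search`, `poll`, and the sample SKU are all hypothetical) in which the request handler gets a job id back immediately, a worker thread does the slow work, and the polling endpoint reports `pending` until the result is ready. A real setup would persist the jobs and run the worker in a separate process.

```ruby
require 'thread'

jobs = {}          # job id => { :status => ..., :result => ... }
lock = Mutex.new   # guards the jobs hash
work = Queue.new   # hands jobs from the web side to the worker

# Worker loop: stands in for the separate background process.
worker = Thread.new do
  loop do
    id, sku = work.pop                         # blocks until a job arrives
    sleep(0.1)                                 # simulated API latency
    result = [{ :vendor => 'eBay', :sku => sku }]
    lock.synchronize { jobs[id] = { :status => 'done', :result => result } }
  end
end

# Called by the web request: returns at once with an id to poll.
enqueue_search = lambda do |sku|
  id = lock.synchronize do
    new_id = jobs.size + 1
    jobs[new_id] = { :status => 'pending', :result => nil }
    new_id
  end
  work << [id, sku]
  id
end

# Called by the polling endpoint that the front end hits periodically.
poll = lambda { |id| lock.synchronize { jobs[id].dup } }

job_id = enqueue_search.call('0131103628')
first_status = poll.call(job_id)[:status]      # normally still 'pending' here
sleep(0.05) until poll.call(job_id)[:status] == 'done'
final = poll.call(job_id)
worker.kill                                    # stop the demo worker
```

The web request only pays for `enqueue_search`; the wait on the remote APIs moves into the worker, at the cost of the polling code on the front end.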

Kenny
A: 

Unfortunately, the application requires real-time user input to query the APIs. The returned data needs to be fresh, as it concerns product pricing in marketplaces. For instance, a user would enter a SKU, and with that information the program would make a request to the applicable sites (Amazon and eBay in this case). Currently it makes the request to Amazon, parses the data, formats it, and then moves on to eBay, parses the data, and formats that. Then the formatted data is displayed in the view.

My thought was that if I could make those API calls at the same time (on different threads?), it would save time on the web-serving end, since all that would remain is to parse the returned data and format it correctly (which I might also be able to speed up).

ryan
A: 

Wrapping each call in a thread (code below) cuts the API time in half, but I don't know how to return the results. The subsequent view loads before the API results are returned. The threads are returning data, however. When I put

puts @all_books

within the thread, the results are displayed in the console. Outside of the thread, however, no results are returned.

  def search
    Thread.new do
      @amazon_data = Hash.from_xml(item.retrieve_amazon(params[:sku]))
      unless @amazon_data['results'] == nil
        @amazon_data['results']['item'].size.times do |i|
          @all_books << { :vendor => 'Amazon.com',
                          :price => @amazon_data['results']['item'][i]['price'].to_f,
                          :shipping => @amazon_data['results']['item'][i]['ship'].to_f,
                          :condition => @amazon_data['results']['item'][i]['condition'],
                          :total => @amazon_data['results']['item'][i]['price'].to_f + @amazon_data['results']['item'][i]['ship'].to_f,
                          :availability => 'In Stock',
                          :link_text => 'Go to Amazon.com',
                          :link_url => "http://www.amazon.com/gp/offer-listing/#{params[:isbn]}"
          }
        end
      end
    end
    Thread.new do
      @ebay_data = Hash.from_xml(Book.retrieve_ebay(params[:sku]))
      unless @ebay_data['results'] == nil
        @ebay_data['results']['item'].size.times do |i|
          @all_books << { :vendor => 'eBay',
                          :price => @ebay_data['results']['item'][i]['price'].to_f,
                          :shipping => @ebay_data['results']['item'][i]['ship'].to_f,
                          :condition => 'Used',
                          :total => @ebay_data['results']['item'][i]['price'].to_f + @ebay_data['results']['item'][i]['ship'].to_f,
                          :availability => 'In Stock',
                          :link_text => 'Go to eBay',
                          :link_url => "http://www.amazon.com/gp/offer-listing/#{params[:sku]}"
          }
        end
      end
    end
  end

Am I on the right track? How can I return the results from within the thread? Is it that the variable is only accessible within the thread, or does the problem lie in the fact that the program progresses before the results are returned?
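
One likely explanation, offered as a sketch rather than a confirmed diagnosis: instance variables are visible inside the threads, but `search` returns (and the view renders) without waiting for them to finish. `Thread#value` blocks until its thread is done and returns the block's result, so each thread can build its own array and the action can merge them afterwards instead of appending to a shared `@all_books` from two threads at once. Here `fetch_offers` and its canned prices are hypothetical stand-ins for the `retrieve_amazon` / `retrieve_ebay` calls and the parsing above.

```ruby
# Hypothetical stand-in for an API call plus parsing; sleeps to simulate latency.
def fetch_offers(vendor, delay, prices)
  sleep(delay)
  prices.map { |price| { :vendor => vendor, :price => price } }
end

def search_offers
  # Start both requests before waiting on either one.
  amazon = Thread.new { fetch_offers('Amazon.com', 0.2, [12.5, 14.0]) }
  ebay   = Thread.new { fetch_offers('eBay', 0.1, [9.99]) }

  # Thread#value joins the thread and returns the block's result,
  # so nothing is handed to the view until both arrays are complete.
  amazon.value + ebay.value
end

all_books = search_offers
```

In the controller, the merged array would be assigned to `@all_books` before the action ends, so the view sees the complete list.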

ryan
+1  A: 

Yeah, I still think you'd be better off with a job scheduler in this case. The fastest an action like this can possibly complete is the time taken by the slower of the two API requests, and you have no guarantees about network latency, load on the remote API, etc. On the other hand, you will have to implement some JavaScript code to poll periodically, detect job completion, and show the user the results.
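
To make the latency point concrete, here is a small sketch with simulated calls (the `slow_call` helper and its delays are invented for illustration). Run in parallel, the total wait is roughly that of the slower call rather than the sum of the two. `Thread#join` also accepts a time limit and returns `nil` if the thread has not finished by then, which gives the action a way to stop waiting on a slow vendor.

```ruby
# Hypothetical stand-in for a remote API call with the given latency in seconds.
def slow_call(name, delay)
  sleep(delay)
  "#{name} results"
end

started = Time.now
fast = Thread.new { slow_call('eBay', 0.2) }
slow = Thread.new { slow_call('Amazon', 0.5) }
results = [fast.value, slow.value]
elapsed = Time.now - started   # about 0.5s (the slower call), not 0.7s

# Bound the wait: join(limit) returns nil when the thread is still running.
laggard = Thread.new { slow_call('Amazon', 2.0) }
finished = laggard.join(0.2)
status = finished ? laggard.value : 'Amazon timed out'
laggard.kill unless finished   # stop the abandoned worker in this demo
```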

Also, thread behavior in Ruby 1.8 can be kinda funky at times, especially at scale, so beware.

Kenny