views:

262

answers:

1

My Problem: I am trying to developing a web application for currency traders. The application allows traders to enter or upload information about their trades and I want to calculate a wide variety of statistics based on what the user entered.

Now, normally I would use a relational database for this, but I have two requirements that don't fit well with a relational database so I am attempting to use couchdb. Those two problems are: 1) Primarily, I have a companion desktop application that users will be able to work with and replicate to the site using couchdb's awesome replication feature and 2) I would like to allow users to be able to define their own custom things to track about trades and generate results based off of what they enter. The schema less nature of couch seems perfect here, but it may end up being harder than it sounds. (I already know couch requires you to define views in advance and such so I was just planning on sticking all the custom attributes in an array and then emitting the array in the view and further processing from there.)

What I Am Doing: Right now I am just emitting each trade in couch keyed by each user's system and querying with the key of the system to get an array of trades per system. Simple. I am not using a reduce function currently to calculate any stats because I couldn't figure out how to get everything I need without getting a reduce overflow error.

Here is an example of rows that are getting emitted from couch:

{"total_rows":134,"offset":0,"rows":[
{"id":"5b1dcd47221e160d8721feee4ccc64be",
"key":["80e40ba2fa43589d57ec3f1d19db41e6","2010/05/14 04:32:37 +0000"], null,
"doc":{
     "_id":"5b1dcd47221e160d8721feee4ccc64be",
     "_rev":"1-bc9fe763e2637694df47d6f5efb58e5b",
     "couchrest-type":"Trade",
     "system":"80e40ba2fa43589d57ec3f1d19db41e6",
     "pair":"EUR/USD",
     "direction":"Buy",
     "entry":12600,
     "exit":12700,
     "stop_loss":12500,
     "profit_target":12700,
     "status":"Closed",                  
     "slug":"101332132375",
     "custom_tracking": [{"name":"signal", "value":"Pin Bar"}]
     "updated_at":"2010/05/14 04:32:37 +0000",
     "created_at":"2010/05/14 04:32:37 +0000",
     "result":100}}
]}

In my rails 3 controller I am basically just populating an array of trades such as the one above and then extracting out the relevant data into smaller arrays that I can compute my statistics on.

Here is my show action for the page that I want to display the stats and all the trades:

def show
  @trades = Trade.by_system(:startkey => [@system.id], :endkey => [@system.id, Time.now ])

  @trades.each do |trade|

  if trade.result > 0
    @winning_trades << trade.result
  elsif trade.result < 0
    @losing_trades << trade.result
  else
    @breakeven_trades << trade.result 
  end 

  if trade.direction == "Buy"
    @long_trades << trade.result
  else
    @short_trades << trade.result
  end

  if trade["custom_tracking"] != nil
      @custom_tracking << {"result" => trade.result, "variables" => trade["custom_tracking"]}  
  end 

end
end 

I am omitting some other stuff that is going on, but that is the gist of what I am doing. Then I am calculating stuff in the view layer to produce some results:

<% winning_long_trades = @long_trades.reject {|trade| trade <= 0 } %>
<% winning_short_trades = @short_trades.reject {|trade| trade <= 0 } %>

<ul>
  <li>Total Trades: <%= @trades.count %></li>
  <li>Winners: <%= @winning_trades.size %></li>
  <li>Biggest Winner (Pips): <%= @winning_trades.max %></li>
  <li>Average Win(Pips): <%= @winning_trades.sum/@winning_trades.size %></li>
  <li>Losers: <%= @losing_trades.size %></li>
  <li>Biggest Loser (Pips): <%= @losing_trades.min %></li>
  <li>Average Loss(Pips): <%= @losing_trades.sum/@losing_trades.size %></li>
  <li>Breakeven Trades: <%= @breakeven_trades.size %></li>
  <li>Long Trades: <%= @long_trades.size %></li>
  <li>Winning Long Trades: <%= winning_long_trades.size %></li>
  <li>Short Trades: <%= @short_trades.size %></li>
  <li>Winning Short Trades: <%= winning_short_trades.size %></li>
  <li>Total Pips: <%= @winning_trades.sum + @losing_trades.sum %></li>
  <li>Win Rate (%): <%= @winning_trades.size/@trades.count.to_f * 100 %></li>
</ul>

This produces the following results, which aside from a few things is exactly what I want:

  • Total Trades: 134
  • Winners: 70
  • Biggest Winner (Pips): 1488
  • Average Win(Pips): 440
  • Losers: 58
  • Biggest Loser (Pips): -516
  • Average Loss(Pips): -225
  • Breakeven Trades: 6
  • Long Trades: 125
  • Winning Long Trades: 67
  • Short Trades: 9
  • Winning Short Trades: 3
  • Total Pips: 17819
  • Win Rate (%): 52.23880597014925

What I Am Wondering- Finally The Actual Questions: I am starting to get really skeptical of how well this method will work when a user has 5,000 trades instead of just 134 like in this example. I anticipate most users will only have somewhere under 200 per year, but some users may have a couple thousand trades per year. Probably no more than 5,000 per year. It seems to work ok now, but the page load times are already getting a tad high for my tastes. (About 800ms to generate the page according to rails logs with about a 250ms of that spent in the view layer.) I will end up caching this page I am sure, but I still need the regenerate the page each time a trade is updated and I can't afford to have this be too slow. Sooo.....

  1. Is doing something similar here possible with a straight couchdb reduce function? I am assuming handing this off to couch would possibly help with larger data sets. I couldn't figure out how, but I suppose that doesn't mean it isn't possible. If possible, any hints will be helpful.
  2. Could I use a list function if a reduce was not available due to reduce constraints? Are couchdb list functions suitable for this type of calculations? Anyone have any idea of whether or not list functions perform well? Any hints what one would look like for the type of calculations I am trying to achieve?
  3. I thought about other options such as running the calculations at the time each trade was saved or nightly if I had to and saving the results to a statistics doc that I could then query so that all the processing was done ahead of time. I would like this to be the last resort because then I can't really filter out trades by time periods dynamically like I would really like to. (I want to have a slider that a user can slide to only show trades from that time period using the startkey and endkey in couchdb if I can.)
  4. If I should continue running the calculations inside the rails app at the time of the page view, what can I do to improve my current implementation. I am new to rails, couch and programming in general. I am sure that I could be doing something better here. Do I need to create an array for each stat or is there a better way to do that.

I guess I just would really like some advice on how to tackle this problem. I want to keep the page generation time minimal since I anticipate these being some of the highest trafficked pages. My gut is that I will need to offload the statistics calculation to either couch or run the stats in advance of when they are called, but I am not sure.

Lastly: Like I mentioned above, one of the primary reasons for using couch is to allow users to define their own things to track per trade. Getting the data into couch is no problem, but how would I be able to take the custom_tracking array and find how many winning trades for each named tracking attribute. If anyone can give me any hints to the possibility of doing this that would be great.

Thanks a bunch. Would really appreciate any help. Willing to fork out some $$$ if someone wants to take on the problem for me. (Don't know if that is allowed on stack overflow or not.)

A: 

First, if you want to provide stats on all trades performed (by a user), then yes, reduce views can be awesome. Because a reduced result is actually stored on disk in the view index, this should be able to massively speed up your views.

Second, the default reduce_limit in CouchDB is just to limiting. If you know what you're doing, you should just disable it and make your reduce functions return some more information. (Notably, and I think this is what you're running into, the default reduce_limit has a fixed size limit on the JSON structure returned; if, however, the size of the returned structure is more or less constant in the number of rows processed, then the actual size isn't really a problem -- although it should probably not be huge.)

I don't think using a list function would help here, but I don't have much experience with those.

I can probably help you out further for money if you want to, or you could just stop by #couchdb on freenode and ask some questions. There's usually a pretty knowledgeable crowd around there.

djc