I asked this question over at ASP.NET...

http://forums.asp.net/t/1584731.aspx

...but wanted to ask it here as well. I’m sure this problem has been solved before so I figured why reinvent the wheel…

Short story: I’m building a web application with social features, using memcached as a caching layer for the database. To simplify the problem, let’s assume a basic setup with a persons table and a friendConnection table. persons contains the personal information, and friendConnection has two foreign keys linking one person to another if they have friended one another. (I'm not actually using tables or SQL, but the problem is similar.)

My cache expiration logic is simple: whenever a put to a table occurs, expire all the cached select statements related to that table. However, that logic performs terribly: with people friending one another constantly, the cache will never last for more than a few seconds.

A more complex logic might, say, expire only the select statements that involve the friend currently being referenced, but that would require fetching ALL the cached select statements related to the friendConnection table and checking each one for relevance, which would also be a performance burden.

Firstly, does my question make sense?

Secondly, how do people solve this problem typically?

A: 

Don't associate memcached entries with tables; associate them with entities (i.e. rows).

For example, make a memcached entry for each member, where the entry stores the list of that member's friends.

Here's an example with PHP. I know you're using ASP.NET, so treat this as pseudocode. :-)

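<?php
$m = new Memcached();
$m->addServer('localhost', 11211);                 // assumes a local memcached server
$m->setOption(Memcached::OPT_COMPRESSION, false);  // append() requires compression off
// append() fails on a missing key, so create the entries first; keys cannot contain spaces.
$m->add('Luke.Doolittle', ''); $m->add('Bill.Karwin', '');
$m->append('Luke.Doolittle', '|Bill.Karwin');
$m->append('Bill.Karwin', '|Luke.Doolittle');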
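
For readers who would rather run something without a memcached server, here is a rough Python sketch of the same idea. The dict `cache` only stands in for memcached, and the `friends:` key prefix and the `add_friend` helper are made up for illustration. The point is just that a new friendship touches two entries and leaves every other member's entry alone.

```python
# In-memory stand-in for memcached: one entry per member, holding that
# member's friend list as a '|'-separated string (same layout as the PHP above).
cache = {}

def add_friend(cache, a, b):
    """Record a two-way friendship by appending to each member's own entry."""
    for member, friend in ((a, b), (b, a)):
        key = "friends:" + member          # hypothetical key scheme
        existing = cache.get(key, "")
        cache[key] = existing + "|" + friend

cache["friends:carol"] = "|dave"           # an unrelated member's entry
add_friend(cache, "Luke.Doolittle", "Bill.Karwin")

print(cache["friends:Luke.Doolittle"])
print(cache["friends:carol"])              # untouched by the new friendship
```

Nothing has to be expired here: the two affected entries are updated in place, and no table-wide invalidation is needed.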

Re your comments:

"The problem that I see is that there is no generalized pattern for placing objects in memcached then."

Right. In relational databases, there is a formal pattern for modeling data. Normalization is a well-defined method for modeling data to reduce redundancy and prevent anomalies. The optimal normalized organization is determined by the data itself, and relationships between data.

In non-relational databases, there is no formalization of data modeling. The best way to organize non-relational data is not determined by the data; it's determined by the queries you need to run against that data. In this way, it's similar to the process of defining indexes or applying denormalization to a relational database.

"The logic would be different for each type of object. Does that make sense?"

Actually, the logic would be different for each type of query you need to run against that object. This is what leads us to store data redundantly in non-relational data stores: we might want to run a variety of queries against the same data, and that means we need to access the data differently to optimize for each type of query.
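
As a small illustration of that redundancy, here is a Python sketch in which one write feeds two query-specific entries. The dict again stands in for the cache, and the two queries ("who are X's friends?" and "how many friends does X have?"), the key names, and `on_friend_added` are all assumptions chosen for the example.

```python
cache = {}  # in-memory stand-in for the cache

def on_friend_added(cache, a, b):
    """Update every query-specific entry affected by a new friendship."""
    for member, friend in ((a, b), (b, a)):
        # Entry that serves "who are this member's friends?"
        friends = cache.setdefault("friends:" + member, [])
        if friend not in friends:
            friends.append(friend)
        # Redundant entry that serves "how many friends?" without loading the list
        cache["friend_count:" + member] = len(friends)

on_friend_added(cache, "luke", "bill")
on_friend_added(cache, "luke", "carol")

print(cache["friends:luke"], cache["friend_count:luke"])
```

The count duplicates information already in the list, which is exactly the trade: more work on each write in exchange for a cheaper read for each kind of query.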

"How do you perform removes using this technique?"

Fetch the whole string from memcached, explode the values into an array, remove the element you want to remove, implode the new array, and store it back into memcached.

My example above was pretty simple; it also doesn't enforce uniqueness.
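
The remove steps can be sketched roughly like this in Python, with a dict standing in for memcached. The `friends:` key scheme and the `remove_friend` helper are hypothetical, and dropping duplicates while rewriting the list is an extra I added to address the uniqueness gap, not part of the original example.

```python
# Stand-in cache with a duplicated friend, to show the uniqueness cleanup too.
cache = {"friends:Luke.Doolittle": "|Bill.Karwin|carol|Bill.Karwin"}

def remove_friend(cache, member, friend):
    """Fetch, split, drop one friend (and any duplicates), join, store back."""
    key = "friends:" + member
    raw = cache.get(key, "")
    names = [n for n in raw.split("|") if n]   # "explode"; skip the empty first item
    kept = []
    for n in names:
        if n != friend and n not in kept:      # remove the target, keep each name once
            kept.append(n)
    cache[key] = "".join("|" + n for n in kept)  # "implode" back to the stored format

remove_friend(cache, "Luke.Doolittle", "Bill.Karwin")
print(cache["friends:Luke.Doolittle"])
```

Against a real memcached server this read-modify-write cycle is not atomic, so two concurrent updates can overwrite each other; memcached's check-and-set (`cas`) operation is the usual guard against that.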

You might be interested in checking out Redis, which works like memcached but also supports lists and sets natively.


I would use SQL to store data relationally, using rules of normalization. Use non-relational methods on a case-by-case basis to improve performance for specific high-priority queries -- AFTER you have used profiling to measure and prove where your bottlenecks actually are (avoid premature optimizations).
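
To make the relational starting point concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table names come from the question; the column names, the sample rows, the choice to store each friendship in both directions, and the `friends_of` helper are my assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE persons (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE friendConnection (
        person_id INTEGER NOT NULL REFERENCES persons(id),
        friend_id INTEGER NOT NULL REFERENCES persons(id),
        PRIMARY KEY (person_id, friend_id)   -- each pair can be stored only once
    );
""")
conn.executemany("INSERT INTO persons (id, name) VALUES (?, ?)",
                 [(1, "Luke"), (2, "Bill"), (3, "Carol")])
# Assumption: each friendship is stored as two rows, one per direction.
conn.executemany("INSERT INTO friendConnection VALUES (?, ?)",
                 [(1, 2), (2, 1), (1, 3), (3, 1)])

def friends_of(conn, person_id):
    """Return the names of one person's friends, sorted by name."""
    rows = conn.execute(
        "SELECT p.name FROM friendConnection f "
        "JOIN persons p ON p.id = f.friend_id "
        "WHERE f.person_id = ? ORDER BY p.name", (person_id,))
    return [name for (name,) in rows]

print(friends_of(conn, 1))
```

If profiling later shows this join to be a bottleneck, its result is the natural thing to cache under a per-person key.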

I count the following as non-relational solutions:

  • Denormalization
  • Indexing (did you know the SQL standard doesn't mention indexes at all?)
  • Caching
  • NoSQL data stores

The more tools you have in your toolbox, the more flexible you can be in responding to performance issues.

Bill Karwin
That's an interesting thought. I guess the problem that I see is that there is no generalized pattern for placing objects in memcached then. For example in your solution, I'm not throwing serialized objects into the cache, I'm throwing this customized appended list possibly of user ids (also, how do you perform removes using this technique?). But in other situations, I might just put the basic object. The logic would be different for each type of object. Does that make sense?
Luke.Doolittle
Re: Your Re: Comments: I believe that I understand what you are saying and it makes total sense: for a non-relational data store, "structure" the data to support the queries that you want to run rather than aiming for a lean organization of the object. Thank you for that breakthrough. Just out of curiosity, how would you accomplish the initial task using SQL? Or is that why people shy away from relational data stores for social type applications?
Luke.Doolittle