views: 116
answers: 5
Trying a mental reset here: I tried to create a reliable, persistent stack with MSMQ, and it didn't work.

So in more general terms:

I have a producer (a webservice, so multithreaded although "only one") / consumer (multiple processes, as many as needed) setup. The key problems are:

  • The data needs to be consumed/processed in LIFO order (~> stack)
  • The data needs to be stored/handled in a reliable way (i.e. backed by a disk, message queue, whatever). Bonus points for transaction support.
  • Interprocess communication is involved

Given the points above I struggle to find a neat solution. What I looked at:

  1. Do it yourself: Didn't really plan to do that, but initial proofs of concept just confirmed that this is hard (for me) and helped me get a better grasp of the many hurdles involved.

  2. MSMQ: Would be nice and easy, since it lends itself easily to "reliable", is easy to set up and is already part of the target infrastructure. Unfortunately "LIFO"/"stack" is a killer here: that seems to be impossible to do -> Bzzzt.

  3. Database (SQL Server): I tried to look at a DB-based approach, but there are lots of ugly things involved:

    • I'd need to store my data as a blob (since it doesn't lend itself easily to a column-based store)
    • Polling a database for work just seems wrong (is it?)
    • Locking with multiple consumers seems to be tricky.

Any suggestions for a technology that I should evaluate? The database-based approach seems to be the most "promising" so far, but I still haven't found good examples/success stories of similar use cases.


Updates

  • Windows only
  • For now, I don't even need to do inter-machine communication (i.e. producer/consumer probably will be on one machine for now)
  • The key part of the question, the difficult task for me, is: I cannot lose a job/message, even if all processes go down. A DB would give me that "for free"; message queues can be set up to be reliable. Map/reduce, while interesting, doesn't solve the core issue: how do I make sure that messages/jobs aren't lost?
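To make that requirement concrete, here is roughly the behaviour I need, sketched in Python with sqlite3 (purely an illustration; the table and function names are made up): every push is committed to disk before the producer continues, so a crash of all processes cannot lose an accepted job, and a pop atomically claims and removes the newest entry.

```python
import sqlite3

def connect(path="stack.db"):
    # Open (or create) the on-disk stack; illustrative schema.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stack "
        "(id INTEGER PRIMARY KEY AUTOINCREMENT, data BLOB)"
    )
    return conn

def push(conn, payload):
    # The 'with' block commits on success, so the job is durable
    # on disk before this call returns.
    with conn:
        conn.execute("INSERT INTO stack (data) VALUES (?)", (payload,))

def pop(conn):
    # LIFO: select and delete the newest entry in one transaction.
    with conn:
        row = conn.execute(
            "SELECT id, data FROM stack ORDER BY id DESC LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        conn.execute("DELETE FROM stack WHERE id = ?", (row[0],))
        return row[1]
```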
A: 

You should check out AMQP. I'm digging around on Google at the moment and unfortunately have no reason to believe that it can maintain a stack instead of a queue, but there ARE several open source implementations, and aside from the FIFO vs. LIFO issue it's a good fit for what you want.

I don't think the database table is a bad idea either; as long as you don't need to scale past a couple thousand transactions per second, you should be just fine.

easel
I've looked at some message queue implementations (although not AMQP, I've investigated zeromq/msmq so far) and they didn't seem to support my requirement: Every new message I post is automatically the single most important one. I'll try to look into some more alternatives on that front though, thanks.
Benjamin Podszun
A: 

If you're going down the DB route, you could look at triggers. It kind of depends on how sparse your messages are and how long you can wait to process them.
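For instance (an illustrative sketch only, shown in Python with SQLite because its trigger syntax is compact; a real SQL Server trigger would be written in T-SQL, and all names here are made up): an insert trigger can record each arrival in a notifications table, so one dispatcher watches a small table instead of every consumer polling the stack itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stack (id INTEGER PRIMARY KEY AUTOINCREMENT, data BLOB);
CREATE TABLE notifications (stack_id INTEGER, seen INTEGER DEFAULT 0);
-- Fires on every insert into the stack; a dispatcher can watch this
-- table (on a full RDBMS the trigger could instead signal an external
-- process directly).
CREATE TRIGGER on_push AFTER INSERT ON stack
BEGIN
    INSERT INTO notifications (stack_id) VALUES (NEW.id);
END;
""")

# A producer pushes a job; the trigger records the arrival.
conn.execute("INSERT INTO stack (data) VALUES (?)", (b"job",))
pending = conn.execute(
    "SELECT COUNT(*) FROM notifications WHERE seen = 0"
).fetchone()[0]
```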

Lou Franco
I'm sure there's a good idea behind this, I just lack the insight to see how triggers would help me. Basically a DB would, apart from the "store lots of blobs" smell, be difficult for me to use with competing consumers. I've seen quite some tricks for that, but a lot of them seemed quite hacky. Do you know a more or less decent way to do that?
Benjamin Podszun
It's pretty much all ad hoc. The competing consumers problem can be solved with transactions. I was suggesting that, instead of polling, you could kick something off with an insert trigger. This would help if you expect messages to come far apart and need them serviced quickly when they arrive. If messages are arriving constantly, then polling isn't that bad.
Lou Franco
A: 

For point 3, you could look at this by the SO fanatic Jon Skeet: a means of serializing data to a binary blob that can easily be dumped...

In respect of interprocess communication - what platform are we talking about here? If it's Windows machines communicating with other Windows machines, would WCF not be suitable? As for transaction support - most ADO.NET providers have transaction support (as per the MSDN article), unless you are talking about transactional filesystem support as per this blog entry, or even the System.Transactions namespace, as clarified here in respect of distributed transactions.

tommieb75
A: 

MapReduce sounds perfect for this and can be super scalable, since it's what Google uses for indexing web pages. Not sure what your preferred stack is, but you may want to check out Hadoop.

James Westgate
Thanks, I'm really interested in map/reduce and I do agree that this seems to be a nice fit for this application. Unfortunately that doesn't solve the reliability/persistence issue; it's completely orthogonal, as far as I can tell. So while I might want to investigate M/R as a way to scale, I still need a way to push out XML fragments to <something> and _know_ that they're going to be processed, in a LIFO manner and hopefully in a somewhat sane way regarding efficiency/performance.
Benjamin Podszun
+2  A: 

I'd go with SQL Server for this.

  1. Obviously you'd have to serialize your data to a blob, but any solution would have to do this (at least behind the scenes). You would then just have a table like:

CREATE TABLE Stack (Id int identity, Data varbinary(MAX))

  2. Polling the database isn't necessary. SQL Server has a query notification service where you just give it a query and it will notify you when the results would be different. Your notification query would just be SELECT * FROM Stack

  3. Locking is the database's problem, not yours. You would just have every consumer run a query (or stored procedure) that uses a transaction to return the most recent entry (the row with the highest Id) and delete it at the same time. If the query returns a result, process it and run it again. If the query returns no results, see #2.

Here's a sample query:

BEGIN TRANSACTION
SELECT Data FROM Stack WHERE Id = (SELECT MAX(Id) FROM Stack)
DELETE FROM Stack WHERE Id = (SELECT MAX(Id) FROM Stack)
COMMIT

Here's a more elegant version that doesn't even require an explicit transaction:

DELETE Stack
OUTPUT DELETED.Data
WHERE Id = (SELECT MAX(Id) FROM Stack)

If you want to do batch processing of 10 items at a time, you would use SQL like this:

DELETE Stack
OUTPUT DELETED.*
WHERE Id IN (SELECT TOP 10 Id FROM Stack ORDER BY Id DESC)
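If DELETE ... OUTPUT isn't available to you, the same atomic claim-and-remove can be approximated with a plain transaction. Here is a rough sketch in Python with sqlite3 (names are illustrative; SQLite has no OUTPUT clause, so it selects and deletes inside one transaction, and competing consumers serialize on the database's write lock):

```python
import sqlite3

def pop_batch(conn, n=10):
    # Atomically claim the n newest entries (LIFO) in one transaction,
    # so no two consumers can take the same rows.
    with conn:
        rows = conn.execute(
            "SELECT id, data FROM stack ORDER BY id DESC LIMIT ?", (n,)
        ).fetchall()
        conn.executemany(
            "DELETE FROM stack WHERE id = ?", [(r[0],) for r in rows]
        )
        return [r[1] for r in rows]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stack (id INTEGER PRIMARY KEY AUTOINCREMENT, data BLOB)"
)
for i in range(15):
    conn.execute("INSERT INTO stack (data) VALUES (?)", (str(i).encode(),))
conn.commit()

batch = pop_batch(conn)  # the ten newest entries, newest first
```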
Gabe
Thanks! Lots of good pointers in here. Have to read up on the query notification service. The queries still bother me, but that might be due to my lack of deeper understanding. The first example _seems_ to easily lead to race conditions/duplicates, if this were code. The second more directly says "this retrieval is atomic and removes the entry" to my untrained eyes, although I'm not sure about that subquery. Since I want to process batches, I'd probably need to find a way to add a TOP xxx in there as well. Thanks a lot. Seems to be the most realistic option.
Benjamin Podszun
Argh. Okay, "Notification Services" sounded great, but unfortunately I learned about them through you, quite some time after they stopped being actively maintained. I'm working with MS SQL Server 2008 (a parallel 2005 installation is not an option) and it seems I'm out of luck there. So, polling it is...
Benjamin Podszun
If you're using SQL Server 2008, you want to use the "Service Broker" to deliver notifications (http://msdn.microsoft.com/en-us/library/ms166104.aspx), so you still don't have to worry about polling.
Gabe
Accepted, since it's closest and I guess I have to start from here. The Service Broker seems to be useless for this, though: it's a glorified message queue from what I can tell and, among other things, delivers messages in order (FIFO), which is exactly what I cannot use.
Benjamin Podszun
All that the Service Broker does is deliver query notifications so you don't have to do polling. Since it doesn't actually give you the messages from your stack, it doesn't matter that it's implemented as a queue.
Gabe