We have an existing system that processes a large volume of files on an ongoing basis: roughly 3 million files a day, ranging in size from a few kilobytes to more than 50 MB. These files go through several stages of processing between the time they are received and the time they are fully consumed, and the stages depend on the path each file takes. Due to the content and format of these files, they cannot be broken up into smaller chunks.
Currently, the workflow these files move through is rigid: it is dictated by code with fixed inputs and outputs (in many cases, one subscriber becomes the publisher for a new set of files). This lack of flexibility is starting to cause us problems, so I'm looking at some kind of pub/sub solution to handle new requirements.
Most traditional pub/sub solutions carry the data inside the message payload itself, but our larger files exceed the message size limits of many messaging platforms. Furthermore, we have multiple platforms in play: files progress through both Linux and Windows tiers depending on their path.
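To make the payload concern concrete, here is a rough sketch of the kind of event I imagine publishing: a small envelope that describes a file and points to where it lives, rather than containing it. This is only an illustration, not something we have built. The `FileEvent` fields, the JSON encoding, and the example location are all assumptions on my part.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class FileEvent:
    """Hypothetical event envelope: describes a file, does not contain it."""
    file_id: str
    file_type: str
    size_bytes: int
    location: str  # assumed path or URI on storage reachable from both tiers


def encode(event: FileEvent) -> bytes:
    """Serialize the envelope; only this would travel through the broker."""
    return json.dumps(asdict(event)).encode("utf-8")


def decode(raw: bytes) -> FileEvent:
    """Rebuild the envelope on the subscriber side."""
    return FileEvent(**json.loads(raw.decode("utf-8")))


# A 50 MB file is still represented by a message of a few hundred bytes at most.
event = FileEvent(
    file_id="f-001",
    file_type="invoice",
    size_bytes=50 * 1024 * 1024,
    location="//fileserver/inbox/f-001.dat",  # made-up example location
)
message = encode(event)
```

The message size stays roughly constant no matter how large the file is, which is what would keep it under a broker's limits. The open question is then how the file store itself is managed.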
Does anyone have any design and/or implementation recommendations with the following goals in mind?
1. Multiplatform for both pub and sub (Linux and Windows)
2. Persistent storage/store-and-forward support
3. Handles large event payloads and cleans up appropriately once all subscribers have been serviced
4. Routing/workflow is done via configuration
5. Subscribers can subscribe to a filtered set of published events based on changing criteria (e.g. only give me files of a specific type)
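To show what I mean by goals 3 to 5, here is a minimal sketch of the behaviour I'm after. It is not tied to any particular product. The `ROUTES` table, the `Tracker` class, and the subscriber names are hypothetical, and in a real setup the routing table would come from a config file and the tracking state would need to be persistent.

```python
from dataclasses import dataclass, field

# Hypothetical routing table: subscriber name -> allowed values per attribute.
ROUTES = {
    "virus_scan": {"file_type": {"invoice", "report"}},
    "archiver": {"file_type": {"report"}},
}


def subscribers_for(event: dict, routes: dict) -> set:
    """Return the subscribers whose filter accepts every filtered attribute."""
    return {
        name
        for name, criteria in routes.items()
        if all(event.get(key) in allowed for key, allowed in criteria.items())
    }


@dataclass
class PendingFile:
    location: str
    waiting_on: set = field(default_factory=set)


class Tracker:
    """Tracks outstanding acknowledgements so a file can be cleaned up."""

    def __init__(self, routes: dict):
        self.routes = routes
        self.pending = {}
        self.deleted = []  # stand-in for actually removing the file

    def publish(self, file_id: str, event: dict) -> set:
        # Files with no matching subscriber simply stay pending in this sketch.
        targets = subscribers_for(event, self.routes)
        self.pending[file_id] = PendingFile(event["location"], set(targets))
        return targets

    def ack(self, file_id: str, subscriber: str) -> None:
        entry = self.pending[file_id]
        entry.waiting_on.discard(subscriber)
        if not entry.waiting_on:  # last subscriber serviced: clean up
            self.deleted.append(entry.location)
            del self.pending[file_id]


tracker = Tracker(ROUTES)
tracker.publish("f-001", {"file_type": "report", "location": "/data/f-001.dat"})
tracker.ack("f-001", "virus_scan")  # file is kept, archiver has not finished
tracker.ack("f-001", "archiver")    # file is now eligible for cleanup
```

Changing who receives what would then mean editing `ROUTES` rather than code, which is the flexibility we currently lack.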
I've done a fair amount of digging into service bus and MQ implementations, but I haven't been able to firm up a design approach well enough to properly evaluate which tools make the most sense. Thanks for any input.