views:

83

answers:

3

How can I do dataflow (pipes and filters, stream processing, flow based) in C? And not with UNIX pipes.

I recently came across stream.py.

Streams are iterables with a pipelining mechanism to enable data-flow programming and easy parallelization.

The idea is to take the output of a function that turns an iterable into another iterable and plug that as the input of another such function. While you can already do this using function composition, this package provides an elegant notation for it by overloading the >> operator.

I would like to duplicate a simple version of this kind of functionality in C. I particularly like the overloading of the >> operator to avoid function composition mess. Wikipedia points to this hint from a Usenet post in 1990.

Why C? Because I would like to be able to do this on microcontrollers and in C extensions for other high level languages (Max, Pd*, Python).

* (ironic given that Max and Pd were written, in C, specifically for this purpose – I'm looking for something barebones)

+1  A: 

I'm not aware of any library for such purpose. Friend of mine implemented something similar in versity as a lab assignment. Main problems of such systems is low performance (really bad if functions in long pipe-lines are smallish) and potential need to implement scheduling (detecting dead-locks and boosting priority to avoid overload of pipe buffer).

From my experience with similar data processing, error handling is quite burdensome. Since functions in the pipeline know little of the context (intentionally, for reusability) they can't produce sensible error message. One can implement in-line error handling - passing errors down the pipe as data - but that would require special handling all over the place, especially on the output as it is not possible with streams to correlate to what input the error corresponds.

Considering known performance problems of the approach, it is hard for me to imagine how that would fit microcontrollers. Performance-wise, nothing beats a plain function: one can create a function for every path through the data pipe-line.

Probably you can look for some Petri net implementation (simulator or code generator), as they are one of the theoretical base for streams.

Dummy00001
+1  A: 

I know, that it's not a good answer, but you should make your own simple dataflow framework.

I've written a prototype DF server (together with a friend of mine), which have several unimplemented features yet: it can only pass Integer and Trigger data in messages, and it does not supports paralellism. I've just skipped this work: the components' producer ports have a list of function pointers to consumer ports, which are set up upon the initialization, and they call it (if the list is not empty). So, when an event fires, the components perform a tree-like walk-thru of the dataflow graph. As they work with Integers and Triggers, it's extremly quick.

Also, I've written a strange component, which have one consumer and one producer port, it just simply passes the data thru - but in another thread. It's consumer routine finishes quickly, as it just puts the data and sets a flag to the producer-side thread. Dirty, but it suits my needs: it detaches long processes of the tree-walk.

So, as you may recognized, it's a low-traffic asynchronous system for quick tasks, where the graph size does not matter.

Unfortunatelly, your problem differs as many points from mine, just as many one dataflow system can differ from another, you need a synchronous, paralell, stream handling solution.

I think, the biggest issue in a DF server is the dispatcher. Concurrency, collision, threads, priority... as I said, I've just skipped the problem, not solved. You should skip it, too. And you also should skip other problems.

Dispatcher

In case of a synchronous DF architecture, all the components must run once per cycle, except special cases. They have a simple precondition: is the input data available? So, you should just to scan thru the components, and pass them to a free caller thread, if data is available. After processing all of them, you will have N remaining components, which haven't processed. You should process the list again. After the second processing you will have M remainings. If N == M, the cycle is over.

I think some kind of same stuff will work, if the number of components is below only 100.

Binding

Yep, the best way of binding is the visual programming. Until finishing the editor, config-like code should used insetad, something like:

 // disclaimer: not actual code
 Component* c1 = new AddComponent();
 Component* c2 = new PrintComponent();
 c2->format = "The result is %d\n";
 bind(c1->result,c2->feed);

It's easy to write, well-readable, other wish?

Message

You should pass pure raw packets among components' ports. You need only a list of bindings, which contain pairs of pointers of producer and consumer ports, and contains the processed flag, which the "dispatcher" uses.

Calling issue

The problem is that producer should not call the consumer port, but the component; all component (class) variables and firings are in the component. So, the producer should call the component's common entry point directly, passing the consumer's ID to it, or it should call the port, which should call any method of the component which it belongs.


So, if you can live with some restrictions, I say, go ahead, and write your lite framework. It's a good task, but writing small components and see, how smart can they wired together building a great app is the ultimate fun.

If you have further questions, feel free to ask, I often scan the "dataflow" keyword here.

Possibly, you can figure out a more simple dataflowish model for your program.

ern0
I know its pseudo code, but 'new' is not a keyword in C.
Tim Post
The DF approach requires OOP, at least at modeling phase: Component, Port and Message are naturally identified as classes. Also, implementation can be done w/o OOP, but not recommended (except, if it is a constraint). Also, my prototype server is written in C++. Yep, I got it! If you use OOP, it's easiest to implement the feature that Components may have state. I confess: many of my Components have state. Even worse: they can reload their state at startup, so restarting the server causes no data loss (only service break). I believe, DF approach is not a subtype of functional programming.
ern0
A: 

This is cool: http://code.google.com/p/libconcurrency/

A lightweight concurrency library for C, featuring symmetric coroutines as the main control flow abstraction. The library is similar to State Threads, but using coroutines instead of green threads. This simplifies inter-procedural calls and largely eliminates the need for mutexes and semaphores for signaling.

Eventually, coroutine calls will also be able to safely migrate between kernel threads, so the achievable scalability is consequently much higher than State Threads, which is purposely single-threaded.

This library was inspired by Douglas W. Jones' "minimal user-level thread package". The pseudo-platform-neutral probing algorithm on the svn trunk is derived from his code.

There is also a safer, more portable coroutine implementation based on stack copying, which was inspired by sigfpe's page on portable continuations in C. Copying is more portable and flexible than stack switching, and making copying competitive with switching is being researched.

msutherl