views:

375

answers:

10

I am often tasked with processing a list of tasks. Usually, this requires me to go fetch something programmatically and store it somewhere (database, file share, ftp, etc). Then I have to process the text file, xml, sql result set, or whatever it is, and store my results.

For example:

I have a database that details a list of a bunch of networks that may or may not be connected to my network. If not, I have to use Windows RAS to connect to a server on the network. Some of the networks may have more than a single computer. I have to pull a configuration file from every system (whether the network has one or many), and store that in a database. Then, I have to parse out specific information from those files and store that in the database. I also need to keep track of when a system has been checked and if there are any errors.

I have the database abstraction down, it is nice neat and organized. My database is ready for the data, all errors, checks for specific systems, errors, etc.

The rest of the application, however, is a different story. It always seems to be wrapped up in one loop in the main function, with some random classes (ie ConfigurationFile class, ConnectToNetwork class, etc) to hold functions that do tasks specific to my application ( really, my question is probably how to organize those classes).

Here's and example:

static void main()
{

  var networks = NetworkLocationsRepository.GetAll();

  foreach(var network in networks)
  {
    ConfigurationFile file = null;

    if(!network.IsConnected)
    {
      ConnectToNetwork(network);
    }

    foreach(var system in network.Systems)
    {
      file = GetConfigFile(system);
      ConfigurationFileRepository.Add(file);
      var configData = ParseConfiguration(file);
      ConfigurationFileRepository.UpdateData(file, configData);    
    }
  }
}

Edit

In response to the comments, I'm not so concerned with the loop (even though that's what my title focuses on). I know that because I have a collection of items to iterate, in some shape or form, I'm going to have a loop. Really, I'm more concerned with all of the stuff that gets put inside the loop. This example is fairly basic, but say I have 10 different tasks to complete on every system? Surely using a bunch of if statements to determine if tasks should be completed stuffed into a single foreach loop is not the best approach?

A: 

Have you consider Iterator pattern ? http://en.wikipedia.org/wiki/Iterator_pattern

calderas
The "foreach" requires an iterator, so he's using it twice, technically.
The Wicked Flea
Iterator pattern is in use in this example. Even twice. Iterator pattern is used in every foreach statement
brzozow
I don't think that's what I'm after.
scottm
+2  A: 

Use the command pattern.

Itay
+1  A: 

Nah, all programmers as loopy as your code. (/jk)

Seriously: no, not really. Unless you want to break the parts down into little pieces... they'll still run in a loop. Even a MVC application uses a loop, though it gets passed around between all the pieces each iteration. So even should you figure out a way to simplify your main loop, it'll still be some variant of a loop unless it is a single-pass.

(Even if you use a design pattern to "simplify" this, you'll still use a loop!)

The Wicked Flea
I know that essentially I will have to loop over the networks. I'm curious if there is a better way to handle what has to happen for each network, i.e. rather that if(network.Connected) connect(); I can just say do the task for this network. And, hopefully, through some pattern, whether or not I have to connect or if there are multiple systems will all be figured out in the contruction of the classes.
scottm
Yeah scotty there's a way to abstract out those lines. The process is as follows: highlight the code, right click, refactor, extract method.
dss539
@dss529, but then my question is "Where do I put those methods?". Should they just be a part of the Program class that contains the Main method, should there be just a static helper method class?
scottm
@scotty, it looks like you already have some helper methods like GetConfigFile(system). Where did you put those?
dss539
Also, why do you add the file to the repository, then parse it, then update the data? I'd have to see all of your code, but it seems like there's room for improvement. However, you code is excellent in general, and from what I can see your design is pretty good too.
dss539
@dss539, Well, it depends on the number of related helper methods. I may have a single class that has GetConfig, ParseConfig and ValidateConfigValues. But then for a one-off method, I might just stick in the Program class.
scottm
@dss539, I need to have both, the file as a whole and it's parts dissected and stored in database to be queried.
scottm
@scotty, then I would probably put the "parsed config data" object as a member of the "ConfigurationFile" class. Or make a new class that contained both. They are just two representations of the same data, right? Seems like it would be good to put them in the same class, but I guess this is straying from your initial topic.
dss539
"Even if you use a design pattern to "simplify" this, you'll still use a loop!)" -- the point of design patterns is to encapsulate implementation details, not eliminate them entirely ;) See my answer for an explanation.
Juliet
That was my point, Princess. When there are only 2 for-each statements a helper or a pattern isn't very necessary, because it's so straightforward. Adding a pattern (to the example) would just obfuscate the simple. Though, a large and complex application could benefit.
The Wicked Flea
+2  A: 

Nothing wrong with loops :)

But in any case, I'm not sure that it improves readability in your case, but its wholly possible to abstract away loops altogether (note: untested code, probably contains syntax errors):

public static void iter<T>(IEnumerable<T> items, Action<T> f)
{
    foreach(T item in item)
    {
        f(item);
    }
}

public static void Process()
{
    iter(NetworkLocationsRepository.GetAll(), network => {
        ConfigurationFile file = null;

        if(!network.IsConnected)
        {
            ConnectToNetwork(network);
        }

        iter(network, system => {
            file = GetConfigFile(system);
            ConfigurationFileRepository.Add(file);
            var configData = ParseConfiguration(file);
            ConfigurationFileRepository.UpdateData(file, configData);                
        });
    });
}

This is an interesting way to write code, because now that looping is abstracted out of the program, you can -- in principle -- implement looping anyway you want. For example, if you re-write the iter function as follows...

public static void iter<T>(IEnumerable<T> items, Action<T> f)
{
    foreach(T item in item)
    {
        Threading.QueueUserWorkItem(new WaitCallBack(() => f(item)));
    }
}

... then your code is multithreaded with no changes to your process method :)


[Edit to add]: Before someone says "hey, you used a loop", remember the entire point of design patterns is to abstract away implementation details and make them transparent to the client.

Let's say we weren't talking about loops, but instead talking about swtich statements where the OP said "I'm using switch statements everywhere, and I'm constantly switching on these three values, how do I fix this?" The answer is obviously to pull the implementation into a few classes which share a common interface and implement the logic required for each particular value; then you'd use a factory to return the implementation you want using a switch statement statement internally to choose the correct implementation.

Yes, the code certainly uses a switch, but that doesn't invalidate the design pattern. The whole point of the factory pattern is that the underlying switch statement totally transparent to the client.

Following from the same principle, my code abstracts away the loop and makes it totally transparent to the client. As shown above, its even possible to change the implementation of the iter function to make it more efficient without affecting any of the clients.

Juliet
But... you iterated it via a loop in a generic call! Generic or no, that's still "loopy".
The Wicked Flea
A: 

I think your program is too small (and straight forward) to apply a design pattern to it. Unless you were trying to accomplish something in addition to adding the design pattern... But then it depends on that addition...

Frank V
This is just an example. On the remote systems, there could be files I need to check, processes to start, files I need to upload, etc. So, my main() will have a ton of if statements if(network.NeedsFile) and if(network.ShouldStartProcess), or if(network.system["COMPUTER1"].NeedsFile), if(network.System["COMPUTER2"].NeedsFile). etc.
scottm
A: 

If you have 'n' items to process, a loop doesn't seem unreasonable.

However, two thoughts on the above. Both are major refactorings. The first is directly applicable, the second less so, but perhaps will give you some ideas - if not for now, then the future.

  1. you have 'n' items of work and you're processing them sequentially. As (I gather) these are largely network-bound, why not spawn off into multiple threads? At least you won't have to wait for 'n' items to complete in sequence.
  2. This is somewhat off the wall (or perhaps a bit off-topic, or perhaps a bit of feature creep). Space-based architectures rely on dumping tasks into a space, and then worker threads pull those tasks from the space, process them, and put the results back (in a simple case). Again, you're not looping and processing, but relying on 'm' worker threads to do this for you.

I come from the Java world and so I'd use executor factories and Javaspaces for 1 and 2 above. I confess I can't advise on specific c-sharp stuff, though.

Brian Agnew
That simply sounds like producer/consumer... why the fancy name?
dss539
For the space-based stuff ? It's asynchronous.The workers pick up when they're not busy, process, and then write back (often under a transaction). Secondly, work items are put into the space with various attributes, and workers will match work items based on attributes (e.g. trade type / currency etc.). There's no concept of ordering. Thirdly, the space is a standalone data store (often its own process), although you can easily emulate within process.
Brian Agnew
That sound like it's just a specific case of producer/consumer using a blackboard to pass task. However you are right that it does apply in this situation.
dss539
+1  A: 

If what you are doing to each "network" object is different, and is based on examining each object to figure out what to do, then there is a better way: make the objects be of different classes, with a common base class. Make the central loop only call the very general base class methods. Each derived class implements those methods differently.

This reduces the complexity of the decision making in the central loop, distributing it out to the various derived classes.

But this is just the essence of polymorphic programming, so it may not be telling you anything you don't already know.

Daniel Earwicker
+2  A: 

I like Ayende's pipes and filters. You'd end up defining all of your "loops" and chaining them.

Anthony Mastrean
I like this. In my head, I want a single path (pipe) for each item in the loop to take, and I want the path to determine (the filters) where the code goes. I think I might be able to get where I want with my own implementation of this pattern.
scottm
@scotty, it sounds like you're about to use a jackhammer when a chisel would do. Of course the jackhammer is more fun. ;)
dss539
This technique is wonderful, especially in that it avoids reading all of the data at once. However, be careful not to make data-base-reading pipes lazy. Your enumeration of a non-local resource should usually grab a big chunk or all of the resource at once and iterate over it in memory rather than grabbing one piece at a time. As an aside, it is often wonderfully easy to extend a pipe-based solution into a parallel pipe-based solution, merely by editting the piping function (as opposed to editting the individual filters and pipes).
Brian
@dss539, sometimes it's just about getting the practice.
scottm
A: 

I'll typically use an agent based approach in which a hosting application, e.g. a windows service, spawns separate threads each responsible for monitoring their own system. This way if one system is responding particularly slowly it won't hold up processing/monitoring of the other systems. We've used this approach for a variety of applications (monitoring, data acquisition, control systems, etc.). We usually head down this path when we have similar operations that need to be performed in a variety of systems or sources.

James Conigliaro
A: 

I think I don't understand your question very well. How many different kinds of sets of things are there? It could be that for each set of things you can register a task to be done for such things. The registration takes place in one or more tables that store tasks. Here are some ideas:

  • A network task is an object with a do_me method that takes one network as an argument.

  • A system task is an object with a do_me method that takes one system as an argument.

  • A file task is an object with a do_me method that takes one file as an argument.

Then you need some ways of composing tasks:

  • A for_all_systems method takes a network task N and returns a system task S. Task N is performed by iterating over each system in the network and performing task S on that system.

  • A do_on_config_file method takes a file task F and creates a system task S. Task S is performed on a system by performing task F on that system's configuration file.

Maybe these suggestions will spark off some ideas for solving your real problems in a way that is more compositional and exploits more reusable parts.

Norman Ramsey