Hi Guys.

I am looking at refactoring some very complex code which is a subsystem of a project I have at work. My examination so far shows that it is incredibly complex, with a lot of inputs, intermediate values and outputs that depend on some core business logic.

I want to redesign this code to be easier to maintain as well as to execute a hell of a lot faster, so to start off with I have been looking at each of the parameters and their dependencies on each other. This has led to quite a large and tangled graph, and I would like a mechanism for simplifying it.

A while back I came across a technique in a book about SOA design called "Matrix Design Decomposition" which uses a matrix of outputs and the dependencies they have on the inputs, applies some form of matrix algebra and can generate Business Process diagrams for those dependencies.
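Just to make it concrete, by "matrix" I mean nothing more exotic than a table of outputs against inputs, with a flag wherever an output depends on an input - something like this (the names here are made up purely for illustration):

// Toy example of a dependency matrix: rows are outputs, columns are inputs.
// matrix[row][col] == true means that output 'row' depends on input 'col'.
String[] inputs  = { "basePrice", "taxRate", "discount" };   // made-up names
String[] outputs = { "netPrice", "grossPrice" };             // made-up names

boolean[][] matrix = {
    // basePrice  taxRate   discount
    {  true,      false,    true  },   // netPrice   depends on basePrice, discount
    {  true,      true,     true  }    // grossPrice depends on all three inputs
};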

I know there is a web tool available at http://www.designdecomposition.com/ however it is limited in the number of input/output dependencies you can have. I have tried looking around for the algorithmic source for this tool (so I could attempt to implement it myself without the size limitation), however I have had no luck.

Does anybody know a similar technique that I could use? Currently I am even considering taking the dependency matrix and applying some Genetic Algorithms to see if evolution can come up with a simpler workflow...

Cheers,

Aidos

EDIT:

I will explain the motivation:

The original code was written for a system which computed all of the values (about 60) every time the user performed an operation (adding, removing or modifying certain properties of an item). This code was written over ten years ago and is definitely showing signs of age - others have added more complex calculations into the system and now we are getting completely unreasonable performance (up to 2 minutes before control is returned to the user). It has been decided to detach the calculations from the user actions and provide a button to "recalculate" the values.

My problem arises because there are so many calculations going on, and they are all written on the assumption that every piece of data they need will already be available - so when I try to re-implement a calculation I keep hitting problems because I don't yet have the result of some other calculation it relies on.

This is where I want to use the matrix-decomposition approach. The MD approach allows me to specify all of the inputs and outputs and gives me the "simplest" workflow that I can use for generating all of the outputs.

I can then use this "workflow" to know the precedence of the calculations I need to perform to get the same result without generating any exceptions. It also shows me which parts of the calculation system I can parallelise and where the fork and join points will be (I won't worry about that part just yet). At the moment all I have is an insanely large matrix with lots of dependencies showing in it, with no idea where to start.
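To illustrate the kind of output I am after - this is only a rough sketch of the idea, not the tool's actual algorithm - if the dependencies are held as a map from each calculation to the things it needs, then repeatedly peeling off everything whose prerequisites are already satisfied gives both a precedence order and the groups that could run in parallel:

import java.util.*;

// Rough sketch: groups the calculations into "stages". Everything in a stage
// depends only on raw inputs or on calculations from earlier stages, so the
// stages give a precedence order, and the members of a single stage are the
// candidates for running in parallel (the fork/join points).
static List<Set<String>> stages(Map<String, Set<String>> dependsOn) {
    Set<String> done = new HashSet<String>();
    Set<String> remaining = new HashSet<String>(dependsOn.keySet());
    List<Set<String>> result = new ArrayList<Set<String>>();
    while (!remaining.isEmpty()) {
        Set<String> stage = new HashSet<String>();
        for (String calc : remaining) {
            boolean ready = true;
            for (String prereq : dependsOn.get(calc)) {
                // a prerequisite blocks us only if it is itself a calculation
                // that has not been done yet (raw inputs are always available)
                if (dependsOn.containsKey(prereq) && !done.contains(prereq)) {
                    ready = false;
                    break;
                }
            }
            if (ready) stage.add(calc);
        }
        if (stage.isEmpty()) {
            throw new IllegalStateException("Circular dependency among " + remaining);
        }
        remaining.removeAll(stage);
        done.addAll(stage);
        result.add(stage);
    }
    return result;
}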

I will elaborate on my comment a little more:

I don't want to use the solution from the EA process in the actual program. I want to take the dependency matrix and decompose it into modules that I will then code manually - this is purely a design aid - I am just interested in what the inputs/outputs for these modules will be. Basically a representation of the complex interdependencies between these calculations, as well as some idea of precedence.

Say A requires B and C, D requires A and E, and F requires B, A and E. I want to effectively partition the problem space from a complex set of dependencies into a "workflow" that I can examine to get a better understanding. Once I have this understanding I can come up with a better design / implementation that is still human readable, so for this example I know that, given B, C and E, I need to calculate A first, and then D and F.
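Running the stages() sketch from above on that example (treating B, C and E as raw inputs that are not calculated) gives exactly the kind of partitioning I am after:

Map<String, Set<String>> deps = new HashMap<String, Set<String>>();
deps.put("A", new HashSet<String>(Arrays.asList("B", "C")));
deps.put("D", new HashSet<String>(Arrays.asList("A", "E")));
deps.put("F", new HashSet<String>(Arrays.asList("B", "A", "E")));

System.out.println(stages(deps));
// prints something like [[A], [D, F]]
// i.e. once the inputs B, C and E are available, A must be calculated first,
// and then D and F can be done in either order (or in parallel).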

--

I know this seems kind of strange; if you take a look at the website I linked to earlier, the matrix-based decomposition there should give you some understanding of what I am thinking of...

A: 

If this is, as you say, "core business logic", then you really don't want to be screwing around with fancy decompositions and evolutionary algorithms that produce a "black box" solution that no one in the world understands or is capable of modifying. I would be very surprised if any of these techniques actually yielded any useful result; the human brain is still incomparably more capable than any machine at untangling complicated relationships.

What you want to do is traditional refactoring: clean up the individual procedures, streamlining them and merging them where possible. Your goal is to make the code clear, so your successor doesn't have to go through the same process.

kquinn
I've elaborated the text a bit more to hopefully clarify what I am doing and why.
Aidos
+2  A: 

kquinn, If it's the piece of code I think he's referring to (I used to work there), it's already a black box solution that no human can understand as is. He's not looking to make it more complicated - less, in fact. What he's trying to untangle is a whole heap of interlinked calculations.

What currently happens, is that whenever anything changes, it's an avalanche of events which cause a whole bunch of calculations to fire off, which in turn causes a whole bunch more events which continues on until finally it reaches a state of equilibrium.

What I assume he wants to do is find the dependencies for those outlying calculations and work in from there so they can be rewritten, and find a way to stop the calculations from happening for the sake of it, rather than because they need to.

I can't offer much advice in regards to simplifying the graph, as unfortunately it's not something I have much experience in. That said, I would start looking for those outlying calculations which have no dependencies, and just traverse the graph from there. Start building up a new framework that includes the core business logic of each calculation in the simplest possible way, and refactor the crap out of it along the way.

It is definitely the code you're working on. What I am trying to do is come up with a different approach to doing it. Without understanding what actually happens and in what order it is required to happen, I am trying to push the proverbial up a hill.
Aidos
Thinking of, not working on....
Aidos
+1 because you understand what I am trying to deal with and hopefully can help elaborate on why I would even be trying to do this...
Aidos
A: 

What language are you using? Your problem should be pretty easy to model using Java Executors and Future<> tasks, but a similar framework is perhaps available on your chosen platform as well?

Also, if I understand this correctly, you want to generate a critical path for a large set of interdependent calculations -- is that something done dynamically, or do you "just" need a static analysis?

Regarding an algorithmic solution: pick up the closest copy of your numerical analysis textbook and refresh your memory on singular value decompositions and LU factorization; I'm guessing off the top of my head that this is what lies behind the tool you linked to.

EDIT: Since you're using Java, I'll give a brief outline of a suggestion:

-> Use a threadpool executor to parallelize all calculations easily

-> Solve interdependencies with an object map of Future<> or FutureTask<> instances, i.e. if your variables are A, B and C, where A = B + C, do something like this:

import java.util.Map;
import java.util.concurrent.*;

// e.g. a ConcurrentHashMap and a suitably sized pool
static final Map<String, FutureTask<Integer>> mapping = ...
static final ThreadPoolExecutor threadpool = ...

// A = B + C: this task blocks until the tasks for B and C have produced a value
FutureTask<Integer> a = new FutureTask<Integer>(new Callable<Integer>() {
    public Integer call() throws Exception {
        Integer b = mapping.get("B").get();
        Integer c = mapping.get("C").get();
        return b + c;
    }
});
FutureTask<Integer> b = new FutureTask<Integer>(...);
FutureTask<Integer> c = new FutureTask<Integer>(...);

// register each task under its own name, then kick them all off
mapping.put("A", a);
mapping.put("B", b);
mapping.put("C", c);
for (FutureTask<Integer> task : mapping.values())
    threadpool.execute(task);

Now, if I'm not totally off (and I may very well be, it's been a while since I worked in Java), you should be able to solve the apparent deadlock problem by tuning the thread pool size, or by using a growing thread pool. (You still have to make sure that there are no circularly dependent tasks though, such as A = B + C and B = A + 1...)
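To illustrate that last caveat (rough sketch only, method names made up), a simple depth-first walk over the dependency map will catch circular definitions before any tasks are submitted to the pool:

import java.util.*;

// Returns true if the dependency map contains a circular definition
// (e.g. A depends on B and B depends on A).
static boolean hasCycle(Map<String, Set<String>> dependsOn) {
    Set<String> finished = new HashSet<String>();
    for (String start : dependsOn.keySet()) {
        if (visit(start, dependsOn, new HashSet<String>(), finished)) {
            return true;
        }
    }
    return false;
}

static boolean visit(String node, Map<String, Set<String>> dependsOn,
                     Set<String> onPath, Set<String> finished) {
    if (finished.contains(node)) return false;  // already known to be cycle-free
    if (!onPath.add(node)) return true;         // back on the current path: cycle
    Set<String> deps = dependsOn.get(node);
    if (deps != null) {
        for (String dep : deps) {
            if (visit(dep, dependsOn, onPath, finished)) return true;
        }
    }
    onPath.remove(node);
    finished.add(node);
    return false;
}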

Christoffer
I'm using Java. I would love to parallelise it - but I think my problem is more fundamental than that. I don't even know what dependencies there are (I need to be able to identify the fork/merge points if I want to parallelise it).
Aidos
Assuming you can use a JDK more recent than 1.5, look into the java.util.concurrent package. You should not have to know too much about the system. I'll edit my answer to give a quick-and-dirty suggestion :)
Christoffer
It is certainly an interesting approach! I'll give it a try. I'm not sure it will work, as one part of the problem I didn't describe is that this isn't a one-off calculation. If you imagine a linked list of these calculations, some are dependent on the previous and next elements in the list - which compounds the problem even further. This approach should definitely enable me to work out which ones I can do and in which order, though!
Aidos