I have a large collection of data in an Excel file (and CSV files). The data needs to be placed into a database (MySQL). However, before it goes into the database it needs to be processed. For example, if column 1 is less than column 3, add 4 to column 2. There are quite a few rules that must be followed before the information is persisted.

What would be a good design to follow to accomplish this task? (using Java)

Additional notes

The process needs to be automated, in the sense that I don't have to manually go in and alter the data. We're talking about thousands of lines of data with 15 columns of information per line.

Currently, I have a sort of chain-of-responsibility design set up: one class (Java) for each rule. When one rule is done, it calls the following rule.
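
Roughly, my current setup looks something like this (heavily simplified, and the class and method names here are changed for illustration):

import java.util.List;

// each rule does its work on a row, then hands the row to the next rule in the chain
public abstract class Rule {
    private Rule next;

    public Rule setNext(Rule next) {
        this.next = next;
        return next;
    }

    public void process(String[] row, List<String> errors) {
        apply(row, errors);
        if (next != null) {
            next.process(row, errors);
        }
    }

    protected abstract void apply(String[] row, List<String> errors);
}

// example concrete rule: entry dates cannot be in the future
class NoFutureDateRule extends Rule {
    protected void apply(String[] row, List<String> errors) {
        // parse the date column and add a message to errors if it is in the future
    }
}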

More Info

Typically there are about 5000 rows per data sheet. Speed isn't a huge concern because this large input doesn't happen often.

I've considered Drools, but I wasn't sure the task was complicated enough for it.

Example rules:

  1. All currency (data in specific columns) must not contain currency symbols.

  2. Category names must be uniform (e.g. book case = bookcase)

  3. Entry dates cannot be future dates

  4. Text input can only contain [A-Z 0-9 \s]

etc..
Additionally, if any column of information is invalid it needs to be reported when processing is complete (or maybe processing should stop).

My current solution works. However, I think there is room for improvement, so I'm looking for ideas on how it can be improved and/or how other people have handled similar situations.


A: 

A class for each rule? Really? Perhaps I'm not understanding the quantity or complexity of these rules, but I would (semi-pseudo-code):

public class ALine {
    private int col1;
    private int col2;
    private int colN;
    // ...

    public ALine(String line) {
        // parse the row into the private fields
        // ...

        this.process();
        this.insert();
    }

    public void process() {
        // apply all your rules here, working with the local variables
    }

    public void insert() {
        // write to the DB
    }
}

// then, for each line of the CSV:
for (String line : csvLines) {
    new ALine(line);
}
Oli
A: 

Your methodology of using a class for each rule does sound a bit heavyweight, but it has the advantage of being easy to modify and extend should new rules come along.

As for loading the data, bulk loading is the way to go. I have read some information which suggests it may be as much as three orders of magnitude faster than loading using insert statements. You can find some information on it here.
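
For MySQL specifically, the usual bulk-load route is LOAD DATA INFILE. A minimal JDBC sketch (the table name, file name and connection details are made up for illustration):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class BulkLoad {
    public static void main(String[] args) throws Exception {
        // Class.forName("com.mysql.jdbc.Driver"); // may be needed with older drivers
        // allowLoadLocalInfile must be enabled for LOCAL loads with Connector/J
        Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost/mydb?allowLoadLocalInfile=true", "user", "password");
        try {
            Statement stmt = conn.createStatement();
            stmt.execute(
                "LOAD DATA LOCAL INFILE 'cleaned.csv' " +
                "INTO TABLE inventory " +
                "FIELDS TERMINATED BY ',' ENCLOSED BY '\"' " +
                "LINES TERMINATED BY '\\n' " +
                "IGNORE 1 LINES");
            stmt.close();
        } finally {
            conn.close();
        }
    }
}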

stimms
A: 

Bulk load the data into a temp table, then use SQL to apply your rules. Use the temp table as the basis for the insert into the real table, then drop the temp table.
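
In MySQL terms the flow might look roughly like this (a sketch only; the table and column names are invented to match the example rules):

import java.sql.Connection;
import java.sql.Statement;

public class TempTableLoad {
    // conn is an open JDBC connection to the MySQL database
    static void loadViaTempTable(Connection conn) throws Exception {
        Statement stmt = conn.createStatement();
        stmt.execute("CREATE TEMPORARY TABLE staging LIKE inventory");
        stmt.execute("LOAD DATA LOCAL INFILE 'raw.csv' INTO TABLE staging "
                + "FIELDS TERMINATED BY ',' IGNORE 1 LINES");
        // apply the rules in SQL, e.g. strip currency symbols and normalise categories
        stmt.execute("UPDATE staging SET price = REPLACE(price, '$', '')");
        stmt.execute("UPDATE staging SET category = 'bookcase' WHERE category = 'book case'");
        // the rule from the question: if col1 < col3, add 4 to col2
        stmt.execute("UPDATE staging SET col2 = col2 + 4 WHERE col1 < col3");
        // copy the cleaned rows into the real table, skipping future-dated entries
        stmt.execute("INSERT INTO inventory SELECT * FROM staging WHERE entry_date <= CURDATE()");
        stmt.execute("DROP TEMPORARY TABLE staging");
        stmt.close();
    }
}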

EvilTeach
+1  A: 

I think your method is OK, especially if you use the same interface for every processor.

You could also look at something called Drools, currently JBoss Rules. I used it some time ago for a rule-heavy part of my app, and what I liked about it is that the business logic can be expressed in, for instance, a spreadsheet or a DSL, which then gets compiled to Java (at run time, and I think there's also a compile-time option). It makes rules a bit more succinct and thus more readable. It's also very easy to learn (two days or so).

Here's a link to the open-source JBoss Rules. At jboss.com you can undoubtedly purchase an officially maintained version if that's more to your company's taste.

extraneon
A: 

Hi oneBelizean,

As you can see, the different answers are all coming from people's own experience and perspective.

Since we don't know much about the complexity and number of rows in your system, we tend to give advice based on what we have done before.

If you want to narrow it down to one or two solutions for your implementation, try giving more details.

Good luck

anjanb
+1  A: 

If I didn't care to do this in one step (as Oli suggests), I'd probably use a pipes-and-filters design. Since your rules are relatively simple, I'd probably do a couple of delegate-based classes. For instance (C# code, but Java should be pretty similar... perhaps someone could translate?):

using System;
using System.Collections.Generic;

interface IFilter {
   IEnumerable<string> Filter(IEnumerable<string> file);
}

class PredicateFilter : IFilter {
   private readonly Predicate<string> predicate;

   public PredicateFilter(Predicate<string> predicate) {
      this.predicate = predicate;
   }

   public IEnumerable<string> Filter(IEnumerable<string> file) {
      foreach (string s in file) {
         if (predicate(s)) {
            yield return s;          // keep only lines that pass the predicate
         }
      }
   }
}

class ActionFilter : IFilter {
   private readonly Action<string> action;

   public ActionFilter(Action<string> action) {
      this.action = action;
   }

   public IEnumerable<string> Filter(IEnumerable<string> file) {
      foreach (string s in file) {
         action(s);                  // side effect (e.g. reporting), then pass the line through
         yield return s;
      }
   }
}

class ReplaceFilter : IFilter {
   private readonly Func<string, string> replace;

   public ReplaceFilter(Func<string, string> replace) {
      this.replace = replace;
   }

   public IEnumerable<string> Filter(IEnumerable<string> file) {
      foreach (string s in file) {
         yield return replace(s);    // transform each line
      }
   }
}

From there, you could either use the delegate-based filters directly, or subclass them for the specifics. Then register them with a Pipeline that passes the lines through each filter in turn.
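
A rough Java equivalent of the same idea (this is my sketch, not Mark's code; it assumes Java 8+ for java.util.function, and I've left out the ActionFilter for brevity):

import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

interface Filter {
    List<String> filter(List<String> lines);
}

class PredicateFilter implements Filter {
    private final Predicate<String> predicate;

    PredicateFilter(Predicate<String> predicate) {
        this.predicate = predicate;
    }

    public List<String> filter(List<String> lines) {
        List<String> out = new ArrayList<String>();
        for (String s : lines) {
            if (predicate.test(s)) {
                out.add(s);            // keep only lines that pass the rule
            }
        }
        return out;
    }
}

class ReplaceFilter implements Filter {
    private final Function<String, String> replace;

    ReplaceFilter(Function<String, String> replace) {
        this.replace = replace;
    }

    public List<String> filter(List<String> lines) {
        List<String> out = new ArrayList<String>();
        for (String s : lines) {
            out.add(replace.apply(s)); // transform each line
        }
        return out;
    }
}

class Pipeline {
    private final List<Filter> filters = new ArrayList<Filter>();

    Pipeline add(Filter f) {
        filters.add(f);
        return this;
    }

    List<String> run(List<String> lines) {
        for (Filter f : filters) {
            lines = f.filter(lines);   // each filter's output feeds the next
        }
        return lines;
    }
}

Usage would then be something like new Pipeline().add(new ReplaceFilter(s -> s.replace("$", ""))).add(new PredicateFilter(s -> s.matches("[A-Z0-9 ]*"))).run(rawLines).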

Mark Brackett
+1  A: 

Just create a function to enforce each rule, and call every applicable function for each value. I don't see how this requires any exotic architecture.
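
In Java that could be as simple as a class of static methods, one per rule (a sketch; the column indexes and method names are invented):

class RowCleaner {
    // each method enforces one rule on a row of 15 columns
    static void stripCurrencySymbols(String[] row) {
        row[4] = row[4].replace("$", "");   // column 4 as an example currency column
    }

    static void normaliseCategory(String[] row) {
        if (row[6].equals("book case")) {
            row[6] = "bookcase";
        }
    }

    static void clean(String[] row) {
        stripCurrencySymbols(row);
        normaliseCategory(row);
        // ... and so on for the remaining rules
    }
}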

Seun Osewa
A: 

It may not be what you want to hear, and it isn't the "fun way" by any means, but there is a much easier way to do this.

So long as your data is evaluated line by line, you can set up another worksheet in your Excel file and use spreadsheet-style functions to do the necessary transforms, referencing the data from the raw data sheet. For more complex functions you can use the VBA embedded in Excel to write custom operations.

I've used this approach many times and it works really well; it's just not very sexy.