ansaurus

Question

Efficient Algorithm for comparing only the lists that have been updated

Answer 1

A:

In general, you won't be able to come up with a structure that is as-fast-as-possible for every operation. There are tradeoffs to be made.

This problem looks very similar to that of executing queries on a relational database - SELECT * WHERE .... You might consider looking there for inspiration.

Anon. 2010-02-04 02:32:26

Answer 2

A:

I'm not sure I understand what you are doing completely (the purpose of ArrayDefinition is particularly hazy), but I think you should consider separating the modeling of the objects from their relationships. In other words, create a separate mapping from object to object for each relationship. If objects are represented by their integer index, you need only find an efficient way to represent integer to integer mappings.

ergosys 2010-02-04 03:34:21

Answer 3

A:

I had a sleep and when I woke up, I had a new idea. It might work...

If each "thing" keeps a list of all the "array definitions" it is used to define.

class Thing {
    char* Name;
    HashTable<Thing*, int> Relationships;
    ArrayDefinition* ArrayDef;
    Set<ArrayDefinition*> UsedInTheseDefs;
}

class ArrayDefinition {
    Array<Thing> Items;
    Set<int> RelationModifiedTag;
}

And I keep a global list of all the "comparable array pairs".

And I also construct a global list, of all the "arrays that can be compared" (not in pairs, just one by one).

Then, everytime a relationship is changed, I can go over the list of "arrays definitions" I'm inside of, and add a little "tag" to it :)

So I can do something like this:

static CurrRel = 0;
CurrRel++; // the actual number doesn't matter, it's just used for matching

foreach(Arr in this->UsedInTheseDefs) {
    Arr->RelationModifiedTag.Add( CurrRel );
}
foreach(Arr in other->UsedInTheseDefs) {
    Arr->RelationModifiedTag.Add( CurrRel );
}

I altered both sides of the relationship. So if I did this: "A outside B", then I've added a "modifiedtag" to all the arrays A is used to define, and all the arrays B is used to define.

So, then I loop over my list of "comparable array-pairs". Each pair of course is two arrays, each one will have a "RelationModifiedTag" set.

So I check both RelationModifiedTag sets against each other, to see if they have any matching numbers. If they DO, then this means this array pair has a relationship that's just been altered! So... I can do my array comparison then.

It should work :)

It does require a bit of overhead, but the main thing is I think it scales well to larger data sets. For smaller datasets say only 10 arrays, a simpler more brute force approach could be used, just compare all array-pairs that don't have fully known relationship, and don't bother to keep track of what relationships have been altered.

There's further optimisations possible. But I won't go into those here, because it just distracts from the main algorithm, and they are kind of obvious. For example if I have two sets to compare, I should loop over the smaller set and check against the bigger set.

Apologies for having to read all this long text. And thanks for all the attempts to help.

boytheo 2010-02-04 15:27:29

Answer 4

+1 A:

Well, first of all, some vocabulary.

Design Pattern: Observer

It's a design pattern that allow objects to register themselves into others, and ask for notifications on events.

For example, each ThingWithArray could register itself in the Thing they managed, so that if the Thing is updated the ThingWithArray will get notified back.

Now, there is usually an unsubscribe method, meaning that as soon as the ThingWithArray no longer depends on some Thing because all the relations that use them have been used, then they could unsubscribe themselves, so as not to be notified of the changes any longer.

This way you only notify those which actually care about the update.

There is one point to take into account though: if you have recursive relationships, it might get hairy, and you'll need to come up with a way to avoid this.

Also, follow ergosys advise, and model relationships outside of the objects. Having 1 BIG class is usually the start of troubles... if you have difficulty cutting it into logical parts, it's that the problem is not clear for you, and you should ask help on how to model it... Once you've got a clear model, things usually fall into place a bit more easily.

Matthieu M. 2010-02-04 19:06:07

Matthieu, thanks for the "Observer" design pattern. That seems a useful term to know, to find more interesting uses and cases of the Observer design pattern.I disagree that I need to store the relationships outside of the "Thing" itself. I actually see this as making matters more complicated. I agree that a clear simple model helps.But that's not really important. Whats important is that you made me aware of the Observer design pattern. So thanks.

boytheo 2010-02-04 19:33:56

Actually, you may not have noticed it, but relationships are symetric. At the moment, if `X <> Y` changes, you have to update both `X` and `Y` objects... 2 objects to update means 1 chance to forget.

Matthieu M. 2010-02-05 07:26:25

Answer 5

A:

From your own answer I deduce that the unknown relations are greatly over-numbered by known relationships. You could then keep track of the unknown relationships of each thing in a separate hash table/set. As a further optimization, instead of keeping track of all definitions that a thing is used in, keep track of which of these definitions have unknown relationships - which relationships can be affected. Then given a newly defined relationship between X and Y, take the affected definitions of one of them, and find the intersection of each of the unknown relations with the affected definitions of the other one. To keep the acceleration datastructure up to date, when a relationships becomes known, remove it from the unknown relationships and if no unknown relationships remain go over the array def and remove the thing from can-affect sets.

The datastructure would then look something like this:

class Thing {
    char* Name;
    HashTable<Thing*, int> Relationships;
    Set<Thing*> UnknownRelationships;
    ArrayDefinition* ArrayDef;
    Set<Thing*> CanAffect; // Thing where this in ArrayDefinition and UnknownRelationships not empty
}

class ArrayDefinition {
    Array<Thing> Items;
}

Ants Aasma 2010-02-04 22:53:40

ansaurus

tags:

views:

answers:

Efficient Algorithm for comparing only the lists that have been updated

related questions