Hi everyone,
Even describing this problem is hard, but I'll give it a go. I've been struggling with it for a few days and decided to ask here.
OK, so I'm trying to model "concepts", or "things" as I call them. Just concepts in general. It's to do with processing logic.
So, each "thing" is defined by its relationships to other things. I store each relationship as a set of 5 bits. A "thing" could look like this:
class Thing {
    char* Name;
    HashTable<Thing*, int> Relationships;
};
So, I model "things" like that: 5 bits per relationship, with each bit standing for one possible relationship: 1 equals, 2 inside, 3 outside, 4 contains, 5 overlaps. Having all 5 bits set means we know nothing about the relationship. Having 2 bits set means we think the relationship could be one of two possibilities. Relationships start off as "unknown" (all 5 bits set) and get more specific as time goes on.
So this is how I model increasing knowledge over time. Things start off in a fully unknown state, and can pass through partially known states, and can reach fully known states.
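For what it's worth, that 5-bit scheme can be sketched with plain bit flags. This is just my illustration of what you describe, with enum names I've made up; narrowing is a bitwise AND, which can clear bits but never set new ones, so knowledge only ever increases:

```cpp
#include <cassert>

// Hypothetical names for the five relationship bits.
enum Relation : unsigned {
    EQUALS   = 1u << 0,
    INSIDE   = 1u << 1,
    OUTSIDE  = 1u << 2,
    CONTAINS = 1u << 3,
    OVERLAPS = 1u << 4,
    UNKNOWN  = EQUALS | INSIDE | OUTSIDE | CONTAINS | OVERLAPS  // all 5 bits set
};

// Narrowing knowledge: keep only the possibilities both sides allow.
unsigned narrow(unsigned current, unsigned constraint) {
    return current & constraint;
}

// Exactly one bit set means the relationship is pinned down.
bool fullyKnown(unsigned rel) {
    return rel != 0 && (rel & (rel - 1)) == 0;
}
```

For example, a relationship starts at UNKNOWN, can be narrowed to INSIDE | OVERLAPS (partially known), and finally to just INSIDE (fully known).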
A little more background:
I try to add extra definition to my modelling of "concepts" (things), by using extra classes. Like this:
class ArrayDefinition {
    Array<Thing> Items;
};
And my Thing class becomes like this:
class Thing {
    char* Name;
    HashTable<Thing*, int> Relationships;
    ArrayDefinition* ArrayDef;
};
This "ArrayDef" doesn't HAVE to be used. It's just there to be used, if needed. Some "things" don't have arrays, some do. But all "things" have relationships.
I can process this "ArrayDefinition" to figure out the relationship between two things! For example, if X = [ A B C D E ] and Y = [ C D E ], my code can process these two arrays and figure out that "Y inside X".
OK, so that's enough background. I've explained the core problem, avoiding my real code which has all sorts of distracting details.
Here's the problem: making all of this not run ridiculously slowly.
Imagine there are 2000 "things", and say 1000 of these have array definitions. That makes roughly 500,000 possible "array-pairs" that we need to compare against each other.
I hope I'm starting to make sense now. How do I avoid processing them all against each other? I've already realised that if two "things" have a fully known relationship, there is no point comparing their "array definitions": those are only used to figure out extra detail about the relationship, and if we already have the exact relationship, there's nothing more to learn.
So... let's say only 500 of these "things with arrays" have unknown or partially known relationships. That still makes roughly 125,000 possible "array-pairs" to compare!
Now... to me, the most obvious place to start is realising that unless a relationship used to define two arrays changes (becomes more specific), there is no point processing that "array-pair".
For example, let's say I have these two arrays:
X = [ A B C D E ]
Y = [ Q W R T ]
Now, if I say that T = R, that's very nice, but it does not affect the relationship between X and Y. Just because T's relationship to R has become known as "equal", where before it might have been fully unknown, does not mean I need to compare X and Y again.
On the other hand, if I say "T outside E", that IS a relationship between things used to define the two arrays. So saying "T outside E" means I need to process X's array against Y's array.
I really don't want to have to compare 500,000 "array-pairs" just to process logic on 1000 arrays when almost nothing has changed between them!
So... my first attempt at simplifying this was to keep, for each thing, a list of all the arrays that thing is used to define.
Let's say I have 3 arrays:
A = [ X Y Z ]
B = [ X X X X ]
C = [ X Z X F ]
Well, X is used in 3 arrays. So, X could keep a list of all the arrays it is used inside of.
So, if I said "X inside Y", this could bring up a list of all the arrays that Y is used to define, and all the arrays X is used to define. Let's say X is used in 3 arrays, and Y is used in 1 array. From this, we can figure out that there are 2 "array-pairs" we need to compare (A vs B, and A vs C).
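That reverse-index lookup could be sketched like this (ReverseIndex, the integer array ids, and candidatePairs are all made-up stand-ins for your real Thing* and ArrayDefinition* structures):

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Maps a thing's name to the ids of the arrays whose definition uses it.
using ReverseIndex = std::unordered_map<std::string, std::vector<int>>;

// When the relationship between `a` and `b` changes, the only candidate
// array-pairs are the cross product of the two arrays-containing lists.
std::set<std::pair<int, int>> candidatePairs(const ReverseIndex& idx,
                                             const std::string& a,
                                             const std::string& b) {
    std::set<std::pair<int, int>> pairs;
    auto ia = idx.find(a);
    auto ib = idx.find(b);
    if (ia == idx.end() || ib == idx.end()) return pairs;
    for (int arrA : ia->second)
        for (int arrB : ib->second)
            if (arrA != arrB)  // an array vs itself needs no comparison
                pairs.insert({std::min(arrA, arrB), std::max(arrA, arrB)});
    return pairs;
}
```

Using the A/B/C example above (X in all three arrays, Y in A only, with ids 0, 1, 2), this yields exactly the two pairs A-B and A-C.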
We can further trim this list by checking whether any of the array-pairs already have fully known relationships.
The problem I have with this is that it STILL seems excessive.
Let's say X is a really common "thing". It's used in 10,000 arrays. And Y is a really common thing, used in 10,000 arrays.
I still end up with 100,000,000 array-pairs to compare. OK, let's say I don't actually need to compare them all; maybe only 50 of them were partially known or totally unknown. But I still had to run over a list of 100,000,000 array-pairs just to figure out which of them were partially known. So it's still inefficient.
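One way to flip that scan around, sticking with the numbers above: if the not-yet-known array-pairs are kept in their own small set, an update can walk those 50 open pairs and test each one, instead of walking 100,000,000 candidates. This is only a rough sketch under assumptions of mine (per-array membership sets, integer array ids), not your actual design:

```cpp
#include <set>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Per-array membership sets: array id -> the things used in its definition.
using Membership = std::unordered_map<int, std::unordered_set<std::string>>;

// Walk only the pairs whose relationship is still open (unknown or
// partially known), keeping those that the changed relationship (a, b)
// actually touches. Cost scales with the open set, not all candidates.
std::vector<std::pair<int, int>> pairsToRecheck(
        const std::set<std::pair<int, int>>& openPairs,
        const Membership& members,
        const std::string& a, const std::string& b) {
    std::vector<std::pair<int, int>> result;
    for (const auto& p : openPairs) {
        const auto& mx = members.at(p.first);
        const auto& my = members.at(p.second);
        bool touched = (mx.count(a) && my.count(b)) ||
                       (mx.count(b) && my.count(a));
        if (touched) result.push_back(p);
    }
    return result;
}
```

With the X/Y example above kept as one open pair, an update to (T, E) selects that pair for rechecking, while an update to (T, R) selects nothing.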
I'm really starting to wonder whether there is any efficient method of doing this, and whether all I can really do is come up with a few effective "heuristic-ish" strategies. I haven't had much luck coming up with good strategies yet.
I realise this problem is highly specialised, and that reading this long post may take a while. I'm just not sure how to shrink it or describe this in terms of more common problems.
If it helps... My best attempt to express this in common terms, is "how to compare only the lists that have been updated".
Anyone got any ideas? That would be great. If not... perhaps just writing this out will help my thinking process.
The thing is, I just can't help but feel there is some algorithm or approach that can make this problem run fast and efficiently. I just don't know what that algorithm is.
Thanks all