Hi,
I'm usually pretty adept with algorithms, but I've got a pretty abstract question here, which is probably someone's PhD project somewhere, and bordering on NP completeness. But maybe it's a more common problem than I think.
I have a list of 25000 Strings, created using a bunch of drop down lists and text fields. So, to simplify the discussion, lets say this is the, er, unidirectional graph:
{My Cat/My Dog} had {kittens, puppies}.
So, this is like a tree structure whose 4 paths represent 4 possible sentences.
How would one reverse engineer the tree structure from a (possibly incomplete) list of sentences?
i.e.
So that from
My Cat had kittens
My Cat had puppies
My Dog had kittens, you could still recreate the original syntax tree.
Obviously with 25000 Strings, this will take a while. But is there any software out there that does this? Or, is this a common enough problem that there are known algorithms to do this?
It seems like a regex parser in nature, but I don't know where to begin. I'm dealing with this problem at work, and doing my own analysis of the sentences to parse another 500 or so, every time I find a new pattern. But I reckon if I had the tree structure, I could do it chop chop.
Any ideas? Thanks