Is there a good algorithm I could apply to a DOM to find groups of nodes that are probably related? The ultimate goal is to help extract things like tables of contents and "blogrolls" from websites. If something like this already exists, I'd be happy to hear about that as well.
I realize this isn't something I can hope to do deterministically. I suspect a solution may already exist because I recently stepped through a diff algorithm, which works by finding common subsequences. I'm not sure whether it's a leap to go from 'common' to 'repeating'...
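To make the 'repeating' idea concrete, here is a rough sketch of the kind of thing I have in mind. It assumes that "related" means sibling elements with the same tag structure, and that the markup is well-formed enough for Python's xml.etree.ElementTree to parse (real-world HTML would need a more forgiving parser). The names signature and repeated_groups, the min_size threshold, and the sample markup are all made up for illustration, not taken from any existing library.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def signature(node):
    """Structural fingerprint: the tag plus the signatures of its children.

    Text and attributes are ignored on purpose, so <li><a>A</a></li> and
    <li><a>B</a></li> get the same fingerprint.
    """
    return node.tag + "(" + ",".join(signature(c) for c in node) + ")"


def repeated_groups(root, min_size=3):
    """Return lists of sibling elements sharing a signature at least min_size times."""
    groups = []
    for parent in root.iter():
        # Bucket the direct children of this parent by structural fingerprint.
        buckets = defaultdict(list)
        for child in parent:
            buckets[signature(child)].append(child)
        groups.extend(g for g in buckets.values() if len(g) >= min_size)
    return groups


# Hypothetical sample page: one list of links surrounded by unrelated nodes.
SAMPLE = """<div>
  <h1>Blog</h1>
  <ul>
    <li><a href="/a">A</a></li>
    <li><a href="/b">B</a></li>
    <li><a href="/c">C</a></li>
  </ul>
  <p>Footer</p>
</div>"""

root = ET.fromstring(SAMPLE)
groups = repeated_groups(root)
for group in groups:
    print(signature(group[0]), len(group))  # prints "li(a()) 3"
```

This only catches exact structural repeats among siblings. Anything fuzzier (items with optional children, groups that are not direct siblings) would need a similarity measure instead of an exact fingerprint, which is the part I'm hoping an existing algorithm covers.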