views:

90

answers:

2

Suppose there is a tree, for argument's sake an XML tree. And you want a complete set of root to node paths, however you want to divide that set into groups of i, where i is user specified.

So for example a html document:

/html
/html/head 
/html/head/title    
/html/head/title/[text] 
/html/body
/html/body/[text]

becomes for example when i is 3:

{{1, 11, 111}, {1111, 12, 121}}

then becomes for example:

{3, 4}

Using a simplified tree class that can only get the node name; get an ArrayList of subtrees; and check if it is a leaf node; What is the best way to build this set of hashes?

EDIT: See my sample solution answer below, this is far from optimal as it is very slow and perhaps not even the best approach.

A: 

If I were doing this, my first thought would be a MultiMap (there are several implementations out there, or you could roll your own).

The key of this multimap would be the partial path used to reach the node, the value array would be the list (not Set, unless order isn't important -- and in XML it is) of nodes that share that partial path.

kdgregory
A: 

My own solution is as follows, although I am unsure if this is the most efficient way to achieve this... perhaps others could provide some insight into the intricacies of Java.

public ArrayList<Integer> makePathList(AbstractTree<String> tree){
 StringBuilder buffer = new StringBuilder();
 ArrayList<Integer> pl = new ArrayList<Integer>();
 ArrayList<StringBuilder> paths = getPaths(tree, buffer);
 for(StringBuilder sb : paths){
  pl.add(sb.toString().hashCode());
 }

 return pl;
}

public ArrayList<StringBuilder> getPaths(AbstractTree<String> tree, StringBuilder parent){

 ArrayList<StringBuilder> list = new ArrayList<StringBuilder>(); 
 parent.append("/");
 parent.append(tree.getNodeName());
 list.add(new StringBuilder(parent));
 if (!tree.isLeaf()){

  int i = 0;
  Iterator<AbstractTree<String>> child = tree.getChildren().iterator();
     while (i < tree.getChildren().size()){

      list.addAll(getPaths(child.next(), new StringBuilder(parent)));
      i++;
     }  
 }
 return list;
}

public HashSet<Integer> createShingleSet(ArrayList<Integer> paths, int shingleLength){
 HashSet<Integer> shingleSet = new HashSet<Integer>();
 for(int i = 0; i < paths.size(); i += shingleLength){
  Multiset<Integer> set = new Multiset<Integer>();
  for(int j = 0; j < shingleLength; j++){ 
   if (i + j < paths.size())
     set.add(paths.get(i + j)); 
  }
  shingleSet.add(set.hashCode()); 
 }
 return shingleSet;
}

EDIT: passing a StringBuilder is better for large files.

EDIT: for the same path to give the same hashcode, this seems to need to be applied afterwards

Robert