So you want a hashcode calculation which gives equal results for "picture" and "turepic", but (preferably) different from the hashcode of e.g. "eruptic". Thus it is not enough to simply add up the hashcodes of the letters contained in the word - you need some position information too, but still, it should be independent of the actual rotation of the word. You need to define "equivalence classes" and always calculate the same hashcode for each member of a class.
The easiest way to achieve this is to select a specific member of each equivalence class and always use the hashcode of that variant for all equivalent words, e.g. the alphabetically first one (thanks @Michael for summing it up concisely). For "picture" et al., that would be "cturepi". Both "picture" and "turepic" (and all the other equivalent variations) should return the hash code of "cturepi". That hash code could be calculated by the standard LinkedList hashCode method, or in any other preferred way.
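For illustration, the canonical-representative idea can be sketched with plain Strings (smallestRotation and circularHashCode are just illustrative names I made up, not part of the classes below; this brute-forces all rotations rather than using the optimization discussed next):

```java
public class CanonicalHash {

    // Pick the alphabetically first rotation by trying all of them.
    static String smallestRotation(String s) {
        String best = s;
        for (int i = 1; i < s.length(); i++) {
            String rotated = s.substring(i) + s.substring(0, i);
            if (rotated.compareTo(best) < 0) {
                best = rotated;
            }
        }
        return best;
    }

    // Every member of the equivalence class hashes via its representative.
    static int circularHashCode(String s) {
        return smallestRotation(s).hashCode();
    }

    public static void main(String[] args) {
        System.out.println(smallestRotation("picture"));  // cturepi
        System.out.println(circularHashCode("picture")
                == circularHashCode("turepic"));          // true
        System.out.println(circularHashCode("picture")
                == circularHashCode("eruptic"));          // false
    }
}
```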
One might say that this calculation is very expensive. True, but one could cache the result so that only the first calculation is costly. And the selection of the alphabetically first variant can be optimized considerably in the common case (compared to the trivial solution of generating all rotations in the equivalence class, sorting them and picking the first).
E.g. in many words, the alphabetically first letter is unique ("picture" is one of these: its alphabetically smallest letter is 'c', and there is only one 'c' in it). Then you only need to find it and calculate the hashcode starting from there. If it is not unique, you compare the second, third, etc. letters after each occurrence until you find a difference (or you roll over).
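The uniqueness shortcut can be sketched like this (indexOfSmallestIfUnique is a hypothetical helper name of mine, operating on a plain String):

```java
public class MinStart {

    // Returns the index of the alphabetically smallest letter if it occurs
    // exactly once, or -1 if there is a tie and further comparison is needed.
    static int indexOfSmallestIfUnique(String s) {
        char min = Character.MAX_VALUE;
        int index = -1;
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < min) {
                min = c;
                index = i;
                count = 1;
            } else if (c == min) {
                count++;
            }
        }
        return count == 1 ? index : -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOfSmallestIfUnique("picture"));     // 2 ('c' is unique)
        System.out.println(indexOfSmallestIfUnique("abracadabra")); // -1 (five 'a's tie)
    }
}
```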
Update 2 - examples
- "abracadabra" contains 5 'a's. The characters directly after the 'a's are 'b', 'c', 'd', 'b' and 'a', respectively. So in the 2nd round of comparison you can already conclude that the lexicographically smallest variation is "aabracadabr".
- "abab" contains 2 'a's, with a 'b' after each (and then you roll over, reaching an 'a' again, so the quest ends there). So you have two identical lexicographically smallest variations of it. But since they are identical, they obviously produce the same hashcode.
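Both examples can be checked with a round-by-round elimination over start indices - a sketch on a plain String rather than the linked list, with names of my own choosing:

```java
import java.util.ArrayList;
import java.util.List;

public class Elimination {

    // In round k, keep only the candidate rotations whose k-th letter
    // is minimal; stop when one candidate remains or we rolled over.
    static int smallestRotationStart(String s) {
        int n = s.length();
        List<Integer> candidates = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            candidates.add(i);
        }
        for (int round = 0; round < n && candidates.size() > 1; round++) {
            char min = Character.MAX_VALUE;
            List<Integer> kept = new ArrayList<>();
            for (int start : candidates) {
                char c = s.charAt((start + round) % n);
                if (c < min) {
                    min = c;
                    kept.clear();
                }
                if (c == min) {
                    kept.add(start);
                }
            }
            candidates = kept;
        }
        return candidates.get(0);
    }

    public static void main(String[] args) {
        System.out.println(smallestRotationStart("abracadabra")); // 10 -> "aabracadabr"
        System.out.println(smallestRotationStart("abab"));        // 0 (tied with 2, same rotation)
    }
}
```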
Update: In the end, it all boils down to how much you actually need the hashcode - i.e. whether you plan to put your circular lists into an associative collection like Set or Map. If not, you can make do with a simple, or even trivial, hash method. But if you use such a collection heavily, a trivial hash implementation gives you lots of collisions and thus suboptimal performance. In this case it is worth implementing this hash method and measuring whether it pays for itself in performance.
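To see why a trivial, order-insensitive hash collides a lot: anagrams that are not rotations of each other, like "picture" and "eruptic", become indistinguishable (sumHash is just an illustrative name):

```java
public class TrivialHash {

    // A trivial, order-insensitive hash: just sums the letters.
    static int sumHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h += s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        // "eruptic" is an anagram but not a rotation of "picture",
        // yet an additive hash cannot tell them apart:
        System.out.println(TrivialHash.sumHash("picture")
                == TrivialHash.sumHash("eruptic")); // true
    }
}
```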
Update 3: sample code
Letter is basically left the same as above; I only made the fields private, renamed theNextNode to next, and added getters/setters as needed.
In CircularWord I made some more changes: I dropped tail and theCurrentNode, and made the word really circular (i.e. last.next == head). The constructor, toString and equals are not relevant for computing the hashcode, so they are omitted for the sake of simplicity.
import java.util.LinkedHashSet;
import java.util.Set;

public class CircularWord {

    private final Letter head;
    private final int numberOfElements;

    // constructor, toString(), equals() omitted

    @Override
    public int hashCode() {
        return hashCodeStartingFrom(getStartOfSmallestRotation());
    }

    private Letter getStartOfSmallestRotation() {
        if (head == null) {
            return null;
        }
        Set<Letter> candidates = allLetters();
        int counter = numberOfElements;
        // Narrow the candidate set round by round until a single rotation
        // remains, or we have compared a full word length (rolled over).
        while (candidates.size() > 1 && counter > 0) {
            candidates = selectSmallestSuccessors(candidates);
            counter--;
        }
        return rollOverToStart(counter, candidates.iterator().next());
    }

    private Set<Letter> allLetters() {
        Set<Letter> letters = new LinkedHashSet<Letter>();
        Letter letter = head;
        for (int i = 0; i < numberOfElements; i++) {
            letters.add(letter);
            letter = letter.getNext();
        }
        return letters;
    }

    // Keep only those successors whose letter value is minimal.
    private Set<Letter> selectSmallestSuccessors(Set<Letter> candidates) {
        Set<Letter> smallestSuccessors = new LinkedHashSet<Letter>();
        char min = Character.MAX_VALUE;
        for (Letter letter : candidates) {
            Letter nextLetter = letter.getNext();
            if (nextLetter.getValue() < min) {
                min = nextLetter.getValue();
                smallestSuccessors.clear();
            }
            if (nextLetter.getValue() == min) {
                smallestSuccessors.add(nextLetter);
            }
        }
        return smallestSuccessors;
    }

    // The surviving candidate may be in the middle of the smallest rotation;
    // step forward (counter + 1) nodes to reach its first letter again.
    private Letter rollOverToStart(int counter, Letter lastCandidate) {
        for (; counter >= 0; counter--) {
            lastCandidate = lastCandidate.getNext();
        }
        return lastCandidate;
    }

    private int hashCodeStartingFrom(Letter startFrom) {
        int hash = 0;
        Letter letter = startFrom;
        for (int i = 0; i < numberOfElements; i++) {
            hash = 31 * hash + letter.getValue();
            letter = letter.getNext();
        }
        return hash;
    }
}
The algorithm implemented in getStartOfSmallestRotation to find the lexicographically smallest rotation of the word is basically what I described above: compare the 1st, 2nd, 3rd etc. letters of each candidate rotation, dropping the greater ones, until either there is only one candidate left or you roll over the word. Since the list is circular, I use a counter to avoid an infinite loop.
In the end, if a single candidate is left, it may be in the middle of the word, and I need to get back to the start of the smallest rotation. However, as this is a singly-linked list, it is awkward to step backwards in it. Luckily, the counter nicely helps me out: the loop has decremented it once per comparison round, so in a circular list it tells exactly how many letters are left before rolling over. Thus I know how many letters to advance in order to arrive again at the beginning of the minimal word rotation I am looking for.
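For what it's worth, this arithmetic can be spelled out as a tiny standalone check (n letters, r comparison rounds; the names are mine, not part of the classes above):

```java
public class RollOverMath {

    // With n letters and r comparison rounds, the loop leaves
    // counter = n - r, the surviving candidate is the r-th letter
    // (offset r - 1) of the smallest rotation, and rollOverToStart
    // advances counter + 1 nodes - always landing on offset 0.
    static int finalOffset(int n, int r) {
        int counter = n - r;                  // counter's value when the loop exits
        int candidateOffset = r - 1;          // where the candidate sits in the rotation
        int steps = counter + 1;              // nodes rollOverToStart moves forward
        return (candidateOffset + steps) % n; // should always be 0 (the start)
    }

    public static void main(String[] args) {
        System.out.println(finalOffset(11, 2)); // "abracadabra": decided in round 2 -> 0
        System.out.println(finalOffset(4, 4));  // "abab": full roll-over -> 0
    }
}
```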
Hope this helps someone - at least it was fun to write :-)