I'm trying to debug a hashset implementation (for a school assignment). Collisions are being managed via linear probing, and I need to count the number of clusters of a given size for the debug routine. I worked this up:
// really just hoping that a 50+ cluster doesn't occur
int clusters[50];
int count = 0;
for (int i=0; i < hashset->dim; i++) {
if (hashset->array[i] != NULL) {
count++;
} else {
if (count == 0) continue;
if (clusters[count] == NULL) clusters[count] = 0;
clusters[count]++;
count = 0;
}
}
for (int i=1; i < 50; i++) {
if (clusters[i] != NULL && clusters[i] != 0)
printf("%d clusters of size %d\n", clusters[i], i);
}
Seems to make sense, but when I run it I get...
25143 entries in hashset
50286 dimension of the hash array
4585 clusters of size 1
2134 clusters of size 2
1102 clusters of size 3
696 clusters of size 4
388 clusters of size 5
264 clusters of size 6
173 clusters of size 7
104 clusters of size 8
89 clusters of size 9
51 clusters of size 10
46 clusters of size 11
35 clusters of size 12
26 clusters of size 13
22 clusters of size 14
17 clusters of size 15
134553327 clusters of size 16
134634407 clusters of size 17
112 clusters of size 18
6 clusters of size 19
134553324 clusters of size 20
134634399 clusters of size 21
107 clusters of size 22
3 clusters of size 23
2 clusters of size 24
134634401 clusters of size 25
107 clusters of size 26
134107784 clusters of size 27
134556210 clusters of size 28
[... more nonsense]
So initially it seems to give sane output.. lots of clusters, but whatever. My thinking is that the way-too-huge numbers should actually be 0 - they're clusters which don't actually exist but for some reason are still getting printed. I just have no idea why...