Given a sequence of numbers N_1, N_2, N_3, ... from some source (not a PRNG, but say sensor or logging data of some kind), is it safe to assume that processing it like this

    N_n / B = Q_n rem M_n

(that is, Q_n = N_n div B and M_n = N_n mod B) will result in the sequence Q having less entropy than the sequence M?
Note: assume that B is such that both Q and M have the same sized range.
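One way to check this on a concrete data set is to split each value into quotient and remainder and compare the empirical Shannon entropies of the two sequences. A minimal sketch in Python; the input file `data.txt` (one integer per line) and the base B = 1000 are just placeholder assumptions:

    # Sketch: compare empirical entropy of quotients Q_n = N_n // B
    # against remainders M_n = N_n % B. Input source and B are assumptions.
    from collections import Counter
    from math import log2

    def entropy(seq):
        """Empirical Shannon entropy (in bits) of a sequence of symbols."""
        counts = Counter(seq)
        total = len(seq)
        return -sum((c / total) * log2(c / total) for c in counts.values())

    B = 1000  # assumed base; pick it so Q and M cover ranges of the same size

    # Assumed input: one integer per line (e.g. file sizes, sensor readings)
    with open("data.txt") as f:
        N = [int(line) for line in f if line.strip()]

    Q = [n // B for n in N]  # high-order part
    M = [n % B for n in N]   # low-order part

    print("H(Q) =", entropy(Q), "bits")
    print("H(M) =", entropy(M), "bits")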
This is related to the observation that most real-world data sets, regardless of their source, have a logarithmic leading-digit distribution (Benford's law): numbers starting with 1 are much more common than numbers starting with 9. But this says little about the low-order parts.
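To quantify the leading-digit part of that observation: under the logarithmic (Benford) distribution the first digit d appears with probability log10(1 + 1/d), which works out to roughly 2.88 bits of entropy, versus log2(9) ≈ 3.17 bits if first digits were uniform. A quick sketch:

    # Sketch: entropy of the Benford first-digit distribution vs a uniform first digit.
    from math import log10, log2

    benford = [log10(1 + 1 / d) for d in range(1, 10)]  # P(first digit = d)
    h_benford = -sum(p * log2(p) for p in benford)      # about 2.88 bits
    h_uniform = log2(9)                                  # about 3.17 bits
    print(f"Benford first digit: {h_benford:.2f} bits, uniform: {h_uniform:.2f} bits")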
For a fun way to test this (and piss off your sysadmin by bogging down his computer), run this in bash:

    ls -lR 2>/dev/null | grep -v -e "^\./" -e "^total" | sed "s/[-rdwxlp]*\W*[0-9]*\W*[a-z]*\W*[a-z]*\W*\([0-9]\).*/\1/" | sort | uniq -c

and get a histogram of the first digit of file sizes.