Given a sequence of numbers N1, N2, N3, ... from some source (not a PRNG, but say sensor or logging data of some kind), is it safe to assume that processing it like this

Nn / B = Qn  remainder Mn        (i.e. Qn = Nn div B, Mn = Nn mod B)

will result in the sequence Q having less entropy than the sequence M?

Note: assume that B is chosen such that Q and M both have the same-sized range.
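For a concrete sanity check, here is a small shell sketch (mine, not part of the question: "data.txt" is a hypothetical file with one number per line, and B = 1000 is an arbitrary divisor) that estimates the empirical Shannon entropy of Q and M:

 # entropy (in bits) of a stream of values, one per line;
 # awk's log() is natural log, so divide by log(2) to get bits
 entropy() { sort | uniq -c | awk '{n[NR]=$1; t+=$1}
   END {for (i in n) {p=n[i]/t; h-=p*log(p)/log(2)}; print h}'; }

 B=1000
 awk -v b=$B '{print int($1/b)}' data.txt | entropy   # entropy of Q
 awk -v b=$B '{print $1 % b}'   data.txt | entropy    # entropy of M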


This is related to the observation (Benford's law) that most real-world data sets, regardless of their source, have a logarithmic leading-digit distribution: numbers starting with 1 are much more common than numbers starting with 9. But that says little about the low-order parts.

For a fun way to test this (and piss off your sysadmin by bogging down his computer), run this in bash:

 ls -lR 2>/dev/null | grep -v -e "^\./" | sed "s/[-rdwxlp]*\W*[0-9]*\W*[a-z]*\W*[a-z]*\W*\([0-9]\).*/\1/" | sort | uniq -c

and get a histogram of the first digit of file sizes.
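The sed pattern above is fragile (it assumes lowercase owner/group names and a limited permission alphabet); an alternative sketch, assuming the size field is the fifth column of "ls -l" output as on GNU systems:

 ls -lR 2>/dev/null | awk '$5 ~ /^[0-9]/ {print substr($5, 1, 1)}' | sort | uniq -c

If Benford's law holds, the count for 1 should come out roughly 6-7 times the count for 9.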

+1  A: 

This depends on the sequence. For example, take [1 * 7 = 7, 3 * 7 = 21, 5 * 7 = 35, ..., (2 * N - 1) * 7] and B = 7. Qn will be [1, 3, 5, ..., 2 * N - 1] and Mn will always be 0. Usually the entropy of Q will be lower, since the division effectively shifts low-order bits off, but it's not always like this.
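Reusing the entropy() sketch from the question (my construction, not part of the original answer), this case is easy to reproduce:

 # odd multiples of 7: Q recovers the odd numbers, M is constantly 0
 seq 1 2 199 | awk '{print $1 * 7}' > mult7.txt
 awk '{print int($1/7)}' mult7.txt | entropy   # ~6.6 bits: 100 distinct values
 awk '{print $1 % 7}'   mult7.txt | entropy    # 0 bits: M carries no information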

And of course this won't hold for data coming from a (P)RNG, as the range of Qn will be the same as the range of Mn and, for both, the values are (almost) uniformly distributed.
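For comparison, a quick sketch with bash's built-in 15-bit PRNG $RANDOM (again my own check, reusing entropy() from above): with B = 181, both Q and M span roughly 181 values (181^2 is about 32768), and both should come out at around log2(181), i.e. about 7.5 bits:

 for i in $(seq 1 10000); do echo $RANDOM; done > rnd.txt
 awk '{print int($1/181)}' rnd.txt | entropy   # entropy of Q
 awk '{print $1 % 181}'   rnd.txt | entropy    # entropy of M, about the same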

schnaader
IIRC for some PRNGs Q will have less entropy.
BCS
Just for bad PRNGs, and the entropy difference will be minimal unless you take a really bad one, like the MSVC one, where the lower bits are "less random" than the upper bits.
schnaader
Don't get me wrong, typical sensor data (like temperature) will of course tend to change only in the lower bits, so Qn will have lower entropy. But my point is that this won't be true for all kinds of data.
schnaader