views:

167

answers:

1

Goal

How to encode the data that describes how to re-order a static list from a one order to another order using the minimum amount of bytes possible?

Original Motivation

Originally this problem arose while working on a problem relaying sensor data using expensive satellite communication. A device had a list of about 1,000 sensors they were monitoring. The sensor list couldn't change. Each sensor had a unique id. All data was being logged internally for eventual analysis, the only thing that end users needed daily was which sensor fired in which order.

The entire project was scrapped, but this problem seems too interesting to ignore. Also previously I talked about "swaps" because I was thinking of the sorting algorithm, but really it's the overall order that's important, the steps required to arrive at that order probably wouldn't matter.

How the data was ordered

In SQL terms you could think of it like this.

**Initial Load**

create table sensor ( id int, last_detected datetime, other stuff )
-- fill table with ids of all sensors for this location

Day 0: Select ID from Sensor order by id
  (initially data is sorted by the sensor.id because its a known value)

Day 1: Select ID from Sensor order by last_detected
Day 2: Select ID from Sensor order by last_detected
Day 3: Select ID from Sensor order by last_detected

Assumptions

  • The starting list and ending list is composed of the exact same set of items
  • Each sensor has a unique id (32 bit integer)
  • The size of the list will be approximately 1,000 items
  • Each sensors may fire multiple times per minute or not at all for days
  • Only the change in ID sort order needs to be relayed.
  • Computation resources for figuring optimal solutions is cheap / unlimited
  • Data costs are expensive, roughly a dollar per kilobyte.
  • Data could only be sent as whole byte (octet) increments
  • The Day 0 order is known by the sender and receiver to start with
  • For now assume the system functions perfectly and no error checking is required

As I said the project/hardware is no more so this is now just an academic problem.

The Challenge!

Define an Encoder

  • Given A. Day N sort order
  • Given B. Day N + 1 sort order
  • Return C. a collection of bytes that describe how to convert A to B using the least number of bytes possible

Define a Decoder

  • Given A. Day N sort order
  • Given B. a collection of bytes
  • Return C. Day N + 1 sort order

Have fun everyone.

+1  A: 

As an academic problem, one approach would be to look at Algorithm P section 3.3.2 of Vol II of Knuth's the art of computer programming, which maps every permutation on N objects into an integer between 0 and N!-1. If every possible permutation is equally likely at any time, then the best you can do is to compute and transmit this (multi-precision) integer. In practice, giving each sensor a 10-bit number and then packing those 10 bit numbers up so you have e.g. 4 numbers packed into each chunk of 5 bytes would do almost as well.

Schemes based on diff or off the shelf compression make use of knowledge that not all permutations are equally likely. You may have knowledge of this based on the equipment, or you could see if this is case by looking at previous data. Fine if it works. In some cases with sensors and satellites you might want to worry about rare exceptions where you get worst case behaviour of your compression scheme and you suddenly have more data to transmit than you bargained for.

mcdowella
I agree especially with the first part of this. If you have a way to number each possible permutation (and generate the permutation from that number), all that needs to be transmitted is the number of the next permutation.
Justin Peel
Actually, you could just have the difference between the IDs of the permutations because that would be less information transmitted on average.. Also, if some permutations are much more likely, you could rearrange the ordering of the permutations to make those numbers closer together which, when combined with transmitting only the difference between the permutation IDs, would make less information transmitted. The key here is that there is already a lot of information transmitted by knowing beforehand that there are 1000 sensors that each have unique identification numbers.
Justin Peel