Hi, we have a particle detector hard-wired to use 16-bit and 8-bit buffers. Every now and then, there are certain [predicted] peaks of particle fluxes passing through it; that's okay. What is not okay is that these fluxes usually reach magnitudes above the capacity of the buffers to store them; thus, overflows occur. On a chart, they look like the flux suddenly drops and begins growing again. Can you propose a [mostly] accurate method of detecting points of data suffering from an overflow?

P.S. The detector is physically inaccessible, so fixing it the 'right way' by replacing the buffers doesn't seem to be an option.

Update: Some clarifications as requested. We use Python at the data processing facility; the technology used in the detector itself is pretty obscure (treat it as if it were developed by a completely unrelated third party), but it is definitely unsophisticated, i.e. not running a 'real' OS, just some low-level code to record the detector readings and to respond to remote commands like power cycle. Memory corruption and other problems are not an issue right now. The overflows occur simply because the designer of the detector used 16-bit buffers for counting the particle flux, and sometimes the flux exceeds 65535 particles per second.

Update 2: As several readers have pointed out, the intended solution would have something to do with analyzing the flux profile to detect sharp declines (e.g. by an order of magnitude) in an attempt to separate them from normal fluctuations. Another problem arises: can restorations (points where the original flux drops back below the overflow level) be detected by simply running the correction program against the flux profile reversed along the x axis?

A: 

Your question needs to provide more information about your implementation - what language/framework are you using?

Data overflows in software (which is what I think you're talking about) are bad practice and should be avoided. The strange data output you are seeing is only one possible side effect of data overflows, and it is merely the tip of the iceberg of the sorts of issues you can encounter.

You could quite easily experience more serious issues like memory corruption, which can cause programs to crash loudly, or worse, obscurely.

Is there any validation you can do to prevent the overflows from occurring in the first place?

LeopardSkinPillBoxHat
Added clarification in the question text.
David Parunakian
A: 

I really don't think you can fix it without fixing the underlying buffers. How are you supposed to tell the difference between the sequences of values (0, 1, 2, 1, 0) and (0, 1, 65538, 1, 0)? You can't.
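
To see why: both sequences produce identical raw readings once 16-bit wraparound is applied, as a quick check in Python shows (assuming the buffer wraps modulo 65536):

>>> [v % 65536 for v in (0, 1, 2, 1, 0)]
[0, 1, 2, 1, 0]
>>> [v % 65536 for v in (0, 1, 65538, 1, 0)]
[0, 1, 2, 1, 0]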

Adam Rosenfield
Perhaps by calculating the derivative and looking for points where it differs from the predicted one? I'm not really sure how; that's why I'm asking. But extracting information characteristic of the function describing the flux profile appears to be the way to go.
David Parunakian
+1  A: 

Of course, ideally you'd fix the detector software to max out at 65535 to prevent wraparound of the sort that is causing your grief. I understand that this isn't always possible, or at least isn't always possible to do quickly.

When the particle flux exceeds 65535, does it do so quickly, or does the flux gradually increase and then gradually decrease? This makes a difference in what algorithm you might use to detect this. For example, if the flux goes up slowly enough:

true flux     measurement  
 5000           5000
10000          10000
30000          30000
50000          50000
70000           4464
90000          24464
60000          60000
30000          30000
10000          10000

then you'll tend to see a large negative drop at the moment you overflow, a much larger negative drop than you'll see at any other time. This can serve as a signal that you've overflowed. To find the end of the overflow period, you could look for the corresponding large positive jump back to a value not far below 65535.

All of this depends on the maximum true flux that is possible and on how rapidly the flux rises and falls. For example, is it possible to get more than 128k counts in one measurement period? Is it possible for one measurement to be 5000 and the next to be 50000? If the data is not well-behaved enough, you may be able to make only a statistical judgment about when you have overflowed.
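
As a rough sketch of this drop-and-jump heuristic in Python (the function names and the thresholds below are illustrative guesses, not a tested recipe, and would need tuning against the real flux profile):

import numpy as np

def find_overflow_spans(raw, drop=40000, rise=30000):
    # Return (start, end) index pairs of suspected overflow stretches.
    # raw:  sequence of 16-bit detector readings
    # drop: minimum negative step treated as an overflow onset
    # rise: minimum positive step treated as a restoration
    # Both thresholds are assumptions: they must be large enough that
    # normal stochastic fluctuations never trigger them.
    raw = np.asarray(raw, dtype=np.int64)
    steps = np.diff(raw)
    spans, start = [], None
    for i, s in enumerate(steps):
        if start is None and s <= -drop:
            start = i + 1              # reading i+1 is the first wrapped value
        elif start is not None and s >= rise:
            spans.append((start, i))   # reading i is the last wrapped value
            start = None
    if start is not None:              # overflow ran off the end of the data
        spans.append((start, len(raw) - 1))
    return spans

def correct(raw, spans):
    # Add one wrap (65536) to every reading inside a suspected span.
    # This handles single overflows only; a flux above 128k wraps twice
    # and remains ambiguous with this method.
    fixed = np.asarray(raw, dtype=np.int64).copy()
    for a, b in spans:
        fixed[a:b + 1] += 65536
    return fixed

On the table above, find_overflow_spans([5000, 10000, 30000, 50000, 4464, 24464, 60000, 30000, 10000]) returns [(4, 5)], and correct() restores those two readings to 70000 and 90000.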

Eddie
1) Yes, unfortunately there is a possibility of multiple consecutive overflows. 2) The general profile is predictable, i.e. when averaged over two or three neighboring points the profile becomes [nearly] smooth. However, there are lots of stochastic fluctuations in the original data, so there are multiple [sharp] declines here and there, although looking for very large declines could help. This will not work, however, if the original value approaches e.g. 128k from below, because after being cut by 64k it would look like a natural continuation of the flux profile.
David Parunakian
A: 

How about using an HMM where the hidden state is whether you are in an overflow and the emissions are observed particle flux?

The tricky part would be coming up with the probability models for the transitions (which will basically encode the time-scale of peaks) and for the emissions (which you can build if you know how the flux behaves and how overflow affects measurement). These are domain-specific questions, so there probably aren't ready-made solutions out there.

But once you have the model, everything else (fitting your data, quantifying uncertainty, simulation, etc.) is routine.
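
For concreteness, here is a minimal Viterbi decoder sketch in Python/NumPy; the two-state labelling (0 = normal, 1 = overflowed) and all probabilities are placeholders to be replaced with your own domain-specific models:

import numpy as np

def viterbi(log_emit, log_trans, log_init):
    # Most likely hidden-state path of a discrete HMM.
    # log_emit:  (T, S) array of log P(reading_t | state); this is where
    #            the domain-specific model of flux vs. wrapped flux goes
    # log_trans: (S, S) array of log P(state_t | state_{t-1}); the
    #            off-diagonal entries encode the time-scale of the peaks
    # log_init:  (S,) array of log P(state_0)
    T, S = log_emit.shape
    score = log_init + log_emit[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_trans      # cand[prev, cur]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_emit[t]
    path = np.empty(T, dtype=int)
    path[-1] = int(score.argmax())
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path

For example, 'sticky' transition probabilities such as np.log([[0.99, 0.01], [0.05, 0.95]]) would encode the assumption that an overflow, once entered, persists for many samples.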

othercriteria
A: 

You can only do this if the actual jumps between successive values are much smaller than 65536. Otherwise, an overflow-induced valley artifact is indistinguishable from a real valley and you can only guess. You can try to match overflows to their corresponding restorations by simultaneously analysing the signal from the left and from the right (assuming that there is a recognizable base line).

Other than that, all you can do is adjust your experiment by repeating it with different original particle fluxes, so that real valleys stay in place while artifact ones move to the point of overflow.

Svante
+1  A: 
int[] unwrap(short[] x)
{
   // Java; the same idea works in C with int16_t/int32_t
   int[] y = new int[x.length];
   y[0] = x[0] & 0xFFFF; // treat the first raw reading as an unsigned 16-bit count
   for (int i = 1; i < x.length; i++)
   {
      y[i] = y[i-1] + signExtend((short) (x[i] - x[i-1]));
      // The cast to short truncates the difference back to 16 bits.
      // This works fine as long as the "real" values of x[i] and x[i-1]
      // differ by less than 1/2 of the span of allowable values
      // of x's storage type (= 32768 in the case of 16-bit buffers).
      // Otherwise there is ambiguity.
   }
   return y;
}

int signExtend(short x)
{
   return x; // Java's widening conversion from short to int sign-extends
}

// exercise for the reader to write similar code to unwrap 8-bit arrays
// to a 16-bit or 32-bit array
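
Since the processing side is Python, here is a rough NumPy equivalent of the same trick (a sketch, assuming the raw readings arrive as unsigned 16-bit counts and that successive true values differ by less than 32768):

import numpy as np

def unwrap_uint16(x):
    # Reconstruct true counts from wrapped 16-bit readings.
    x = np.asarray(x, dtype=np.uint16)
    # uint16 differences wrap modulo 65536; reinterpreting the bits
    # as int16 sign-extends them into deltas in [-32768, 32767]
    deltas = np.diff(x).view(np.int16).astype(np.int64)
    x0 = int(x[0])
    return np.concatenate(([x0], x0 + np.cumsum(deltas)))

For example, unwrap_uint16([5000, 30000, 60000, 24464]) gives array([ 5000, 30000, 60000, 90000]).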
Jason S