In MATLAB:

n = histc(x,edges);

is defined to behave as follows:

n(k) counts the value x(i) if edges(k) <= x(i) < edges(k+1). The last bin counts any values of x that match edges(end).
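
For illustration, here is a minimal sketch of that default behavior on made-up sample data (the values of x and edges are assumptions chosen for the demo, not part of the original question):

```matlab
x = [0 1 1.5 2 2.5 4 6.5 8 10];   % sample data (assumed for illustration)
edges = 0:2:10;                   % bin edges: 0 2 4 6 8 10

% Default HISTC bins: [0,2) [2,4) [4,6) [6,8) [8,10) and finally {10}
n = histc(x, edges);              % n = [3 2 1 1 1 1]
```

Here the value 10 lands alone in the last bin because it matches edges(end).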

Is there any way to flip the end behavior such that n(1) counts any values of x that match edges(1), and n(end) counts the values x(i) that satisfy edges(end-1) <= x(i) < edges(end)?

A: 

Since the edges argument has to have monotonically nondecreasing values, one way to flip the edge behavior is to negate and flip the edges argument and negate the values for binning. If you then flip the bin count output from HISTC, you should see the typical edge behavior of HISTC reversed:

n = fliplr(histc(-x,-fliplr(edges)));

The above uses FLIPLR, so x and edges should be row vectors (i.e. 1-by-N). This code will bin data according to the following criteria:

  • The first bin n(1) counts any values of x that match edges(1).
  • The other bins n(k) count the values x(i) such that edges(k-1) < x(i) <= edges(k).

Note that this flips the edge behavior of all the bins, not just the first and last bins! The typical behavior of HISTC for bin n(k) uses the condition edges(k) <= x(i) < edges(k+1) (note the difference in the indices and in which side has the equals sign).
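
If x or edges might be column vectors, one possible variant (a sketch, not part of the original answer) reverses with end:-1:1 indexing instead of FLIPLR, which should work for either orientation. The sample data below is assumed for illustration:

```matlab
x = [0 1 1.5 2 2.5 4 6.5 8 10].';   % column vector (assumed sample data)
edges = (0:2:10).';                 % column vector of edges: 0 2 4 6 8 10

% Negate the data and the reversed edges so the edges stay nondecreasing,
% then reverse the counts back into the original bin order.
n = histc(-x, -edges(end:-1:1));
n = n(end:-1:1);

% Bins: {0} (0,2] (2,4] (4,6] (6,8] (8,10]
% Counts: 1 3 2 0 2 1
```

Indexing with end:-1:1 reverses a vector regardless of whether it is a row or a column, so no orientation check is needed.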


EDIT: After some discussion...

If you instead wanted to bin data according to the following criteria:

  • The first bin n(1) counts any values of x that match edges(1).
  • The second bin n(2) counts the values x(i) such that edges(1) < x(i) < edges(2).
  • The other bins n(k) count the values x(i) such that edges(k-1) <= x(i) < edges(k).

Then the following should accomplish this:

n = histc(x,[edges(1) edges(1)+eps(edges(1)) edges(2:end)]);
n(end) = [];

The first bin should capture only values equal to edges(1), while the lower edge of the second bin should start at an incremental value above edges(1) (found using the EPS function). The last bin, which counts the number of values equal to edges(end), is thrown out.
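
As a quick check of this approach, here is a small sketch on assumed sample data with a repeated value at edges(1), so that any double counting would show up:

```matlab
x = [0 0 1 1.5 2 2.5 4 6.5 8 10];   % assumed sample data, two zeros
edges = 0:2:10;                     % bin edges: 0 2 4 6 8 10

% Insert an extra edge just above edges(1) so the first bin holds only
% values equal to edges(1), then drop the final "== edges(end)" bin.
n = histc(x, [edges(1) edges(1)+eps(edges(1)) edges(2:end)]);
n(end) = [];

% Bins: {0} (0,2) [2,4) [4,6) [6,8) [8,10)
% Counts: 2 2 2 1 1 1
```

Each zero is counted once, in n(1) only. The counts sum to numel(x) - 1 because the value 10, which matches edges(end), falls in the discarded last bin.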

gnovice
thank you both, much appreciated!
alian
A: 

Consider the following code:

n = histc(x, [edges(1) edges]);
n(1) = sum(x==edges(1));
n(end) = [];

According to the question posted, the above will return:

  • n(1): counts any values of x that match edges(1)
  • n(k) [k~=1]: counts the value x(i) if edges(k-1) <= x(i) < edges(k)

This differs from gnovice's solution, which uses the bounds edges(k-1) < x(i) <= edges(k) (note the position of the equality sign).


To demonstrate, consider this simple example:

x = [0 1 1.5 2 2.5 4 6.5 8 10];
edges = 0:2:10;

>> n = fliplr(histc(-x,-fliplr(edges)))
n =
     1     3     2     0     2     1

corresponding to the intervals: 0 (0,2] (2,4] (4,6] (6,8] (8,10]

Against:

>> n = histc(x, [edges(1) edges]);
>> n(1) = sum(x==edges(1));
>> n(end) = []
n =
     1     3     2     1     1     1

corresponding to the intervals: 0 [0,2) [2,4) [4,6) [6,8) [8,10)

Amro
@Amro: There's an error in your binning if you go through by hand. `n(1)` bins the value `0`, while `n(2)` bins the values `0, 1, 1.5`, and `n(end)` bins the value `8`. Notice that the zeroes get counted in two bins, and the 10 never gets counted. If you add more zeroes to `x`, you'll see the bin counts are off.
gnovice
I am aware of that... note that I am following exactly what the OP described: that bin k is bounded by edges(k-1) inclusive and edges(k) exclusive.
Amro
in the end, it depends on where you intend the equality sign to be (left or right)
Amro
@Amro: While it's true that that appears to be what the OP originally *asked for*, I'm guessing that it may have been an oversight on their part. They may not have considered that their definition in the question may cause some values to be counted twice in the first two bins.
gnovice
gnovice is correct, i was sloppy in phrasing my question, for which i apologize, first time on this board...
alian
