ansaurus

Question

How to choose group of numbers in the vector

Answer 1

+2 A:

It sounds like you're wanting to select the n largest probabilities but the number n is flexible. If n were fixed, say n=10, you could just sort your vector and pull out the top 10 items. But from your example it sounds like you'd like to use a smaller value of n if there's a natural break in the data. Maybe you want to start with the largest probability and go down the list selecting items until the sum of the probabilities you pick crosses some threshold.

Maybe you have an implicit optimization problem where you want to maximize some probability with some penalty for large n. Try stating your problem that way. You might find your own answer, or you might be able to rephrase your question here in a way that helps other people give you a better answer.

John D. Cook 2009-02-25 14:02:30

Sorry for that: half of the post is missing. I am going to clarify that.

Jiri 2009-02-25 14:11:47

Answer 2

+3 A:

John's answer is good. Also you might try

sort the probabilities
find the largest gap between successive probabilities
work up from there

From there, it's starting to sound like a pattern-recognition problem.
My favorite method is markov-chain-monte-carlo(MCMC).

Edit: Since you clarified your question, my first thought is, since you only have 8 possible answers, develop a score for each one, based on how much probability it contains and whether or not it splits at a gap, and make a heuristic judgement.

Further edit: This sounds a bit like logistic regression. You want to find a value of P that effectively divides your set into members and non-members. For a given value of P, you can compute a log-likelihood for the ensemble, and choose P that maximizes that.

Mike Dunlavey 2009-02-25 14:10:49

I have them sorted already. I have tried to find a gap, but 'size' of the gap varies as count of the numbers grows and also as 'size' of the largest group grows. Example: in the largest group of 3 numbers, they will have value around 0.3, but 8 large numbers will have value around 1/8.

Jiri 2009-02-25 14:18:52

Couldn't you just go down the list, pick the largest gap, and use that?

Mike Dunlavey 2009-02-25 14:24:08

"Example: in the largest group of 3 numbers, they will have value around 0.3, but 8 large numbers will have value around 1/8." Ok - so _what_ do you want then? I don't think your question can be solved rationally.

Daniel Daranas 2009-02-25 14:30:38

I have just tried to implement 'largest gap'. It has passed all my tests and it is much simpler than my own solution. Thanks.

Jiri 2009-02-25 14:39:14

You're welcome. (Aside: we have a house guest for the last year, from the Czech Republic, named Jiri, so I even know how to pronounce it [not that I do it correctly].)

Mike Dunlavey 2009-02-25 14:47:06

Answer 3

+1 A:

I'm not really sure if this is what you want, but it seems you want to do the following.

Lets assume that the probabilities are x_1,...,x_N in increasing order. Then you should try to find 1<= i < j <= N such that the function

f(i,j)  =  (x_i + x_(i+1) + ... + x_j)/(x_j - x_i)

is maximized. This can be done naively in quadratic time.

Il-Bhima 2009-02-25 17:38:54

Interesting. I've got to think about that.

Mike Dunlavey 2009-02-25 22:26:25

ansaurus

tags:

views:

answers:

How to choose group of numbers in the vector

related questions