Hi,

I have a large dataset of names and values. I want to sort these values into meaningful categories, e.g. the 25% of names whose values fall in a certain range go into category 1, and the 50% of names whose values fall in another range go into category 2. I tried a percentile calculation, but it gives me inconsistent categorization. I was looking at using a logarithmic calculation instead, but I have no idea where to start. Also, what if you do not know the exact numbers in advance, since they are different every day? Doesn't the categorization fail then? Here is the example I am working on:

I am trying to categorize my ports by the data volume each one is processing. I have the following sample:

Day 1
Port   Data Input Volume   Data Output Volume
1      10000               2999
2      0                   19990
3      10000               2345
5      56789               234
6      0                   0
7      0                   0
8      1000                1569

Day 2
Port   Data Input Volume   Data Output Volume
1      10                  0
2      100                 90
3      10000               2345
5      567                 890

I want to create categories that remain consistent across all days and do not change just because the data volume is higher or lower on a particular day, which is what happens with the percentile function.
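To make the logarithmic idea concrete, one possible reading is fixed order-of-magnitude buckets, where the boundaries are powers of 10 and therefore never depend on what the other ports did that day. The sketch below is only an illustration under that assumption, not something I have validated: the function name `volume_category` and the choice of base 10 are made up for the example, and it uses only the input volumes from the sample above.

```python
def volume_category(volume: int) -> int:
    """Map a volume to a fixed order-of-magnitude category.

    Category 0 means no traffic. For volume >= 1 the category is
    floor(log10(volume)) + 1, i.e. 1-9 -> 1, 10-99 -> 2, 100-999 -> 3, ...
    The digit count is used instead of math.log10 to avoid float rounding.
    """
    if volume <= 0:
        return 0
    return len(str(int(volume)))


# Input volumes per port, copied from the sample tables (assumed to be integers).
day1_input = {1: 10000, 2: 0, 3: 10000, 5: 56789, 6: 0, 7: 0, 8: 1000}
day2_input = {1: 10, 2: 100, 3: 10000, 5: 567}

# The same function is applied to every day, so the boundaries never move.
day1_categories = {port: volume_category(v) for port, v in day1_input.items()}
day2_categories = {port: volume_category(v) for port, v in day2_input.items()}

print(day1_categories)
print(day2_categories)
```

With this scheme a port that handles the same volume on two days (port 3 here) always lands in the same category, regardless of how busy the other ports were. Whether powers of 10 are the right boundaries for my data is exactly the part I am unsure about.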

Any suggestions or ideas for handling this kind of situation?

Thank you,