Hi,
I have a large dataset of names and values, and I want to sort the values into meaningful categories. For example: the 25% of names whose values fall in a certain range go in category 1, the 50% of names whose values fall in another range go in category 2, and so on. I tried a percentile calculation, but it gives me inconsistent categorization. I was looking at using a logarithmic calculation instead, but I have no idea where to start. Also, what if you do not know the exact numbers in advance, since they are different every day? Doesn't the categorization fail then? Here is the example I am working on:
I am trying to categorize my ports by the data volume each one is processing. I have the following sample:
Day 1
Port    Data Input Volume    Data Output Volume
1       10000                2999
2       0                    19990
3       10000                2345
5       56789                234
6       0                    0
7       0                    0
8       1000                 1569

Day 2
Port    Data Input Volume    Data Output Volume
1       10                   0
2       100                  90
3       10000                2345
5       567                  890
So I want to create categories that remain consistent across all days and do not change just because the data volume is higher or lower on a particular day, which is what happens when I use the percentile function.
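To show what I mean by inconsistent, here is a minimal sketch in Python. It uses only the input volumes from the sample above and takes the 50th percentile (median) of each day as the cutoff. The two-category "high"/"low" split and the function names are just assumptions for illustration, not my real setup.

```python
from statistics import median

# Input volumes per port, copied from the Day 1 and Day 2 samples.
day1_input = {1: 10000, 2: 0, 3: 10000, 5: 56789, 6: 0, 7: 0, 8: 1000}
day2_input = {1: 10, 2: 100, 3: 10000, 5: 567}

def daily_cutoff(volumes):
    """Return the 50th percentile (median) of one day's volumes."""
    return median(volumes.values())

def category(volume, cutoff):
    """Hypothetical two-way split: strictly above the cutoff is 'high'."""
    return "high" if volume > cutoff else "low"

cutoff_day1 = daily_cutoff(day1_input)  # 1000
cutoff_day2 = daily_cutoff(day2_input)  # 333.5

# The same volume lands in different categories depending on the day.
print(category(567, cutoff_day1))  # low
print(category(567, cutoff_day2))  # high
```

The cutoff moves from 1000 to 333.5 only because Day 2 was a quieter day, so a port with the same volume gets a different label.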
Any suggestions or ideas for handling this kind of situation?
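For the logarithmic idea, the only starting point I can think of is a rough sketch like the one below: put each volume into a fixed order-of-magnitude bucket using log10, so the boundaries do not depend on what the other ports did that day. The function name, the base-10 choice and the special category 0 for empty ports are all my own assumptions, and I am not sure this is the right approach.

```python
import math

def log_category(volume):
    """Map a non-negative volume to a fixed order-of-magnitude category.

    Category 0 is reserved for volumes below 1 (including zero, where
    log10 is undefined). Otherwise the category is floor(log10(volume)) + 1,
    so 1-9 -> 1, 10-99 -> 2, 100-999 -> 3, and so on.
    """
    if volume < 1:
        return 0
    return math.floor(math.log10(volume)) + 1

# A few volumes taken from the sample data.
print([log_category(v) for v in (0, 10, 2999, 10000, 56789)])  # [0, 2, 4, 5, 5]
```

With this, a volume of 10000 gets category 5 on any day, regardless of how busy the other ports were.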
Thank you,