Dear all,
I am currently struggling with a machine learning problem in which I have to deal with a highly imbalanced data set. There are six classes ('1', '2', ..., '6'). Unfortunately, class '1' has 150 instances, class '2' has 90, and class '3' has only 20. The remaining classes cannot be "trained" at all, since there are no instances available for them.
So far, I have figured out that WEKA (the machine learning toolkit I am using) provides a supervised "Resample" filter. When I apply this filter with 'noReplacement'=false and 'biasToUniformClass'=1.0, the result is a data set in which the number of instances is almost equal across classes '1' to '3' (the other classes stay empty).
My question is now: how does WEKA, and this filter in particular, generate the "new"/additional instances for the under-represented classes?
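To make the question concrete, here is a plain-Java sketch of what I assume such a filter might be doing: drawing existing instances with replacement, so that small classes are filled up with duplicates rather than genuinely new instances. This is only my guess and not WEKA's code; the class name UniformResampleSketch, the methods resample and countByClass, and the rule "same number of draws per non-empty class" are all my own assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

// Hypothetical sketch, NOT WEKA's implementation: sampling with replacement
// toward a uniform class distribution.
class UniformResampleSketch {

    // Returns indices into `labels`; an index may occur several times.
    static List<Integer> resample(List<String> labels, long seed) {
        // Group instance indices by class label. Classes without any
        // instances never show up here, so they stay empty.
        Map<String, List<Integer>> byClass = new TreeMap<>();
        for (int i = 0; i < labels.size(); i++) {
            byClass.computeIfAbsent(labels.get(i), k -> new ArrayList<>()).add(i);
        }
        List<Integer> sample = new ArrayList<>();
        if (byClass.isEmpty()) {
            return sample;
        }
        // Assumption: every non-empty class gets the same number of draws.
        int perClass = labels.size() / byClass.size();
        Random rnd = new Random(seed);
        for (List<Integer> members : byClass.values()) {
            for (int n = 0; n < perClass; n++) {
                // Drawing with replacement: the same instance can be picked again.
                sample.add(members.get(rnd.nextInt(members.size())));
            }
        }
        return sample;
    }

    // Counts how many sampled indices belong to each class label.
    static Map<String, Integer> countByClass(List<String> labels, List<Integer> sample) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int idx : sample) {
            counts.merge(labels.get(idx), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // The class sizes from my data set: 150, 90 and 20 instances.
        List<String> labels = new ArrayList<>();
        for (int i = 0; i < 150; i++) labels.add("1");
        for (int i = 0; i < 90; i++) labels.add("2");
        for (int i = 0; i < 20; i++) labels.add("3");
        System.out.println(countByClass(labels, resample(labels, 1L)));
    }
}
```

With my class sizes, this sketch gives each of the three classes 86 sampled indices (260 / 3, rounded down), so class '3' would necessarily consist of repeated copies of its 20 original instances. Is that roughly what the filter does?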
Thank you very much in advance for any hints or suggestions.
Cheers Julian