views: 575

answers: 3

I'm trying to build an app that detects which images on a webpage are advertisements. Once I detect them, I won't allow them to be displayed on the client side.

From the help I got here on Stack Overflow, I concluded that an SVM is the best approach for my goal.

So I have coded the SVM and an SMO solver myself. The dataset I got from the UCI data repository has 3,280 instances (link to dataset: http://archive.ics.uci.edu/ml/datasets/Internet+Advertisements), of which around 400 belong to the class representing advertisement images and the rest represent non-advertisement images.

Right now I'm training the SVM on the first 2,800 instances. But after looking at the accuracy I realized that most of those 2,800 instances belong to the non-advertisement class, so I'm getting very good accuracy only for that class.

So what can I do here? Roughly how many instances should I use to train the SVM, and how many from each class?

Thanks. Cheers. (I made a new question because the context is different from my previous question: http://stackoverflow.com/questions/1991113/optimization-of-neural-network-input-data)

+2  A: 

The required size of your training set depends on the sparseness of the feature space. As far as I can see, you are not discussing what image features you have chosen to use. Before you can train, you need to convert each image into a vector of numbers (features) that describe the image, hopefully capturing the aspects that you care about.

Oh, and unless you are reimplementing SVM for sport, I'd recommend just using libSVM.

Vebjorn Ljosa
I didn't understand what you meant by sparseness of the feature space, or how exactly it will decide the size of my training set. Let me make myself a little clearer here. 1. Yes, I'm only doing text analysis to predict whether an image is an advertisement or not. 2. I'm required not to use libraries from the internet and to implement the SVM on my own. I have already coded most of the SVM and can test the accuracy of its output. Thanks.
Amol Joshi
Let me try to be more clear. How do you get from an image to the vector of numbers that you input to your SVM for that image? Surely you don't just give it the red, green, and blue color of each pixel in the image?
Vebjorn Ljosa
I'm doing text analysis to get different attributes of each image. Using that as the training set (which is already provided in the UCI repository), I'm training my SVM. Now the problem is that the advertisement class has only about 400 training examples, compared with about 2,800 non-advertisement examples. So what can I do here? Roughly how many instances should I use to train the SVM, and how many from each class? Thanks.
Amol Joshi
+2  A: 

There are two ways of going about this. One would be to balance the training data so it includes an equal number of advertisement and non-advertisement images. This could be done by either oversampling the 400 advertisement images or undersampling the thousands of non-advertisement images. Since training time can increase dramatically with the number of data points used, you should probably first try undersampling the non-advertisement images and create a training set with the 400 ad images and 400 randomly selected non-advertisements.
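
For illustration, here is a minimal sketch of the undersampling step in Python (an editorial sketch; the variable names X and y are hypothetical, with y using +1 for ads and -1 for non-ads):

    import numpy as np

    # X: feature matrix (n_samples x n_features); y: labels, +1 = ad, -1 = non-ad
    ad_idx = np.where(y == 1)[0]       # ~400 advertisement rows
    nonad_idx = np.where(y == -1)[0]   # ~2800 non-advertisement rows

    rng = np.random.default_rng(0)
    keep = rng.choice(nonad_idx, size=len(ad_idx), replace=False)

    train_idx = np.concatenate([ad_idx, keep])
    rng.shuffle(train_idx)             # harmless; SVMs ignore sample order
    X_bal, y_bal = X[train_idx], y[train_idx]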

The other solution would be to use a weighted SVM, so that margin errors for the ad images are weighted more heavily than those for non-ads; in the package libSVM this is done with the -wi flag. From your description of the data, you could try weighting the ad images about 7 times more heavily than the non-ads.
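
As a usage sketch (assuming the ads are labeled 1 and the non-ads -1 in the libSVM-format training file; the file names here are hypothetical):

    svm-train -c 1 -w1 7 -w-1 1 train.libsvm ads.model

This sets the effective C for class 1 (ads) to 7 while leaving class -1 at 1.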

dmcer
Could you explain why you would balance the training data? I thought SVMs choose the decision surface with the maximum distance from the training patterns. Why would it matter how many other training patterns are behind that decision surface? And how would oversampling patterns help? (I always thought weighted SVMs were meant to model different costs for misclassification of different classes and/or a priori probabilities - but the OP said nothing about costs or a priori probabilities.)
nikie
@nikie - Hard SVMs choose the decision surface with the maximum separation of the training patterns. But, once you allow for margin and classification errors (i.e., when you introduce C), SVMs trade off maximizing the margin size against allowing some points to either be in the margin or even be misclassified. With unbalanced data, a large portion of the data points from the smaller class can end up in the margin or worse. Upweighting them or balancing the data set essentially fixes this.
dmcer
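
For concreteness, the weighted soft-margin objective dmcer is alluding to can be written as follows (an editorial sketch of the standard formulation, with separate penalties C_+ and C_- for the two classes):

    \min_{w,\,b,\,\xi}\;\; \frac{1}{2}\|w\|^2
      + C_{+} \sum_{i:\, y_i = +1} \xi_i
      + C_{-} \sum_{i:\, y_i = -1} \xi_i
    \quad \text{s.t.} \quad y_i \,(w \cdot x_i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0

Setting C_+ > C_- makes margin violations by the rarer ad class more expensive, which is exactly what balancing or upweighting achieves.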
@dmcer: Thanks for the explanation. I tried downsampling the non-advertisements to about 600, but the accuracy was very poor. When I trained using 400 ad and 2,400 non-ad samples, the training time was very long, but I got around 95% accuracy classifying non-advertisements while the advertisement classification accuracy dropped very low. I guess the effect is due to the non-advertisement set dominating the advertisement set. So will weighted SVM work in this case? And am I doing something wrong here, given that non-ad accuracy is very good when I take 2,800 training samples?
Amol Joshi
@amol: Weighted SVM is probably a good thing to try next. With it, you'll be simulating upsampling the number of ad images, but without actually increasing the size of the training set. Are you still using your own implementation of SVM? If so, you might want to temporarily switch to something like libSVM or SVM light, at least until you figure out what SVM configuration works well on your data. That way, you will be able to pull apart configuration issues and classifier implementation bugs.
dmcer
Thanks for the suggestion. I'll try to work out the configuration with libSVM, though eventually I'll have to implement everything myself. However, I can't try libSVM right now; I have an exam next week and I'll resume work after that. Hopefully I can get your assistance then. Cheers.
Amol Joshi
Now I have figured out that for C=0.5 I get very good results on the non-advertisement samples, and very good results on the advertisement samples when C=2. I guess weighted SVM tries to combine both of these, but I'm not sure how to implement it in SMO for the SVM. Am I on the right path here? It would be great if you could provide some references on how to implement weighted SVM. I have already gone through this paper on weighted SVM: http://www.ieeexplore.ieee.org/iel5/10498/33257/01571749.pdf?arnumber=1571749 but I need a reference for implementing it in SMO. Cheers.
Amol Joshi
LIBSVM, http://www.csie.ntu.edu.tw/~cjlin/libsvm/ , uses SMO for optimization and implements weighted SVM. Since it's open source, you could start by looking at their source code.
dmcer
Are there any documentation references about SMO for weighted SVM?
Amol Joshi
Is there any particular question you have about SMO for weighted SVM? I was under the impression that implementing it just meant having different bounding C constants for each class when adjusting the data point alpha values.
dmcer
Basically I have developed a basic SMO from John Platt's paper, referring to the pseudo-code provided by Stanford here ( www.stanford.edu/class/cs229/materials/smo.ps ). I understand that with weighted SVM there is a different C value for each class. The problem is that I don't know which C to use where. In SMO we select two alpha values to change (say the i-th and j-th), so in weighted SVM I have doubts about which index of the C array to use when finding the L and H values used in SMO, and about their use in the selection and rejection criteria.
Amol Joshi
I tried using the same index, i.e. 'i', everywhere, but I don't think it works correctly that way, because sometimes the SMO doesn't converge or classifies only one of the classes correctly. Is there any documentation reference where I can resolve these issues?
Amol Joshi
Let's make things easy. If you drop the bias term 'b' from the problem formulation, the equality constraint in the dual goes away. You can then change the alpha values individually, rather than having to do pairwise updates. That should make it simple to update and clip each alpha using the correct bound. You can then simulate having a bias term by having a single feature that takes on a value of 1 for each data point.
dmcer
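
To make that suggestion concrete, here is a minimal sketch (mine, not from the thread) of coordinate ascent on the bias-free dual with per-class C values; the function name and arguments are hypothetical:

    import numpy as np

    def smo_no_bias(K, y, C_ad, C_nonad, max_passes=50):
        """Coordinate ascent on the SVM dual with the bias term b dropped.

        K : (n, n) kernel matrix with the constant "+1" feature already
            folded in, i.e. K[i, j] = k(x_i, x_j) + 1, which simulates b.
        y : labels in {+1 (ad), -1 (non-ad)}.
        """
        n = len(y)
        alpha = np.zeros(n)
        C = np.where(y > 0, C_ad, C_nonad)   # per-class box constraint
        passes = 0
        while passes < max_passes:
            changed = 0
            for i in range(n):
                f_i = np.dot(alpha * y, K[:, i])   # decision value, no b
                # Without the constraint sum(alpha * y) = 0, each alpha can
                # be optimized in closed form and clipped independently:
                if K[i, i] > 0:
                    unclipped = alpha[i] + (1.0 - y[i] * f_i) / K[i, i]
                    new = min(C[i], max(0.0, unclipped))
                    if abs(new - alpha[i]) > 1e-8:
                        alpha[i] = new
                        changed += 1
            passes = passes + 1 if changed == 0 else 0
        return alpha

Note how each alpha is clipped against its own class's C; that is the part that is easy to get wrong in the pairwise (with-bias) updates.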
Could you explain how the method of dropping the bias term 'b' would work? SMO works because it changes two alphas at a time, thus still satisfying the summation( alpha(i) * y(i) ) = 0 condition. If we change only one alpha, the condition will not be satisfied. Also, it would be great if you could give a reference for the standard SMO algorithm documentation; I'm not able to follow how they derive the L and H value conditions in the standard SMO algorithm. Thank you.
Amol Joshi
Dropping b makes the constraint summation( alpha(i) * y(i))=0 disappear. The SMO algorithm is covered in this paper by John Platt http://research.microsoft.com/pubs/68391/smo-book.pdf .
dmcer
Thanks for the reply again. I'm trying to implement the SVM without the bias term. For that, I have to add another column to the input matrix which is set to 1 for all samples, right? Also, I succeeded in understanding how to define the L and H values for weighted SVM (with b), but somehow the alpha values are restricted only by the non-ad C value, i.e., I'm getting all values of the alpha array < C (of non-ad), hence the accuracy is not so great. Is there anything I can check to see whether my code is right or wrong?
Amol Joshi
That's right. For this, you'll need to add another feature that is set to 1 for all datapoints. It sounds like there's a bug in the SMO code you have right now, since only the alphas for non-ads should be restricted by the non-ad C. But, I would first work on getting things running with the code without the bias term b.
dmcer
I have got the code without the bias term b running now, and I'm getting good accuracy of around 94-95% for both the ad and non-ad classes. Are there any tricks or techniques to improve the accuracy further? And I'm still stuck on modifying SMO for weighted SVM. Please reply. Thanks.
Amol Joshi
If you're scoring the classes separately, ~95% correct for each class doesn't sound so bad. But, I imagine you want to beat the Kushmerick '99 numbers (i.e., 97%+)? To improve accuracy further, you could experiment with both different kernels and kernel hyperparameters. Which kernel are you using now?
dmcer
Also, about the modified SMO for weighted SVM, I suspect that your second argument to the H=min() operation is incorrect when the classes are not equal and that the second argument to the L=max() operation is incorrect when the classes are equal. The terms for both of these arguments should make sure that when you make a change to alpha2 it doesn't force alpha1 into having to exceed C.
dmcer
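
For reference, here is one plausible weighted generalization of Platt's clipping bounds, consistent with the correction above (an editorial sketch; C_1 and C_2 are the box constraints for the classes of the two chosen points, alpha_1 and alpha_2 their current values):

    \text{if } y_1 \ne y_2:\quad L = \max(0,\; \alpha_2 - \alpha_1), \qquad
        H = \min(C_2,\; C_1 + \alpha_2 - \alpha_1)

    \text{if } y_1 = y_2:\quad L = \max(0,\; \alpha_1 + \alpha_2 - C_1), \qquad
        H = \min(C_2,\; \alpha_1 + \alpha_2)

With C_1 = C_2 = C these reduce to the bounds in Platt's paper; the C_1 terms are what keep alpha_1 inside [0, C_1] when alpha_2 moves.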
Yes, I want to beat the Kushmerick '99 numbers. Right now I'm getting 95% accuracy for both classes. I'm using a Gaussian kernel, and I find the sigma term for it with the following algorithm: 1. Take 1,000 pairs of data points from the dataset. 2. Find the distance between each pair. 3. Take the mean of those distances as the sigma value. Is there any improvement possible here?
Amol Joshi
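
A minimal sketch of that heuristic (hypothetical names; X is the n x d data matrix):

    import numpy as np

    rng = np.random.default_rng(0)
    i = rng.integers(0, len(X), size=1000)
    j = rng.integers(0, len(X), size=1000)
    sigma = np.linalg.norm(X[i] - X[j], axis=1).mean()
    # Gaussian kernel: k(x, z) = exp(-||x - z||**2 / (2 * sigma**2))

A common variant is to take the median of the pairwise distances rather than the mean, which is more robust to outliers.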
If you have the time, just do a grid search to find better values.
dmcer
I have a few doubts again about SMO. After we select j from the heuristic, we use those L and H values to get the value of alpha[j], from which we get alpha[i]. So I couldn't understand what was incorrect in my SMO for weighted SVM.
Amol Joshi
A few more questions about the ordering of the training samples. All of the advertisement samples are in the first 500 samples of the dataset. If I interleave non-advertisement samples between the advertisement samples (e.g., putting 5 non-advertisement samples after each advertisement sample), will it help? Also, how does cross-validation help in avoiding overfitting in an SVM?
Amol Joshi
@data set order: SVMs don't care about the order of the data set. Putting the non-advertisements in between the advertisements shouldn't change anything.
dmcer
@cross validation: Using cross validation, you can search for hyperparameter values that result in good performance on data that was not used to train the classifier. However, cross validation is still not an unbiased estimate of how well your classifier will do on real unseen data. For that, you'll need a separate held out test set that you only evaluate your classifier on after you've found good hyperparameters.
dmcer
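
For example, a grid search over the class-specific C values and sigma with k-fold cross-validation might look like this sketch (train_svm and accuracy are hypothetical wrappers around your own SMO code):

    import itertools
    import numpy as np

    def cv_score(X, y, C_ad, C_nonad, sigma, k=5):
        """Mean accuracy over k folds for one hyperparameter setting."""
        idx = np.random.permutation(len(y))
        folds = np.array_split(idx, k)
        scores = []
        for f in range(k):
            test = folds[f]
            train = np.concatenate([folds[g] for g in range(k) if g != f])
            model = train_svm(X[train], y[train], C_ad, C_nonad, sigma)
            scores.append(accuracy(model, X[test], y[test]))
        return np.mean(scores)

    grid = itertools.product([0.5, 1, 2, 4, 8],   # C for ads
                             [0.25, 0.5, 1, 2],   # C for non-ads
                             [0.5, 1, 2, 4])      # sigma
    best = max(grid, key=lambda p: cv_score(X, y, *p))

Only after picking the best setting this way should you evaluate on the held-out test set, and only once.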
Is there any technique other than modifying the SMO that can achieve the effect of a weighted SVM? I have tried a lot but still haven't been able to get the desired results from my modified SMO.
Amol Joshi
How well do you do when you use a package like libSVM instead of the custom SVM implementation?
dmcer
Though I have not yet been successful with libSVM, I made a few minor changes to my SMO and I'm getting better accuracy now. Here are my results, using a Gaussian kernel and an SVM without the bias term. Training data: correct AD outputs = 428, wrong AD outputs = 31, correct non-AD outputs = 2337, wrong non-AD outputs = 4. Testing data: correct non-AD outputs = 471, wrong non-AD outputs = 8. Now, do I really need to implement weighted SVM here? Right now I'm grid searching through the C values and will try different kernels soon.
Amol Joshi
To check whether my implementation is good, I have also tested it on the handwritten digit recognition dataset (restricted to only the digits 0 and 1) from the UCI repository. I got a very good result, with 99.90% accuracy. Doesn't this result show that my custom implementation is good enough, or at least that nothing is wrong with it?
Amol Joshi
@"Here is my result..." If I understand your comment correctly, you're now doing well on the training set for both ads and non-ads? It also looks like you're doing well on the test set for non-ads. Are you doing well on the ads there as well?
dmcer
As I mentioned in my earlier posts, I have only 459 ad samples, so I'm using all of them in my training set. My testing set has only non-ad samples.
Amol Joshi
You'll need some ad images in the dataset you use to evaluate your classifier. You also shouldn't run the classifier on the final evaluation set until after you've used either a dev set or cross validation to find a good classifier configuration and hyperparameter values.
dmcer
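
A sketch of that split (hypothetical proportions; stratified by class so ads appear in every subset):

    import numpy as np

    rng = np.random.default_rng(0)

    def split_class(idx, frac_train=0.6, frac_dev=0.2):
        """Split one class's indices into train/dev/test portions."""
        idx = rng.permutation(idx)
        a = int(frac_train * len(idx))
        b = int((frac_train + frac_dev) * len(idx))
        return idx[:a], idx[a:b], idx[b:]

    ad_tr, ad_dev, ad_te = split_class(np.where(y == 1)[0])
    na_tr, na_dev, na_te = split_class(np.where(y == -1)[0])
    train = np.concatenate([ad_tr, na_tr])
    dev = np.concatenate([ad_dev, na_dev])
    test = np.concatenate([ad_te, na_te])   # evaluate on this only once, at the end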
A: 

Thanks for the reply. I want to check whether I'm deriving the C values for the ad and non-ad classes correctly. Please give me feedback on this.

Link to my derivation: http://s572.photobucket.com/albums/ss165/amoljoshi28/?action=view&current=weightedSVM.jpg

Or you can see the .doc version here: http://amolkimi.0fees.net/weightedSVM.doc

You can see the graph for y1 equal to y2 here: http://i572.photobucket.com/albums/ss165/amoljoshi28/y1y2.jpg

and for y1 not equal to y2 here: http://i572.photobucket.com/albums/ss165/amoljoshi28/y1y2-1.jpg

Amol Joshi