I'm trying to build an app to detect images on webpages which are advertisements. Once I detect those, I'll not allow them to be displayed on the client side.

Basically, I'm using the back-propagation algorithm to train the neural network using the dataset given here: http://archive.ics.uci.edu/ml/datasets/Internet+Advertisements

But in that dataset the number of attributes is very high. In fact, one of the mentors of the project told me that if you train the neural network with that many attributes, it'll take a lot of time to get trained. So is there a way to optimize the input dataset? Or do I just have to use that many attributes?

Cheers.

+4  A: 

1558 is actually a modest number of features/attributes. The number of instances (3279) is also small. The problem is not on the dataset side, but on the training algorithm side.

ANNs are slow to train, so I'd suggest you use logistic regression or an SVM. Both of them are very fast to train; SVMs in particular have a lot of fast training algorithms.

In this dataset, you are actually analyzing text, not images. I think a classifier from the linear family, i.e. logistic regression or a linear SVM, is better suited for your job.

If you are using this in production and you cannot use open-source code, logistic regression is very easy to implement compared to a good ANN or SVM.
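For example, a minimal sketch in Python with scikit-learn of what I mean; the file name ad.data, the label encoding, and the assumption that the missing "?" entries have already been imputed are all illustrative, not part of your setup:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

# Load the UCI ad dataset: ~3279 rows, 1558 numeric/binary attributes, last column is the label.
# Assumes missing "?" entries were already replaced with numbers (e.g. column means).
raw = np.genfromtxt("ad.data", delimiter=",", dtype=str)
X = raw[:, :-1].astype(float)
y = (raw[:, -1] == "ad.").astype(int)  # 1 = advertisement, 0 = non-ad

# Both linear classifiers train quickly on data of this size.
for clf in (LogisticRegression(max_iter=1000), LinearSVC(max_iter=5000)):
    scores = cross_val_score(clf, X, y, cv=5)
    print(type(clf).__name__, "mean CV accuracy:", scores.mean())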

If you decide to use logistic regression or an SVM, I can further recommend some articles or source code for you to refer to.

Yin Zhu
Sir, my project group wanted to use a neural network for this. Do I have any options with a neural network? Can I get some assistance from somewhere about this? Can I use logistic-regression-like algorithms with a neural network? And more importantly, are there any of that kind?
Amol Joshi
Neural networks are not great at highly dimensional problem spaces. As for making it go faster, try using a GPU or reducing the number of features or examples. In the end, an ANN is the wrong tool for the job.
Steve
Okay, now I know that an ANN won't be the right tool, so I have decided to use an SVM. It'll be great if you could recommend some articles about its use in my project. Also, I wanted to ask you whether I should implement PCA before implementing the SVM. Thanks. Cheers!
Amol Joshi
Doing dimensionality reduction (PCA here) before an SVM usually does not improve accuracy, because the SVM is able to do feature selection itself. The other reason is that the SVM is fast enough. First, you want to have a look at the libsvm package; it is a well-designed, well-written, well-tested, production-quality SVM package. Its copyright is here: http://www.csie.ntu.edu.tw/~cjlin/libsvm/COPYRIGHT. It uses the SMO optimization algorithm, which is not hard to implement if you need to implement it yourself. Refer to SVM's Wikipedia page for details. Please leave comments if you need more help :)
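If Python is an option for you, a quick way to try libsvm without writing SMO yourself is scikit-learn's SVC class, which wraps libsvm internally; X and y below are assumed to be the feature matrix and 0/1 labels you have already loaded, as in the earlier sketch:

from sklearn.svm import SVC

# SVC is a thin wrapper around libsvm; kernel="rbf" is the Gaussian kernel.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))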
Yin Zhu
OK, I'll have to write the code myself. Basically, I'm facing the following issues while coding. 1. The first few attribute values in the dataset are height, width and the aspect ratio, which are floating-point values. So can I mix them with the other attribute values, which are 0/1? 2. While implementing the SMO optimization algorithm, can I get away with implementing the simplified SMO given by Stanford (http://www.stanford.edu/class/cs229/materials/smo.ps)? They are saying that the simplified one works for their problem. Do I have to implement the full SMO given in the John Platt paper?
Amol Joshi
3. Also, I have decided to use the Gaussian kernel, basically because it's the most popular one. In that kernel, how can I decide the sigma parameter? I know I'm asking a few basic questions, but I have read lots of stuff on SVMs and couldn't find answers to these. Thanks, Yin Zhu and others who have replied here before. Cheers.
Amol Joshi
1. You need to do some scaling of the attribute values; for the big ones, scale them to a small value. E.g., you can divide attribute i by its maximum, thus scaling the attribute into [0..1] (if all of its values are positive). 2. You can implement the simplified one as a start, and test it on your dataset against packages like SVMlight and libsvm to see whether you need to improve it. 3. You need to set the sigma parameter by trying different values; it is not determined by your algorithm. NOTICE: once you have the SVM implemented, it can use any kernel, most of which are quite straightforward to implement.
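A small sketch of points 1 and 3 in Python, assuming X and y are your feature matrix and labels and using scikit-learn's SVC; note that for the Gaussian kernel exp(-||x - z||^2 / (2*sigma^2)), scikit-learn's gamma corresponds to 1 / (2*sigma^2). The candidate values of C and sigma are only placeholders to try, not recommendations:

import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# Point 1: scale each attribute into [0..1] by dividing by its column maximum
# (the attributes here are non-negative; all-zero columns are left untouched).
col_max = X.max(axis=0)
X_scaled = X / np.where(col_max > 0, col_max, 1.0)

# Point 3: pick sigma (via gamma) and C by cross-validated search instead of fixing them up front.
sigmas = np.array([0.5, 1.0, 2.0, 5.0, 10.0])
param_grid = {"C": [0.1, 1.0, 10.0], "gamma": list(1.0 / (2.0 * sigmas ** 2))}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X_scaled, y)
print("best parameters:", search.best_params_)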
Yin Zhu
Thanks for the reply, Yin Zhu. I have a few other questions. We thought of dividing the SVM in two: one which will take the continuous values, and another SVM which will be trained with all the binary attributes. Since I'm building my own SVM code, I wanted to ask you whether I can use the same SVM code for both of these SVMs, i.e. whether an SVM built for binary data can also work with continuous attribute values.
Amol Joshi
Another question that I have is about the missing data. Can I get some pointers to a few good links where problems with missing data attributes are solved? Also, do I need to care about the missing data values in my SVM project?
Amol Joshi
Yin Zhu
I have developed my SVM with the simplified SMO given in the Stanford paper. But the alpha values are coming out in the range of e+305. Am I doing something horribly wrong here? Are there any rules of thumb for selecting the constants used in SMO, like C, tolerance and eps? Cheers.
Amol Joshi
Oh, I realised I made a few errors while coding SMO. I corrected them and now I'm getting some output. But now I've got another issue, which I've posted here: http://stackoverflow.com/questions/2284059/svm-classification-minimum-number-of-input-sets-for-each-class Please help. Thanks and cheers.
Amol Joshi
A: 

Thanks, Yin Zhu, for the suggestion. But we want a solution using an ANN. Can you suggest some good options with ANNs? That would be helpful to us.

Dharmendra Joping
A: 

Apply a separate ANN for each category of features, for example: 457 inputs, 1 output for the url terms (ANN1); 495 inputs, 1 output for origurl (ANN2); ...

Then train all of them and use another main ANN to join the results.
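A rough sketch of this scheme in Python with scikit-learn; the column slices are only illustrative assumptions about where each feature group starts and ends, and in practice the main ANN should be trained on out-of-fold outputs of the sub-ANNs rather than on their training-set outputs:

import numpy as np
from sklearn.neural_network import MLPClassifier

# One small ANN per feature category; slice boundaries are illustrative only.
groups = {
    "geometry": slice(0, 3),      # height, width, aspect ratio
    "url": slice(3, 460),         # url terms
    "origurl": slice(460, 955),   # origurl terms
    "rest": slice(955, 1558),     # remaining attribute groups
}

sub_outputs = []
for name, cols in groups.items():
    net = MLPClassifier(hidden_layer_sizes=(10,), max_iter=500)
    net.fit(X[:, cols], y)
    sub_outputs.append(net.predict_proba(X[:, cols])[:, 1])  # one output per category

# Main ANN joins the per-category outputs into the final decision.
meta_features = np.column_stack(sub_outputs)
main_net = MLPClassifier(hidden_layer_sizes=(5,), max_iter=500)
main_net.fit(meta_features, y)
print("stacked training accuracy:", main_net.score(meta_features, y))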

bluekid
+1  A: 

If you're actually using a backpropagation network with 1558 input nodes and only 3279 samples, then the training time is the least of your problems: even if you have a very small network with only one hidden layer containing 10 neurons, you have 1558*10 = 15580 weights between the input layer and the hidden layer. How can you expect to get a good estimate for 15580 degrees of freedom from only 3279 samples? (And that simple calculation doesn't even take the "curse of dimensionality" into account.)

You have to analyze your data to find out how to optimize it. Try to understand your input data: Which (tuples of) features are (jointly) statistically significant? (Use standard statistical methods for this.) Are some features redundant? (Principal component analysis is a good starting point for this.) Don't expect the artificial neural network to do that work for you.
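For example, a quick PCA sketch in Python (assuming X already holds your scaled feature matrix) shows how redundant the attributes are:

import numpy as np
from sklearn.decomposition import PCA

pca = PCA().fit(X)
cumulative = np.cumsum(pca.explained_variance_ratio_)
# If far fewer than 1558 components explain most of the variance, the features are highly redundant.
print("components needed for 95% of the variance:", int(np.searchsorted(cumulative, 0.95)) + 1)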

Also: remember Duda & Hart's famous "no free lunch" theorem: no classification algorithm works for every problem, and for any classification algorithm X, there is a problem where flipping a coin leads to better results than X. If you take this into account, deciding which algorithm to use before analyzing your data might not be a smart idea. You might well have picked the algorithm that actually performs worse than blind guessing on your specific problem! (By the way, Duda, Hart & Stork's book about pattern classification is a great starting point to learn about this, if you haven't read it yet.)

nikie