data-mining

Mining a large dataset using RapidMiner

Hi, I have tried to use RapidMiner to mine a dataset containing 72,000 records with 80 attributes. First, I tried adding the 'Optimize Selection (Evolutionary)' block to select only the relevant attributes. For regression I tried SVM, but about an hour after starting the analysis I got an error notification. Has any...

Non-density-based data clustering algorithm

Hi, I'm working on a cluster analysis program that takes a set of points S as input and labels each point with the index of the cluster it belongs to. I've implemented the DBSCAN and OPTICS algorithms, and they both work as expected. However, the results of those algorithms can be very different depending on the initial values of MinP...
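To make the parameter sensitivity described above concrete, here is a minimal DBSCAN sketch in plain Python. It uses a brute-force O(n²) neighbor search, and the names `dbscan`, `eps` and `min_pts` (standing in for MinPts) are illustrative assumptions, not the asker's code. Running it twice on the same points with a different `eps` turns two clusters plus a noise point into a single cluster.

```python
import math

NOISE = -1

def region_query(points, i, eps):
    """Return indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Label each point with a cluster index (0, 1, ...) or NOISE (-1)."""
    labels = [None] * len(points)  # None = not yet visited
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable, but not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is also a core point: keep expanding
                seeds.extend(j_neighbors)
    return labels

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (5, 5)]
print(dbscan(points, eps=1.5, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]
print(dbscan(points, eps=8.0, min_pts=3))  # [0, 0, 0, 0, 0, 0, 0]
```

With `eps=8.0`, the point (5, 5) becomes a core point that bridges the two groups, which is exactly the kind of parameter dependence the question is about.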

Decision tree induction open-source code

I am preparing a task for a computer vision class, which involves training a simple classifier after extracting features from images. Since machine learning is not the main topic here, I don't want students to implement a learning algorithm from scratch. So I have to recommend some reference implementations to them. I believe the decision tr...

How to determine the most informative feature in a tree learned by Weka

Hi there. I used Weka to train a J48 classifier, and it returned a textual representation of the tree. Now, if I want to determine which feature is the most informative, how could I proceed? Any idea is welcome. Thanks in advance. ...
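J48 is Weka's implementation of C4.5, which chooses each split with a gain-ratio criterion. One common, if rough, reading of "most informative" is therefore the attribute at the root of the tree, since that criterion ranked it highest on the full training set. The sketch below computes a simplified version of that criterion (information gain and gain ratio for nominal attributes only; C4.5's extra restriction to attributes with above-average gain is omitted). The toy data and function names are hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def info_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on the nominal attribute `attr`."""
    n = len(rows)
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def gain_ratio(rows, labels, attr):
    """Information gain divided by split information (C4.5-style normalization)."""
    split_info = entropy([row[attr] for row in rows])
    return info_gain(rows, labels, attr) / split_info if split_info > 0 else 0.0

# Hypothetical toy data: "outlook" separates the classes perfectly, "windy" not at all.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain", "windy": "no"},
    {"outlook": "rain", "windy": "yes"},
]
labels = ["no", "no", "yes", "yes"]
best = max(["outlook", "windy"], key=lambda a: gain_ratio(rows, labels, a))
print(best)  # outlook
```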

Using R and Weka: how can I use meta-algorithms along with the n-fold evaluation method?

Here is an example of my problem:

    library(RWeka)
    iris <- read.arff("iris.arff")
    # Perform n-fold evaluation to obtain the proper accuracy of the classifier.
    m <- J48(class ~ ., data = iris)
    e <- evaluate_Weka_classifier(m, numFolds = 5)
    summary(e)

The results provided here are obtained by building the model with a part of the dataset and testing it with ...
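The n-fold procedure referred to above can be sketched independently of Weka. The following is a minimal Python illustration, not RWeka's internals: the data is split into k folds, and each fold is used once as the test set for a model trained on the other k-1 folds. `fit` and `predict` are hypothetical callables standing in for a real learner such as J48, and the majority-class baseline is only there to make the example runnable.

```python
import random
from collections import Counter

def k_fold_indices(n, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation over n items.

    Every index appears in exactly one test fold; fold sizes differ by at most one.
    """
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)       # shuffle once, reproducibly
    folds = [idx[f::k] for f in range(k)]  # deal indices round-robin into k folds
    for f in range(k):
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, folds[f]

def cross_val_accuracy(X, y, fit, predict, k=5, seed=0):
    """Mean test accuracy over k folds; each model sees only its k-1 training folds."""
    scores = []
    for train, test in k_fold_indices(len(X), k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)

# Hypothetical majority-class learner, used only to exercise the loop.
def fit_majority(X, y):
    return Counter(y).most_common(1)[0][0]

def predict_majority(model, x):
    return model

print(cross_val_accuracy(list(range(10)), ["a"] * 10, fit_majority, predict_majority))  # 1.0
```

Whatever the model does to the data, such as attribute selection or a meta-classifier wrapper, belongs inside `fit`, so that it only ever sees the training folds of each split.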

Please help me choose the right classifier

Hi all, I am having trouble selecting the correct classifier for my data-mining task. I am labeling webpages with a statistical method on a 1-4 scale, 1 being the poorest and 4 being the best. Previously, I used an SVM to train the system, since I was using a binary (1/0) label then. But now, since I switched to this 4-class ...

Online data mining without client-side OAuth

I have a little app that mines data on social networks and returns interesting results (e.g. the latest conversations around a certain topic). However, the front end requires users to connect to the various services via OAuth first, before these services' APIs can be scanned. I would like this process to be automated on the ser...

Final-year project ideas (Data mining - Security)

Hey all, I'm in the final year of my CS degree, and I have about 8 months and a group of 4 to complete the project. I searched a lot for an idea, but nothing was really interesting. I don't want to work on the following (because I've already done them): simulations of physical problems, 3D games, learning systems. I was searching in the following topics: Data ...

Could I have some suggestions for data-mining tasks, please?

Hello there. I now need to create a data-mining task of my own. I have already talked to some people; the most popular ideas were price prediction or sports result prediction, which I think plenty of people are already implementing. So could anyone please give me some real-life ideas where you found data mining to be useful, like p...

How do search engines find relevant content?

How does Google find relevant content when it's parsing the web? Let's say, for instance, that Google uses the native PHP DOM library to parse content. What methods would there be for it to find the most relevant content on a web page? My thought is that it would search for all paragraphs, order them by the length of each paragraph and then f...
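The asker's heuristic (collect every paragraph, then order the paragraphs by length) can be sketched as follows. This illustrates only that heuristic and says nothing about how Google actually ranks content. It uses Python's standard-library `html.parser` in place of PHP's DOM library, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser

class ParagraphCollector(HTMLParser):
    """Collect the whitespace-normalized text of every <p> element."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False
        self._buf = []

    def _flush(self):
        if self._in_p:
            text = " ".join("".join(self._buf).split())
            if text:
                self.paragraphs.append(text)
        self._in_p = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush()  # an unclosed <p> is implicitly closed by the next one
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()

    def handle_data(self, data):
        if self._in_p:  # text inside nested inline tags such as <b> is kept too
            self._buf.append(data)

    def close(self):
        super().close()
        self._flush()  # handle a trailing unclosed <p>

def paragraphs_by_length(html):
    """Return paragraph texts, longest first."""
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    return sorted(parser.paragraphs, key=len, reverse=True)

page = "<body><p>Short.</p><div>nav</div><p>This is a <b>much</b> longer paragraph.</p></body>"
print(paragraphs_by_length(page))  # ['This is a much longer paragraph.', 'Short.']
```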

Termination criteria when building a decision tree

Hi there, I am writing my own code for a decision tree. I need to decide when to terminate the tree-building process. I could limit the height of the tree, but this seems trivial. Could anyone give me a better idea of how to implement my termination function? Here is my tree-building algorithm. Thanks a lot. ...
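Besides a height limit, tree learners commonly stop splitting a node when it is already pure, when it holds too few examples, or when the best available split barely improves impurity. A minimal sketch of such a termination function follows. The function names and all threshold values are hypothetical defaults that would need tuning, for example on validation data.

```python
from collections import Counter

def should_stop(labels, depth, best_gain, max_depth=10, min_samples=5, min_gain=1e-3):
    """Return True if the node should become a leaf instead of being split.

    labels:    class labels of the training examples reaching this node
    depth:     depth of this node (root = 0)
    best_gain: impurity reduction of the best candidate split at this node
    """
    if len(set(labels)) <= 1:      # pure (or empty) node: nothing left to separate
        return True
    if len(labels) < min_samples:  # too few examples to support a reliable split
        return True
    if depth >= max_depth:         # hard cap on tree height
        return True
    if best_gain < min_gain:       # best split barely reduces impurity
        return True
    return False

def leaf_label(labels):
    """Label stored in a leaf: the majority class, or None for an empty node."""
    return Counter(labels).most_common(1)[0][0] if labels else None
```

Because the gain threshold is checked only after the cheaper tests, the expensive search for `best_gain` can be skipped whenever one of the first three conditions already holds.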

The effect of decision tree pruning

Hi all, suppose I build a decision tree A with an algorithm like ID3 from a training and validation set, but A is unpruned. At the same time, I have another decision tree B, also generated by ID3 from the same training and validation set, but B is pruned. Now I test both A and B on a future unlabeled test set: is it always the case that the pruned tree w...

How exactly do SharkScope or PTR data-mine all those hands?

I'm very curious to know how this process works. These sites (http://www.sharkscope.com and http://www.pokertableratings.com) data mine thousands of hands per day from secure poker networks, such as PokerStars and Full Tilt. Do they have a farm of servers running applications that open hundreds of tables (windows) and then somehow spide...

Ideas for implementing a new algorithm or modifying an existing one

Hello everyone, I am doing a class project. I want to implement a new algorithm or modify existing ones (like dimensionality reduction, clustering, bagging, boosting, SVM, FP-tree, text mining, etc.). Please give me some ideas for the project. Thanks ...

Fuzzy queries to a database

I'm curious about how a feature found on many social sites today works. For example, you enter a list of movies you like, and the system suggests other movies you may like (based on the movies liked by other people who like the same movies as you). I think doing it the straight-SQL way (join list of my movies with movies-users join with user-movies grou...
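The join the question describes (my movies, then the users who liked them, then those users' other movies, grouped and counted) can be sketched in memory as a simple co-occurrence count, one basic form of collaborative filtering. The data, the `recommend` function, and the choice to weight each user by the number of movies they share with you are illustrative assumptions.

```python
from collections import Counter

def recommend(likes, user, top_n=3):
    """Suggest items liked by users who share at least one liked item with `user`.

    `likes` maps user -> set of liked items. Each candidate item is scored by
    summing, over every other user who liked it, how many items that user
    has in common with `user`.
    """
    mine = likes[user]
    scores = Counter()
    for other, theirs in likes.items():
        if other == user:
            continue
        overlap = len(mine & theirs)  # how many movies we both like
        if overlap == 0:
            continue
        for item in theirs - mine:    # only movies the user has not listed yet
            scores[item] += overlap   # weight by similarity of taste
    return [item for item, _ in scores.most_common(top_n)]

likes = {
    "ann": {"Alien", "Heat", "Up"},
    "bob": {"Alien", "Heat", "Ronin"},
    "cat": {"Up", "Cars"},
    "dan": {"Amelie"},
}
print(recommend(likes, "ann"))  # ['Ronin', 'Cars']
```

In SQL, the same idea is a self-join of a user-movie table on the movie column, followed by a GROUP BY on the candidate movie and a COUNT or SUM for the score.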