data-mining

What method do you use for selecting the optimum number of clusters in k-means and EM?

Many algorithms for clustering are available. A popular one is k-means, where, given the number of clusters, the algorithm iterates to find the best clusters for the objects. What method do you use to determine the number of clusters in the data for k-means clustering? Does any package available in R contain the V-fold cros...
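One common heuristic for k-means (not a method the question itself names) is the elbow method: run k-means for k = 1, 2, 3, ... and look for the k after which the within-cluster sum of squares (WCSS) stops dropping sharply. The pure-Python sketch below assumes points are 2-D tuples and uses deterministic farthest-first seeding instead of random restarts; all function names are illustrative. For EM on Gaussian mixtures, an information criterion such as BIC is often used in a similar way.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(points):
    """Coordinate-wise mean of a non-empty list of tuples."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))

def farthest_first(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the point
    farthest from its nearest already-chosen centroid."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (centroids, labels)."""
    centroids = farthest_first(points, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new.append(mean(members) if members else centroids[c])
        if new == centroids:  # converged: assignments will not change
            break
        centroids = new
    return centroids, labels

def wcss(points, centroids, labels):
    """Within-cluster sum of squared distances to the assigned centroid."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))

def elbow_curve(points, k_max):
    """WCSS for k = 1..k_max; look for the k where the curve bends."""
    return {k: wcss(points, *kmeans(points, k)) for k in range(1, k_max + 1)}
```

On three well-separated groups of points, the curve drops steeply up to k = 3 and flattens afterwards; that bend is the "elbow".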

Event log mining with C#

Hi, I'm looking for a way to data mine the event logs of a remote computer in C#. The problem is that I'm working with Amazon Web Services, and in production we use the auto-scaler to bring live virtual machine instances up or down as necessary. However, the web services running on these instances all log to their local eve...

A good web data extraction/screen scraper program?

I need to capture product data from a site on a regular basis and wondered if anyone knows of a good software program. I've trialed Mozenda, but it's a monthly subscription and pricey in the long term. Obviously something that's free would be best, but I don't mind paying either. I just need a decent program that's reliable and doesn't require...

Data mining for integers with exact fitting

I deal a lot with RFID cards. There are as many different outputs and encodings of the same type of card as there are different readers. I get frequent requests to figure out (if possible) how to translate one output into another, which means I have to stare at these numbers and work out what the transformations are. Most common transf...
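One way to approach this "exact fitting" is brute force: collect a few (reader A output, reader B output) pairs for the same cards, then test a library of candidate transformations and keep only those that reproduce every pair exactly. The candidate list below (byte reversal, bit masks, decimal/hex reinterpretation) is a hypothetical starting set, not the transformations the question goes on to list.

```python
def reverse_bytes(n, width=4):
    """Reverse the byte order of an unsigned integer that fits in `width` bytes."""
    return int.from_bytes(n.to_bytes(width, "big"), "little")

# Hypothetical candidate transformations; extend with whatever your readers do.
CANDIDATES = {
    "identity": lambda n: n,
    "reverse 4 bytes": reverse_bytes,
    "low 24 bits": lambda n: n & 0xFFFFFF,
    "low 16 bits": lambda n: n & 0xFFFF,
    "decimal digits read as hex": lambda n: int(str(n), 16),
    "hex digits read as decimal": lambda n: int(format(n, "x")),
}

def find_transforms(pairs, candidates=CANDIDATES):
    """Return the names of candidates f with f(a) == b for every (a, b) pair."""
    matches = []
    for name, fn in candidates.items():
        try:
            if all(fn(a) == b for a, b in pairs):
                matches.append(name)
        except (ValueError, OverflowError):
            # The transformation is not defined for some input, so it cannot fit.
            continue
    return matches
```

Chaining two candidates (for example a mask followed by a byte reversal) is a straightforward extension when no single transformation fits.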

Need all restaurant names, addresses, etc. Which API is best for getting them?

I am building a website that will let you find restaurants up to a certain distance from your house or office. I need to collect a database of all the restaurants. The criteria are based on the details below: 1: maximum distance you can walk/drive from a location; 2: cuisines of your choice. I need restaurants' names, phone numbers, addres...
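Whatever API supplies the listings, the "maximum distance" criterion can be checked by computing the great-circle (haversine) distance from the user's location to each restaurant and then filtering by cuisine. This is straight-line distance, not actual walking or driving distance, which would need a routing service; the record fields (`name`, `cuisine`, `lat`, `lon`) are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; the sphere model is an approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def nearby(restaurants, lat, lon, max_km, cuisines=None):
    """Names of restaurants within max_km of (lat, lon), nearest first.
    `restaurants` is a list of dicts with hypothetical keys name/cuisine/lat/lon."""
    hits = []
    for r in restaurants:
        if cuisines and r["cuisine"] not in cuisines:
            continue
        d = haversine_km(lat, lon, r["lat"], r["lon"])
        if d <= max_km:
            hits.append((d, r["name"]))
    return [name for _, name in sorted(hits)]
```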

Which data mining algorithm would you suggest for this particular scenario?

This is not a directly programming-related question; it's about selecting the right data mining algorithm. I want to infer people's ages from their first names, the region they live in, and whether or not they have an internet product. The idea behind it is that there are names that are old-fashioned or popular in a particular ...
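Naive Bayes over categorical features is one simple candidate for this kind of problem (an assumption here, not something the question settles): it estimates how often each first name, region, and internet-product flag occurs within each age band, and picks the most probable band. The age bands, feature order, and training rows in the usage are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train(rows, labels):
    """Count label frequencies and per-label value frequencies for each feature."""
    label_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, label) -> Counter of values
    seen_values = defaultdict(set)       # feature index -> values seen in training
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            seen_values[i].add(value)
    return label_counts, value_counts, seen_values

def predict(model, row, alpha=1.0):
    """Return the label with the highest log posterior, using Laplace smoothing."""
    label_counts, value_counts, seen_values = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_label in label_counts.items():
        score = math.log(n_label / total)  # log prior
        for i, value in enumerate(row):
            # +1 reserves probability mass for values never seen in training.
            vocab = len(seen_values[i]) + 1
            count = value_counts[(i, label)][value]
            score += math.log((count + alpha) / (n_label + alpha * vocab))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Usage: `train(rows, labels)` with rows like `("Gertrude", "north", "no")` and labels like `"60+"`, then `predict(model, ("Kevin", "north", "yes"))`.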

Auto-detecting product data feeds for an arbitrary E-Commerce site?

Hey all! My web app needs to access an arbitrary e-commerce store and determine whether or not it has a product data feed (i.e. a Google Base feed, or an RSS/Atom feed of all products in the store). I also need to extract the location of this feed. The best solution I can think of so far is to maintain a comprehensive list of known loca...
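Many sites advertise their feeds through RSS/Atom autodiscovery, i.e. `<link rel="alternate" type="application/rss+xml" href="...">` tags in the page. Checking for those before falling back to a list of known locations is one possible first step. The sketch below assumes the store's HTML has already been downloaded and only finds feeds advertised this way, so a Google Base feed that is not linked from the page will be missed; the function names are illustrative.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}

class FeedLinkParser(HTMLParser):
    """Collects absolute URLs from <link rel="alternate" type="...feed type..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # HTMLParser lowercases attribute names; values may be None.
        a = {name: (value or "") for name, value in attrs}
        rels = a.get("rel", "").lower().split()
        if "alternate" in rels and a.get("type", "").lower() in FEED_TYPES and a.get("href"):
            # Resolve relative hrefs against the page URL.
            self.feeds.append(urljoin(self.base_url, a["href"]))

def discover_feeds(html, base_url):
    """Return feed URLs advertised in `html`, in document order."""
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.feeds
```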

Medical Machine Learning Data Set

I'm researching a medical data set that includes variables concerning illnesses and treatment types. For example, the illness is colon cancer, its decision variables are (x, y, z, t), and the treatment type is chemotherapy, radiotherapy, etc. I want to find such a data set for my KDD and exploratory lesson, because I want to make useful p...

Optimizing SMO with RBFKernel (C and gamma)

There are two parameters when using RBF kernels with support vector machines: C and γ. It is not known beforehand which C and γ are best for a given problem; consequently, some kind of model selection (parameter search) must be done. The goal is to identify a good (C, γ) pair so that the classifier can accurately predict unknown data (i.e., testin...
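A common way to do this model selection is a grid search over exponentially spaced values of C and γ, scoring each pair by cross-validated accuracy and keeping the best (optionally followed by a finer grid around the winner). In the sketch below, `cv_score(C, gamma)` is a hypothetical callback you would implement with your SVM library (for example by cross-validating Weka's SMO with an RBF kernel); the exponent ranges passed to `log2_grid` are assumptions.

```python
def log2_grid(lo_exp, hi_exp, step=2):
    """Values 2**lo_exp, 2**(lo_exp + step), ..., up to 2**hi_exp inclusive."""
    return [2.0 ** e for e in range(lo_exp, hi_exp + 1, step)]

def grid_search(cv_score, c_values, gamma_values):
    """Try every (C, gamma) pair; return (best_score, best_C, best_gamma).
    cv_score is a hypothetical function returning cross-validated accuracy."""
    best = None
    for c in c_values:
        for g in gamma_values:
            score = cv_score(c, g)
            if best is None or score > best[0]:
                best = (score, c, g)
    return best

# Example call with assumed exponential ranges:
# best = grid_search(my_cv_score, log2_grid(-5, 15), log2_grid(-15, 3))
```

Exponential spacing is the design choice that matters here: good values of C and γ can differ by orders of magnitude, so a linear grid would waste most of its points.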

Where to find free tutorials about trading algorithms

Hi, I would like to find some tutorials about trading algorithms like Iceberg, Dagger, Guerrilla, etc. I have only found non-free or marketing sites on this topic. ...

maintaining query-oriented applications

I am currently building a kind of reporting system. The figures, tables, and graphs are all based on query results. I find that complex queries are not easy to maintain, especially when there is a lot of filtering, which makes a query very long and hard to understand. Also, sometimes, queries with similar filters are...
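One way to keep long, filter-heavy report queries maintainable is to define each filter once as a small function returning an SQL fragment plus its parameters, then assemble the WHERE clause from whichever filters a report needs. The sketch uses Python's built-in `sqlite3` for illustration; the `sales` table and the filter names are hypothetical.

```python
import sqlite3

# Hypothetical reusable filters: each returns (sql_fragment, parameters).
def in_region(region):
    return "region = ?", [region]

def amount_at_least(minimum):
    return "amount >= ?", [minimum]

def between_dates(start, end):
    return "sale_date BETWEEN ? AND ?", [start, end]

def build_query(base_sql, filters):
    """Append the given filters to base_sql as an AND-ed WHERE clause."""
    clauses, params = [], []
    for fragment, fragment_params in filters:
        clauses.append(f"({fragment})")
        params.extend(fragment_params)
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    return base_sql + where, params

def sales_total(conn, *filters):
    """One report figure, reusing the shared filters instead of hand-written SQL."""
    sql, params = build_query("SELECT COALESCE(SUM(amount), 0) FROM sales", filters)
    return conn.execute(sql, params).fetchone()[0]
```

Because parameters travel alongside each fragment, every report stays parameterized, and a change to a filter's definition is made in one place.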

Retrieve Information From Different Unstructured Text Files - Text Mining?

Hello, I need some help solving this problem. We have a large number of documents from a specific domain. These documents come from different sources, so their structure can also be very different. On the other side, I have a table with some specified fields where some figures have to be filled in from the extract of the do...

Detecting similar words among n text documents

Hi, I have n documents and want to find the common words that appear in them. For example, I want to be able to say that (n-3) documents include the word "web". Certainly I can do this with basic data structures, but there may be an efficient algorithm, or a way to treat the same word with different suffixes as one. Is there any algorithm for such purposes?...
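Counting, for each word, the number of documents it occurs in (its document frequency) takes one pass with a set per document. To merge forms like "web" and "webs", the sketch applies a deliberately crude suffix-stripping function; it is a placeholder assumption, and a real stemmer such as NLTK's `PorterStemmer` handles English suffixes far better.

```python
import re
from collections import Counter

# Crude, hypothetical suffix list; checked in order, longest first.
SUFFIXES = ("ing", "ed", "es", "s")

def stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def document_frequency(docs):
    """Map each stemmed word to the number of documents containing it."""
    df = Counter()
    for doc in docs:
        # A set per document, so repeated words count once per document.
        terms = {stem(w) for w in re.findall(r"[a-z]+", doc.lower())}
        df.update(terms)
    return df

def common_words(docs, min_docs):
    """Stemmed words that appear in at least min_docs documents, sorted."""
    return sorted(w for w, c in document_frequency(docs).items() if c >= min_docs)
```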

Clever way of building a tag cloud? - Python

Hi folks, I've built a content aggregator and would like to add a tag cloud representing the current trends. Unfortunately, this is quite complex, as I have to look for keywords that represent the context of each article. For example, words such as "I", "was", "the", "amazing", and "nice" have no relation to context. Help would be much appreciated...
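A common starting point is to drop stopwords (plus any words you decide carry no context, such as "amazing" and "nice"), count what remains across the articles, and scale each word's font size by its frequency. The stopword list, size range, and function name below are hypothetical; weighting by TF-IDF instead of raw counts would favour words that are distinctive to the current articles.

```python
import re
from collections import Counter

# Hypothetical stopword list; in practice use a fuller list plus your own additions.
STOPWORDS = {"i", "was", "the", "a", "an", "and", "of", "to", "in", "is", "it",
             "amazing", "nice"}

def tag_cloud(articles, top_n=20, min_size=10, max_size=32):
    """Return {word: font_size} for the top_n most frequent non-stopwords."""
    counts = Counter()
    for text in articles:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOPWORDS and len(w) > 2)
    top = counts.most_common(top_n)
    if not top:
        return {}
    hi, lo = top[0][1], top[-1][1]
    span = hi - lo or 1  # avoid division by zero when all counts are equal
    # Linear scaling from min_size (rarest kept word) to max_size (most frequent).
    return {w: min_size + (c - lo) * (max_size - min_size) // span for w, c in top}
```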

How to apply data mining (association rules) to a huge database?

Hello, what I want to do is apply the association method of data mining to my SQL Server 2000 database. Association rule mining is something like "finding the most frequent items that appear together in a database." For those who don't know or want to be reminded what the association method is like, take a look at this presentation about Associa...
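The classic algorithm for this is Apriori: find frequent single items, then repeatedly build candidate itemsets one item larger from the frequent ones, prune candidates that have an infrequent subset, and count the survivors in one pass over the data. Rules A → B are then kept when support(A ∪ B) / support(A) reaches a minimum confidence. The in-memory sketch below assumes transactions have already been pulled from the database as sets of item IDs; for a truly huge table, the counting pass would need to stream rows or be pushed into SQL.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): count} for itemsets with count >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    counts = {}
    for t in transactions:  # level 1: single items
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        # Join frequent (k-1)-itemsets; prune any candidate with an infrequent subset.
        prev = list(frequent)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                if len(union) == k and all(
                    frozenset(sub) in frequent for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        # One pass over the data to count the surviving candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result

def association_rules(itemset_counts, min_confidence):
    """List (antecedent, consequent, confidence) rules from apriori() output."""
    rules = []
    for itemset, count in itemset_counts.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                # Every subset of a frequent itemset is frequent, so lhs is present.
                confidence = count / itemset_counts[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules
```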

browse data in Android SQLite Database

Is there a way for an Android user to browse the SQLite databases on his/her phone and view the data in them? I use the SoftTrace beta program a lot. It's great, but I can't find any way to download the data it tracks to a PC. Thanks ...
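Once a database file has been copied from the phone to a PC (getting it off the device depends on the app and on device access, which this sketch does not cover), Python's standard `sqlite3` module can list its tables and preview their rows. The file is opened read-only through an SQLite URI so it cannot be modified by accident; the function names are illustrative.

```python
import sqlite3

def list_tables(conn):
    """Names of all tables in the database, sorted."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]

def preview(conn, table, limit=5):
    """Return (column_names, first `limit` rows) of a table."""
    quoted = '"' + table.replace('"', '""') + '"'  # safely quote the identifier
    cur = conn.execute(f"SELECT * FROM {quoted} LIMIT ?", (limit,))
    columns = [d[0] for d in cur.description]
    return columns, cur.fetchall()

def browse(path, limit=5):
    """Print every table of a database file, opened read-only."""
    conn = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
    try:
        for table in list_tables(conn):
            columns, rows = preview(conn, table, limit)
            print(table, columns)
            for row in rows:
                print("   ", row)
    finally:
        conn.close()
```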

Data mining on a MySQL database

Hello, I'm just beginning with text mining. I have two database tables with thousands of rows: a table for "skills" and a table for "skill categories". Every "skill" belongs to a skill category. A "skill" is, physically, a varchar(200) field in the database containing some text that describes the skill. Here are some skills extracted from ...

Open source data mining software in Java

Hi, I'd just like to know: is there any open source data mining software written in Java that is less than roughly 3k lines of code? If so, please give a download link; I need it for software testing. Thank you. ...

Best data mining database

I am an occasional Python programmer who has only worked with MySQL or SQLite databases so far. I am the computer person for everything in a small company, and I have started a new project where I think it is about time to try new databases. The sales department makes a CSV dump every week and I need to make a small scripting applicati...
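Whichever database is chosen, the weekly CSV import itself is short. As one illustration, using SQLite because the question mentions prior experience with it, the sketch below appends each dump to a table whose columns are taken from the CSV header row; the table name and columns are hypothetical, and every value is stored as text.

```python
import csv
import sqlite3

def quote_ident(name):
    """Quote an SQL identifier (table or column name) for SQLite."""
    return '"' + name.replace('"', '""') + '"'

def load_csv(conn, table, lines):
    """Append CSV rows to `table`, creating it from the header row if needed.
    `lines` is an open text file or any iterable of CSV lines.
    Returns the number of rows inserted."""
    reader = csv.reader(lines)
    header = next(reader)
    cols = ", ".join(quote_ident(h) for h in header)
    marks = ", ".join("?" for _ in header)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {quote_ident(table)} ({cols})")
    with conn:  # commit all rows of this dump as one transaction
        cur = conn.executemany(
            f"INSERT INTO {quote_ident(table)} ({cols}) VALUES ({marks})", reader
        )
    return cur.rowcount

# Weekly usage, with a hypothetical file name:
# with open("sales_dump.csv", newline="") as f:
#     load_csv(sqlite3.connect("sales.db"), "sales", f)
```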

Resources related to data-mining and gaming on social networks

Hi all, I'm interested in the problem of pattern mining among players of social networking games, for example detecting cheaters in a game given a company's user database. So far I have been following the usual recipe for a data mining project: construct a data warehouse that aggregates significant information, select a classifier...