data-mining

categorizing friends in social networks

I'm facing the following problem: let's say u is a social network user and as such has a list of friends, F(u). A partition is a function F->G, where G is a set of groups such as High-school, university, work, etc. I need to come up with an algorithm to partition F: the input is F and also F(f) for every f in F (the list of friends for eac...
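
One possible starting point, sketched under assumptions not stated in the question: treat two of u's friends as belonging together when they are friends with each other, and take the connected components of that graph as candidate groups. The function name group_friends and the sample names are hypothetical, and the sketch only finds unlabeled groups; it does not decide which one is "High-school" or "work".

```python
def group_friends(friends_of):
    """friends_of maps each friend f in F(u) to the set F(f)."""
    members = set(friends_of)
    # Keep only edges between two of u's friends, and make them symmetric.
    adjacency = {f: set() for f in members}
    for f, theirs in friends_of.items():
        for g in theirs:
            if g in members and g != f:
                adjacency[f].add(g)
                adjacency[g].add(f)
    groups, seen = [], set()
    for start in sorted(members):
        if start in seen:
            continue
        # Depth-first walk collects one connected component.
        stack, component = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for nxt in adjacency[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        groups.append(component)
    return groups

# Hypothetical input: six friends of u, each with their own friend list.
example = {
    "ann": {"bob", "u"},
    "bob": {"ann", "cat", "u"},
    "cat": {"bob", "u"},
    "dan": {"eve", "u"},
    "eve": {"dan", "u"},
    "fay": {"u"},
}
groups = group_friends(example)
```

Connected components are a coarse choice: a single friendship between two otherwise separate circles merges them, so a community-detection method may suit real data better.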

What is Knowledge Discovery and Data Mining?

I suppose SQL queries fetch "raw data"... Is there a good starting point for data mining in SQL Server? Are there any ready-to-go KDD algorithms available in MS SQL Server 2005 or 2008? ...

Can someone give an example of cosine similarity, in very simple, graphical way?

http://en.wikipedia.org/wiki/Cosine%5Fsimilarity Can you show the vectors here (in a list or something), then do the math and let us see how it works? I'm a beginner. ...
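
A minimal sketch of the arithmetic the question asks to see, assuming two made-up word-count vectors over the same four-word vocabulary (the vectors and the function name cosine_similarity are illustrative, not taken from the linked article). The similarity is the dot product divided by the product of the two vector lengths.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Hypothetical word counts for two short texts.
v1 = [2, 1, 0, 1]
v2 = [1, 1, 1, 0]

# dot = 2*1 + 1*1 + 0*1 + 1*0 = 3
# |v1| = sqrt(4 + 1 + 0 + 1) = sqrt(6), |v2| = sqrt(1 + 1 + 1 + 0) = sqrt(3)
# similarity = 3 / sqrt(18), about 0.707
similarity = cosine_similarity(v1, v2)
```

Vectors pointing the same way score 1, perpendicular vectors score 0, so 0.707 corresponds to an angle of 45 degrees.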

How would I go about plotting "live" stock market data with Processing, jQuery, Pure Data or Max/MSP?

This is intended as a question quite open to any suggestions, hints or pointers. I wish to start playing around with home brewed automated investment models, the beginnings of which I have concepts for. I'm familiar with a few frameworks/languages that I suspect might be able to help me in this. Suggestions regarding other languages than...

Difference between input attribute and predictable attribute

Could anyone please clarify the difference between an input attribute and a predictable attribute for the decision tree algorithm in data mining? Thanks. ...

How do you detect outliers on multivariate data?

I am trying to do a regression problem but I have 3 independent variables and not 1 so it is hard to detect outliers from a scatter graph. Any suggestions? ...

Grouping to extract common values in semi-structured data

I've got a 'somewhat' ugly field in a database which holds the names of locations. For instance, Madison Square Gardens which has also been entered as "The Madison Square Gardens", etc. etc. I'm trying to extract the data so that I can get an accurate list of all the locations. In order to accomplish this, what I've done is created a ...

Question About Using Weka, the machine learning tool

I'm using the Explorer feature of Weka for classification. So I have my .arff file, with 2 NUMERIC features, and my class is a binary 0 or 1 (e.g. {0,1}). Sample: @RELATION summary @ATTRIBUTE feature1 NUMERIC @ATTRIBUTE feature2 NUMERIC @ATTRIBUTE class {1,0} @DATA 23,11,0 20,100,1 2,36,0 98,8,1 ..... I load this .arff file,...

Information mining, classification, modification

Any examples, tips, guidance for the following scenario? I have retrieved updates from several different news websites. I then analyse that information to predict current trends in the world. I could only find information on data mining when searching for this idea, but it is for database systems. While data mining is similar to...

Machine learning challenge: diagnosing program in java/groovy (datamining, machine learning)

Hi All! I'm planning to develop a program in Java which will provide a diagnosis. The data set is divided into two parts, one for training and the other for testing. My program should learn to classify from the training data (BTW, which contains answers to 30 questions, each in a new column, with each record on a new line; the last column will be diagnos...

Datamining library for .NET

Hi, does anybody know of any data mining libraries for .NET? ...

Find HEX patterns and number of occurrences

Hi, I'd like to find patterns in a HEX file I have and sort them by number of occurrences. I am not looking for any specific pattern, just to gather some statistics on the occurrences there and sort them. DB0DDAEEDAF7DAF5DB1FDB1DDB20DB1BDAFCDAFBDB1FDB18DB23DB06DB21DB15DB25DB1DDB2EDB36DB43DB59DB32DB28DB2ADB46DB6FDB32DB44DB40D...
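
A rough sketch of one way to gather such statistics, assuming the data is available as a string of hex digits and that a "pattern" means a fixed-width window of digits. The function name count_hex_patterns and the width and step parameters are hypothetical; the sample is the first 12 bytes of the excerpt above.

```python
from collections import Counter

def count_hex_patterns(hex_text, width=4, step=2):
    """Count every window of `width` hex digits, sliding `step` digits at a time."""
    hex_text = "".join(hex_text.split()).upper()  # drop whitespace, normalise case
    windows = (hex_text[i:i + width]
               for i in range(0, len(hex_text) - width + 1, step))
    return Counter(windows)

sample = "DB0DDAEEDAF7DAF5DB1FDB1D"

# width=2, step=2 counts single bytes; most_common() sorts by frequency.
counts = count_hex_patterns(sample, width=2, step=2)
ranked = counts.most_common()
```

Raising width looks for longer patterns, and step=2 keeps the windows aligned to byte boundaries.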

How to find common phrases in a large body of text

Hi, I'm working on a project at the moment where I need to pick out the most common phrases in a huge body of text. For example say we have three sentences like the following: The dog jumped over the woman. The dog jumped into the car. The dog jumped up the stairs. From the above example I would want to extract "the dog jumped" as i...
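
A simple sketch of one common approach, counting word n-grams, using the three sentences from the question. The function name common_phrases and its parameters n and min_count are hypothetical, and for a truly huge text this in-memory counter is only a starting point.

```python
import re
from collections import Counter

def common_phrases(text, n=3, min_count=2):
    """Return (phrase, count) pairs for word n-grams seen at least min_count times."""
    counts = Counter()
    # Split on sentence-ending punctuation so phrases do not span sentences.
    for sentence in re.split(r"[.!?]+", text.lower()):
        words = re.findall(r"[a-z0-9']+", sentence)
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return [(p, c) for p, c in counts.most_common() if c >= min_count]

text = ("The dog jumped over the woman. The dog jumped into the car. "
        "The dog jumped up the stairs.")
phrases = common_phrases(text)  # [('the dog jumped', 3)]
```

Running it for several values of n and keeping the longest frequent phrases is one way to avoid reporting every sub-phrase separately.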

ad hoc query tool patterns

Hi, all. I'm looking for common patterns for implementing ad-hoc querying capabilities graphically. I've looked at SQL query builders in Access and TOAD, but I'm interested if anyone is aware of products that have built such a tool against a domain-specific data warehouse (e.g. clinical databases). Thanks, ...

Analyzing noisy data

I recently launched a rocket with a barometric altimeter that is accurate to roughly 10 ft (calculated via data acquired during flight). The recorded data is in time increments of 0.05 sec per sample and a graph of altitude vs. time looks pretty much like it should when zoomed out over the entire flight. The problem is when I try to ca...

What does dimensionality reduction mean?

What does dimensionality reduction mean exactly? I searched for its meaning and only found that it means the transformation of raw data into a more useful form. So what is the benefit of having data in a useful form? I mean, how can I use it in practice (in an application)? ...

Where can I get sample .dbf file?

I need some large dbf files. It is not very important what is inside. For a class I have an assignment to use SAS Enterprise Miner to explore and mine some data, just to give an example. I cannot use the sample data from SAS; I need another data set with a large number of records (e.g. 5k-10k). Where can I find those? Because Google is useless in this cas...

Minimum confidence and minimum support for Apriori

What are appropriate values for minimum confidence and minimum support values for the Apriori algorithm? How could you tweak them? Are they fixed values, or do they change during the running of the algorithm? If you have used this algorithm before, what values did you use? ...
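
For context, a small sketch of what the two measures are, using made-up shopping baskets (the data and the function names support and confidence are illustrative only). In the standard formulation both thresholds are chosen by the user before the run and stay constant while the algorithm executes; suitable values depend on the data set.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent together) / support(antecedent)."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base

# Hypothetical transactions.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

s = support(baskets, {"bread", "milk"})       # 2 of 4 baskets
c = confidence(baskets, {"bread"}, {"milk"})  # 2 of the 3 bread baskets
```

A rule such as bread -> milk is kept only if s meets the minimum support and c meets the minimum confidence, so lowering either threshold yields more rules.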

Data-mining related forums

Which forums do you use for data mining questions? SO is mainly intended for programming, not for DM questions. ...

java framework for image pattern recognition?

I'm looking for a Java framework to help with some data mining specific to images. We have a set of historical images that I would like to categorize and classify. I was hoping to find something like Weka http://www.cs.waikato.ac.nz/ml/weka/ or Marsyas http://marsyas.sness.net but more specific to sifting through image data to find p...