data-mining

Netflix-like competitions

Does anyone know of any competitions or tasks similar to the Netflix Prize? It's not only about the money, but also about the dimension of the data and the strong link with challenging tasks. ...

Open Source Data Mining Software

Hey everyone! I was wondering: what is the best open source software that I can use for non-binary association rule generation? I need a non-binary implementation because converting my currently non-binary data to binary data would not give the desired results. Thanks, and I can't wait to hear your comments! ...

association mining in WEKA

Hey everyone, I am trying to generate a set of association rules from a set of data using WEKA. I have converted my .csv file to an .arff file that is usable by WEKA. Once in the software, I remove all string fields from the data set and convert everything to nominal data. My problem is when I go to the association rules and try to ge...

Too complex models in processing of data

For those who process data, there is a saying: "If you torture data sufficiently, it will confess to almost anything". This is mathematically supported by Bonferroni's theorem, which states that "as one performs an increasing number of statistical tests, the likelihood of getting an erroneous significant finding (Type I error) also i...
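To make the multiple-testing point above concrete, here is a minimal sketch. It assumes the tests are independent and all run at the same significance level; the function names are made up for illustration and do not come from the question. It shows how the chance of at least one false positive grows with the number of tests, and how the usual Bonferroni correction (dividing the level by the number of tests) pulls it back down.

```python
def family_wise_error_rate(alpha, num_tests):
    """Probability of at least one Type I error across independent tests."""
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_threshold(alpha, num_tests):
    """Per-test significance level under the Bonferroni correction."""
    return alpha / num_tests


if __name__ == "__main__":
    alpha = 0.05
    for m in (1, 5, 20, 100):
        uncorrected = family_wise_error_rate(alpha, m)
        corrected = family_wise_error_rate(bonferroni_threshold(alpha, m), m)
        print(f"{m:>3} tests: uncorrected {uncorrected:.3f}, corrected {corrected:.3f}")
```

With the correction applied, the family-wise rate stays at or below the original level, at the cost of making each individual test stricter.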

Data Mining, Statistics, Collective Intelligence and AI Algorithms Books and Programming Resources

In my effort to continuously improve myself, I decided to learn about Data Mining, Statistics, Collective Intelligence and AI Algorithms, and, well, that sort of stuff. What free ebooks and web resources (tutorials, code, etc.) can I use? ...

fetching information from data - data mining practical techniques

Hi all, I am developing an online book store using PHP and MySQL. Now I want to implement some data mining techniques, like recommending related books and so on. I want to know what the best resources are for finding useful, practical techniques to implement such things. Thanks in advance. ...

Collaborative Filtering: Ways to determine implicit scores for products for each user?

Having implemented an algorithm to recommend products with some success, I'm now looking at ways to calculate the initial input data for this algorithm. My objective is to calculate a score for each product that a user has some sort of history with. The data I am currently collecting: User order history Product pageview history for b...

What is the meaning of jitter in the Visualize tab of Weka

In Weka I load an ARFF file. I can view the relationship between attributes using the Visualize tab. However, I can't understand the meaning of the jitter slider. What is its purpose? ...

Finding the center of a cluster

I have the following problem, made abstract to bring out the key issues. I have 10 points, each of which is some distance from the others. I want to be able to find the center of the cluster, i.e. the point for which the pairwise distance to each other point is minimised. Let p(j) ~ p(k) represent the pairwise distance between points j a...
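One common reading of "center" here is the medoid: the existing point whose total distance to all the others is smallest. The excerpt is cut off, so treat that criterion as an assumption. The sketch below works from a symmetric matrix of pairwise distances, and `medoid_index` is a hypothetical helper name, not something from the question.

```python
def medoid_index(dist):
    """Return the index of the point with the smallest summed distance to all others.

    `dist` is a square, symmetric matrix where dist[j][k] is the pairwise
    distance between points j and k, with zeros on the diagonal.
    """
    return min(range(len(dist)), key=lambda j: sum(dist[j]))


if __name__ == "__main__":
    # Illustrative data only: five points on a line, distances are absolute differences.
    positions = [0, 1, 3, 6, 10]
    dist = [[abs(a - b) for b in positions] for a in positions]
    center = medoid_index(dist)
    print(center, positions[center])
```

Because it only needs the pairwise distances, this works even when the points have no coordinates. If the goal is instead to minimise the largest distance to any other point, replace `sum` with `max`.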

What data mining tools do you use?

Hello everyone, Besides the two well-known Open Source tools RapidMiner and Weka, are there any other good tools (either Open Source or Commercial), which you can recommend for data mining? Thanks in advance! ...

Finding K-mean Centroid in SQL Server 2008 clustering algorithm

How do you find centroid values for each cluster using the Excel data mining plug-in (or SQL Server 2008)? In particular, how can the centroids be measured accurately when their vectors include nominal and boolean values? I've read the online books and I only got as far as node_distribution. I'm looking for centroid values or an algori...

Best database engine for huge datasets

I do data mining, and my work involves loading and unloading database dump files of over 1GB into MySQL. I am wondering whether there is any other free database engine that works better than MySQL on huge databases. Is PostgreSQL better in terms of performance? I only use basic SQL commands so speed is the only factor for me to choose a database ...

Python and data mining

I am learning data mining and wondered how Python figures when it comes to data mining. Are there good tools for data mining in Python? ...

Data mining/BI/Analytics/ML : Can a mathematically challenged person move into this field?

I have recently become interested in the field(s) of data mining and machine learning. The idea of going through huge datasets and trying to correlate hidden patterns and trends is fascinating. So far I have done the following: Used Weka to load simple data sets and generate decision trees. Continuously read books, wikis, blogs and SO on...

delicious bookmarks - urls frequently bookmarked

I haven't found any pre-made scripts that would help me analyze my delicious bookmarks. I want to know if there are any websites that I tend to frequently bookmark. I know I can export my bookmarks and can go from there. Has anyone done this? How have you gone about it? On a side note - are there any RSS readers that do something simila...
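As a rough starting point for the export route mentioned above, here is a minimal sketch that counts how often each site appears. It assumes the bookmark URLs have already been pulled out of the export into a plain list; `top_domains` and the sample URLs are invented for illustration.

```python
from collections import Counter
from urllib.parse import urlparse


def top_domains(urls, n=3):
    """Return the n most frequently bookmarked host names with their counts."""
    domains = (urlparse(u).netloc.lower() for u in urls)
    # Skip entries without a host part (e.g. malformed or relative URLs).
    return Counter(d for d in domains if d).most_common(n)


if __name__ == "__main__":
    urls = [
        "http://example.com/a",
        "http://example.com/b",
        "https://example.org/x",
        "http://Example.com/c",
    ]
    for domain, count in top_domains(urls):
        print(domain, count)
```

Host names are lower-cased so that `Example.com` and `example.com` are counted together; `www.` prefixes are left alone and would count as a separate host.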

feasibility on data mining program call stack using AOP

I am reading an article in IEEE Computer magazine about using data mining on applications. The part that is intriguing to me is the idea that we can have software that monitors the execution flow of a program and puts the data into a database, where we can do some data mining. This data could then be used by a data mining tool to ...

How to Predict if Function Name Follows Convention

Suppose you have a repository of 10,000 function names and possibly their frequency of use in a corpus of code which can be in C/C#/C++. (they have different conventions usually prescribed) Some Samples may be: DoPaint OnPaint CloseWindow DeleteGraphOnClose FreeConnection ConnectInternat (smallTypo, but part of code) FreeSoH Now give...

Difference between analysis services and business intelligence development studio?

Hi guys, as you might have guessed from the title, I'm really new to Analysis Services. I've spent the last 5 hours (crazy!) just trying to figure out the difference between the Analysis Services available through SSMS and the Business Intelligence Development Studio available through Visual Studio. Thanks ...

Can someone suggest how this Perl script works?

I have to maintain the following Perl script: #!/usr/bin/perl -w die "Usage: $0 <file1> <file2>\n" unless scalar(@ARGV)>1; undef $/; my @f1 = split(/(?=(?:SERIAL NUMBER:\s+\d+))/, <>); my @f2 = split(/(?=(?:SERIAL NUMBER:\s+\d+))/, <>); die "Error: file1 has $#f1 serials, file2 has $#f2\n" if ($#f1 != $#f2); foreach my $g (0 .. $#f1...

Microsoft Business Intelligence. Is what I am trying to do possible?

Hi guys, I have been charged with the task of analysing the log table of my company's website. This table contains a user's click path throughout the website for a given session. My company is looking to understand/spot trends based on the 'click paths' of our users. In doing so, identify groups of users that take on a certain 'click pa...