Netflix-like competitions
Does anyone know about any competitions or tasks that are similar to the Netflix Prize? It's not only about the money, but also about the dimension of data, the strong link with challenging tasks. ...
Hey everyone! I was wondering: what is the best open source software that I can use for non-binary association rule generation? I need a non-binary implementation because converting my currently non-binary data to binary data would not give the desired results. Thanks, and I can't wait to hear your comments! ...
Hey everyone, I am trying to generate a set of association rules out of a set of data using WEKA. I have converted my .csv file to an .arff file that is usable by WEKA. Once in the software, I remove all string fields from the data set and convert everything to nominal data. My problem is when I go to the association rules and try to ge...
For those who process data, there is a saying: "If you torture data sufficiently, it will confess to almost anything". This is mathematically supported by Bonferroni's theorem, which states that "as one performs an increasing number of statistical tests, the likelihood of getting an erroneous significant finding (Type I error) also i...
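The effect described in the quote above can be checked with a little arithmetic. The following is a minimal sketch, assuming the tests are independent and each uses the same significance level; the function names are hypothetical and chosen for illustration only. It computes the chance of at least one false positive across several tests, and the per-test threshold that the Bonferroni correction would use to keep that chance at or below the chosen level.

```python
def familywise_error_rate(alpha: float, num_tests: int) -> float:
    """Probability of at least one Type I error across num_tests tests.

    Assumes the tests are independent and each is run at level alpha.
    """
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_threshold(alpha: float, num_tests: int) -> float:
    """Per-test significance level under the Bonferroni correction."""
    return alpha / num_tests


if __name__ == "__main__":
    for m in (1, 5, 20, 100):
        print(m, round(familywise_error_rate(0.05, m), 4),
              bonferroni_threshold(0.05, m))
```

With 20 independent tests at the 0.05 level, the chance of at least one spurious "significant" result is already well above one half, which is why a stricter per-test threshold is commonly used.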
In my effort to continuously improve myself, I decided to learn about Data Mining, Statistics, Collective Intelligence, AI Algorithms, and that sort of stuff. What free ebooks and web resources (tutorials, code, etc.) can I use? ...
Hi all, I am developing an online book store using PHP and MySQL. Now I want to implement some data mining techniques, such as recommending related books. What are the best resources for useful, practical techniques to implement such things? Thanks in advance. ...
Having implemented an algorithm to recommend products with some success, I'm now looking at ways to calculate the initial input data for this algorithm. My objective is to calculate a score for each product that a user has some sort of history with. The data I am currently collecting: User order history Product pageview history for b...
In Weka I load an ARFF file. I can view the relationship between attributes using the Visualize tab. However, I can't understand the meaning of the jitter slider. What is its purpose? ...
I have the following problem, made abstract to bring out the key issues. I have 10 points, each of which is some distance from the others. I want to be able to find the center of the cluster, i.e. the point for which the pairwise distance to each other point is minimised. Let p(j) ~ p(k) represent the pairwise distance between points j a...
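One common reading of this problem is finding the medoid: the existing point whose total distance to all the other points is smallest. The following is a minimal sketch under that reading, assuming Euclidean distance and points given as coordinate tuples; the question itself is cut off, so the function name `medoid` and the distance choice are assumptions rather than the asker's own definitions.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def medoid(points: Sequence[Point]) -> Point:
    """Return the point with the smallest summed distance to all points.

    Uses Euclidean distance (math.dist, Python 3.8+). Runs in O(n^2),
    which is fine for a handful of points such as the 10 in the question.
    Ties are broken in favour of the earliest point in the input.
    """
    if not points:
        raise ValueError("points must not be empty")

    def total_distance(p: Point) -> float:
        return sum(math.dist(p, q) for q in points)

    return min(points, key=total_distance)


if __name__ == "__main__":
    sample = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    print(medoid(sample))
```

If only a precomputed distance table is available instead of coordinates, the same idea applies: sum each row of the table and pick the row with the smallest total.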
Hello everyone, Besides the two well-known Open Source tools RapidMiner and Weka, are there any other good tools (either Open Source or Commercial), which you can recommend for data mining? Thanks in advance! ...
How do you find centroid values for each cluster using the Excel data mining plug-in (or SQL Server 2008)? In particular, how can the centroids be measured accurately when their vectors include nominal and boolean values? I've read the online books and I only got as far as node_distribution. I'm looking for centroid values or an algori...
I do data mining, and my work involves loading and unloading 1GB+ database dump files into MySQL. Is there any other free database engine that works better than MySQL on huge databases? Is PostgreSQL better in terms of performance? I only use basic SQL commands, so speed is the only factor for me to choose a database ...
I am learning data mining and wondered how Python fares when it comes to data mining. Are there good tools for data mining in Python? ...
I have recently become interested in the field(s) of data mining and machine learning. The idea of going through huge datasets and trying to correlate hidden patterns and trends is fascinating. So far I have done the following: used Weka to load simple data sets and generate decision trees; continuously read books, wikis, blogs and SO on...
I haven't found any pre-made scripts that would help me analyze my delicious bookmarks. I want to know if there are any websites that I tend to frequently bookmark. I know I can export my bookmarks and can go from there. Has anyone done this? How have you gone about it? On a side note - are there any RSS readers that do something simila...
I am reading an article in IEEE Computer magazine about using data mining on applications. The part that is intriguing to me is the idea that we can have software that monitors the execution flow of a program and puts the data into a database, where we can do some data mining. This data could then be used by a data mining tool to ...
Suppose you have a repository of 10,000 function names and possibly their frequency of use in a corpus of code which can be in C/C#/C++. (they have different conventions usually prescribed) Some Samples may be: DoPaint OnPaint CloseWindow DeleteGraphOnClose FreeConnection ConnectInternat (smallTypo, but part of code) FreeSoH Now give...
Hi guys, as you might have guessed from the title, I'm really new to Analysis Services. I've spent the last 5 hours (crazy!) just trying to figure out the difference between the Analysis Services available through SSMS and the Business Intelligence Development Studio available through Visual Studio. Thanks ...
I have to maintain the following Perl script: #!/usr/bin/perl -w die "Usage: $0 <file1> <file2>\n" unless scalar(@ARGV)>1; undef $/; my @f1 = split(/(?=(?:SERIAL NUMBER:\s+\d+))/, <>); my @f2 = split(/(?=(?:SERIAL NUMBER:\s+\d+))/, <>); die "Error: file1 has $#f1 serials, file2 has $#f2\n" if ($#f1 != $#f2); foreach my $g (0 .. $#f1...
Hi guys, I have been charged with the task of analysing the log table of my company's website. This table contains a user's click path throughout the website for a given session. My company is looking to understand/spot trends based on the 'click paths' of our users. In doing so, identify groups of users that take on a certain 'click pa...