Upon some research I found two functions in MATLAB to do the task:
cvpartition function in the Statistics Toolbox
crossvalind function in the Bioinformatics Toolbox
I've used cvpartition before to create n-fold cross-validation subsets, along with the Dataset/Nominal classes from the Statistics Toolbox. So I'm just wondering ...
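For readers who have not seen an n-fold partition before, here is a minimal sketch of the idea in plain Python. It is not the cvpartition or crossvalind API; the function names `kfold_indices` and `train_test_splits` are made up for illustration, and the round-robin assignment of shuffled indices to folds is just one simple way to get near-equal fold sizes.

```python
import random

def kfold_indices(n_samples, k, seed=0):
    """Split range(n_samples) into k disjoint folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, starting at position i.
    return [indices[i::k] for i in range(k)]

def train_test_splits(n_samples, k, seed=0):
    """Yield (train, test) index lists, one pair per fold."""
    folds = kfold_indices(n_samples, k, seed)
    for i, test in enumerate(folds):
        # Training set = all indices from the other k - 1 folds.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

if __name__ == "__main__":
    for train, test in train_test_splits(10, 3):
        print(len(train), len(test))
```

Each sample lands in exactly one test fold, so every observation is used for testing once and for training k - 1 times.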
Hi
I am looking for project ideas in the field of data mining. I expect to complete the project in a quarter and intend to use C++ on Linux.
The course I'm taking aims to build the basics of data mining and covers topics like Classification, Regression-Modeling, Clustering and Association learning.
Please point me to some good...
I'm a developer, not too good at math, but I'm willing to learn fun stuff to do with data mining techniques.
I've looked at pragmatic books on the subject, which give me some ideas (and maybe quickly introduce me to some tools). I want to stress that I'm not a mathematician!
What do you advise?
...
Hi,
I am looking to work on a machine learning project for my course and I would like to use the Netflix Prize dataset. But it looks like the contest is closed and the dataset is not available for download on the Netflix website. Does anyone who worked on it have the dataset? If so, can you share it?
...
Let's say I have 100000 email bodies and 2000 of them contain an arbitrary common string like "the quick brown fox jumps over the lazy dog" or "lorem ipsum dolor sit amet". What techniques could/should I use to "mine" these phrases? I'm not interested in mining single words or short phrases. Also I need to filter out phrases that I alread...
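One simple starting point for this kind of phrase mining is to count, for every word n-gram, how many documents contain it, and keep the n-grams above a threshold. The sketch below assumes whitespace/punctuation tokenization and a fixed n-gram length; `frequent_phrases`, `n`, and `min_docs` are illustrative names and defaults, not taken from the question.

```python
from collections import Counter
import re

def frequent_phrases(documents, n=5, min_docs=2):
    """Return word n-grams that occur in at least min_docs documents."""
    doc_freq = Counter()
    for text in documents:
        words = re.findall(r"\w+", text.lower())
        # Use a set so each document counts a given n-gram only once.
        ngrams = {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}
        doc_freq.update(ngrams)
    return {phrase: count for phrase, count in doc_freq.items() if count >= min_docs}

if __name__ == "__main__":
    emails = [
        "Hello, the quick brown fox jumps over the lazy dog today",
        "Reminder: the quick brown fox jumps over the lazy dog",
        "completely unrelated message body with other words here",
    ]
    for phrase, count in sorted(frequent_phrases(emails).items()):
        print(count, phrase)
```

A long shared phrase shows up as a chain of overlapping n-grams with the same document count, so a follow-up step could merge adjacent n-grams into the full phrase and drop phrases that are already known.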
Currently it seems common practice to parse Postfix log files in order to determine if a message has been sent. Is there an API for Postfix or a look up table in it that yields this information in a manner quicker than parsing (rather lengthy) log files?
...
I'm looking for some sort of tool that can take an HTML document and pump out a selector-based representation of the file.
For example:
<div>
Some text
<ul class="foo">
<li>First</li>
<li>Second</li>
</ul>
</div>
And output a flat text file in the spirit of:
div
div #text Some text
div ul.foo li First
div ul.foo li Se...
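A rough sketch of such a dump can be built on Python's standard `html.parser` module. The class name `SelectorDump` and the exact output format are assumptions: this version prints one line per opened element and marks every text node with `#text`, which differs slightly from the sample output above. It also does not treat void elements such as `<br>` specially.

```python
from html.parser import HTMLParser

class SelectorDump(HTMLParser):
    """Collect one line per element and per non-empty text node."""

    def __init__(self):
        super().__init__()
        self.stack = []   # selectors of the currently open elements
        self.lines = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        selector = tag
        if attrs.get("id"):
            selector += "#" + attrs["id"]
        for cls in (attrs.get("class") or "").split():
            selector += "." + cls
        self.stack.append(selector)
        self.lines.append(" ".join(self.stack))

    def handle_endtag(self, tag):
        # Pop back to the nearest open element with this tag name, if any.
        for i in range(len(self.stack) - 1, -1, -1):
            name = self.stack[i].split("#")[0].split(".")[0]
            if name == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.lines.append(" ".join(self.stack + ["#text", text]))

def dump_selectors(html):
    parser = SelectorDump()
    parser.feed(html)
    parser.close()
    return parser.lines

if __name__ == "__main__":
    sample = '<div>Some text<ul class="foo"><li>First</li><li>Second</li></ul></div>'
    print("\n".join(dump_selectors(sample)))
```

Because each line carries the full ancestor path, the output can be searched or diffed with ordinary line-oriented tools such as grep.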
I have a database full of reviews of various products. My task is to perform various calculations and "create" another "database/xml-export" with aggregated data. I am thinking of writing command line programs in python to do that. But I know someone has done this before and I know that there is some open source python solution or simila...
How can I extract information from OpenSocial-based networks like orkut?
...
I would like to know if there are any news feeds/APIs that can be used for coding/data mining.
Skygrid, for example, gives live news feeds and whether the news is good or bad, but it's all in Flash and they don't seem to provide any RSS other than their Twitter.
...
I have a dataset with information like age, city, age of children, ... and a result (confirm, accept).
To help model the "workflow", I want to automatically create a decision tree based on previous datasets.
I have taken a look at http://en.wikipedia.org/wiki/Decision_tree_learning and I know that the problem is clearly not obvio...
I'm writing a setup program that needs to install the Data Mining Add-in for Office 2007.
1) How do I detect if it's already installed?
2) If it is not installed, I download and run the MSI (SQLServer2008_DMAddin.msi). But how can I run the Server Configuration (Microsoft.SqlServer.DataMining.Office.ServerConfiguration.exe) tool myself...
Here's the problem. I have a bunch of large text files with paragraphs and paragraphs of written matter. Each para contains references to a few people (names), and documents a few topics (places, objects).
How do I data mine this pile to assemble some categorised library? ... in general, 2 things.
I don't know what I'm looking for, so...
How do I data mine a pile of text to get keywords by usage? ("Jacob Smith" or "fence")
And is there software to do this already, even semi-automatically? If it can filter out simple words like "the", "and", "or", then I could get to the topics quicker.
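As a very rough first pass at "keywords by usage", word counts with a stop-word filter already go a long way. The sketch below uses only the Python standard library; the `STOP_WORDS` list is a tiny illustrative sample, and `candidate_names` is a crude assumption that names appear as runs of capitalized words, which will miss or over-match in real text.

```python
from collections import Counter
import re

# A tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "and", "or", "a", "an", "of", "to", "in", "is", "it", "on", "by", "was"}

def top_keywords(text, limit=10):
    """Return the most frequent non-stop-words as (word, count) pairs."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(limit)

def candidate_names(text):
    """Count runs of two or more capitalized words, e.g. 'Jacob Smith'."""
    return Counter(re.findall(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b", text))

if __name__ == "__main__":
    para = "Jacob Smith repaired the fence. The fence was old, and Jacob Smith painted it."
    print(top_keywords(para, 3))
    print(candidate_names(para).most_common(3))
```

Running this per paragraph and then grouping paragraphs by their top keywords or names gives a starting point for a categorised library.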
...
I would like to make a piece of software able to recognize whether a sentence is positive or negative.
Are there any lexical analysis libraries around?
I don't really know where I should start.
...
Get the x texts most similar to one given text from a large collection of texts.
Maybe converting the pages to plain text first is better.
You should not compare the text to every other text, because it's too slow.
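One common way to avoid comparing against every text is an inverted index: only texts that share at least one token with the query are scored. The sketch below is a minimal version of that idea using Jaccard similarity on word sets; the function names and the choice of similarity measure are assumptions, not something stated in the question.

```python
from collections import Counter, defaultdict
import re

def tokens(text):
    """Lowercased set of word tokens in a text."""
    return set(re.findall(r"\w+", text.lower()))

def build_index(texts):
    """Map each token to the ids of the texts that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(texts):
        for tok in tokens(text):
            index[tok].add(doc_id)
    return index

def most_similar(query, texts, index, x=3):
    """Rank only the texts sharing a token with the query, by Jaccard similarity."""
    query_tokens = tokens(query)
    shared = Counter()
    for tok in query_tokens:
        for doc_id in index.get(tok, ()):
            shared[doc_id] += 1
    scored = []
    for doc_id, overlap in shared.items():
        union = len(query_tokens) + len(tokens(texts[doc_id])) - overlap
        scored.append((overlap / union, doc_id))
    # Highest score first; ties broken by document id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(doc_id, score) for score, doc_id in scored[:x]]

if __name__ == "__main__":
    corpus = [
        "data mining with decision trees",
        "cooking pasta at home",
        "mining text data for keywords",
    ]
    idx = build_index(corpus)
    print(most_similar("text data mining", corpus, idx, x=2))
```

Very common words make the candidate sets large, so removing stop words before indexing keeps the lookup fast on big collections.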
...
We are planning to develop a data mining package for Windows. The program core / calculation engine will be developed in F#, with GUI stuff / DB bindings etc. done in C# and F#.
However, we have not yet decided on the model implementations. Since we need high performance, we probably can't use managed code here (any objections here?). The ...
For use in analyzing documents on the Internet!
...
I'm working on a web application which will be used for classifying photos of automobiles. The users will be presented with photos of various vehicles, and will be asked to answer a series of questions about what they see. The results will be recorded to a database, averaged, and displayed.
I'm looking for algorithms to help me identify...
I'm starting to write my master's thesis next semester (it should be done before June). I already have the topic, and I need to write the state of the art by February.
The main areas are Intelligent systems, Natural Language processing, Semantic Analysis and Data Mining.
I am looking for the best books about Natural Language pro...