Hi

I'm due to take up a data mining project. Before I jump in, I wanted to look around for data mining tools (preferably open source) that allow web-based reporting. In my scenario all the data will be provided to me, so I won't need to crawl for it.

In a nutshell, I am looking for a tool that does data analysis and web-based reporting, and that provides some kind of dashboard and mining features.

I have worked with Microsoft Analysis Services and BOXI, and of late I have been looking at Pentaho, which seems to be a good option.

Please share your experiences with any such tool you know of.

cheers

+5  A: 

I believe WEKA is the best open source DM software out there.

Check it out: http://www.cs.waikato.ac.nz/ml/weka/

Alix Axel
A: 

I am a Python-er myself and I have to say:

Yes! All of that can be done in Python.

I last played around with Beautiful Soup [0]. It's a really simple-to-use module that lets you grab/mine data from HTML and XML (excellent for 'screen scraping').

If you don't know Python, well, it's really easy to learn.

[0] http://www.crummy.com/software/BeautifulSoup/

ajray
Data mining is all about discovering "hidden" knowledge in data; it has nothing to do (at least directly) with screen scraping. But thanks for pointing me at Beautiful Soup, I'll play around with it. =)
Alix Axel
+3  A: 

Weka is great, but you might want to try the Orange Data Mining toolkit instead.

http://www.ailab.si/orange/

Edit: And as of November 2010, I must say I really like KNIME.

ybakos
What do you mean by "November 2011"?
mt3
I must have been time traveling! (Edited to 2010, my original intent.) Thanks for pointing out my mistake.
ybakos
+2  A: 

R has a lot of excellent packages related to data mining. In particular, look at:

It also ties into Weka (see the RWeka package), and it can be integrated with either .NET (through COM) or Python (through RPy or RPy2).

I would agree regarding Pentaho for a reporting platform, although it's a very large project depending upon what you're using it for.

Shane
+1  A: 

Pentaho is a very professional solution. Definitely a very good choice.

Pascal Thivent
+1  A: 

You can look at Data Mining SDK and its blog.

sashaeve
A: 

Some open source data mining tools are listed here: http://dataminingtools.net/browse.php

Datakid
A: 

You can take a look at the data mining tool Weka.

Here are links to a collection of tutorials and videos on WEKA.

Tutorials: http://www.dataminingtools.net/browsetutorials.php?tag=weka

Videos: http://www.dataminingtools.net/videos.php?id=6

datakid
+2  A: 

You should also check out Apache Mahout. It can be quite useful for some large-scale machine learning tasks such as user clustering.

random.bit
+1  A: 

Have a look at the list of open source software for machine learning maintained by JMLR. You can find it here:

http://mloss.org/software/

http://jmlr.csail.mit.edu/mloss/

They represent the state of the art!

My issue with Weka is that a number of algorithms in it are outdated.

WeShallOvercome
A: 

I believe KNIME deserves to join this list as well.

radek
A: 

I believe RapidMiner is an excellent tool that should be added to this list.

mariana soffer