views:

313

answers:

6

I can find the technical explanation of what data mining is in a book or on Wikipedia, but I'm wondering what sort of development work it actually involves. Is it more about using tools or more about writing tools? Is it really much different from other domains when it comes to R&D?

A: 

Data mining, as I would put it, is finding patterns or trends in given data. From a developer's perspective, one application is anti-money-laundering software, where you search the data for a given pattern. Another use is projection software, where you project a future result or outcome against a heuristic by recognizing the current trend in the data.
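To make the projection idea concrete, here is a minimal sketch that fits a straight line to past observations and extends it forward. It assumes the trend is roughly linear and the observations are evenly spaced; the names `fit_line`, `project` and `monthly_sales` are made up for illustration and do not come from any particular product.

```python
def fit_line(ys):
    """Least-squares line through the points (0, ys[0]), (1, ys[1]), ..."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    # Sums of squares and cross products around the means.
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def project(ys, steps_ahead):
    """Extend the fitted line steps_ahead periods past the last observation."""
    slope, intercept = fit_line(ys)
    return intercept + slope * (len(ys) - 1 + steps_ahead)


# Hypothetical data: four months of sales figures.
monthly_sales = [10.0, 12.0, 14.0, 16.0]
print(project(monthly_sales, 2))
```

Real projection software would probably use something more robust than a straight line, but the shape is the same: learn the trend from the data, then extrapolate.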

S M Kamran
+1  A: 

Data mining is about searching large quantities of data for hidden patterns. Web 2.0 example: News Corp uses its site myspace.com as a large data mine to determine which movies and products to promote. They write software to identify trends in the data that its users post to the site. News Corp does this to gather information useful for advertising campaigns and market predictions. It's different from other domains of R&D in that, from the data giver's perspective, it's passive. Rather than going out on the street and asking people in person which movies they are likely to see this summer, the data mining tools work these things out by analyzing data that users have given voluntarily.

Wikipedia actually does have a pretty good article on it: - http://en.wikipedia.org/wiki/Data_mining

shit a birck
+3  A: 

In my experience (I'm a former data miner :-)), it's a mixture of using tools and writing tools. A lot of the time, the tools you need to analyse the particular data set don't exist, so you have to write them yourself first. It can be very interesting but you often need quite a different approach to the sort of programming I do now (embedded wireless), for example.

Vicky
+1  A: 

I think it's more about using off-the-shelf tools than developing your own. An academic example of that kind of tool is WEKA. Of course, you still have to know which algorithms to use, how to preprocess the data (this part is very important), etc.

As for R&D, I don't know much about it, but it should be like almost everything else: maths, statistics, more maths...

fortran
A: 

On the development level, data mining is just another database application, but with a huge amount of data.

The mining itself is done by running specific queries on the database. The important work is in creating those queries. They depend, of course, on the data model and on the hypotheses, i.e. what sort of trends the customer expects to find. Therefore, the fine-tuning of the queries usually can't be done in development, but only once the system is live and you have live data. Then the user can test his hypotheses and adapt the queries to show him the trends he is looking for.

So from a dev point of view, data mining is about

  1. Managing large sets of data in your client (one query may return 100,000 rows of data)

  2. Providing the user (who may know nothing about SQL or relational databases in general) with an effective way to modify his queries and view the results.
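A rough sketch of both points, assuming a SQLite database and a made-up `sales` table (the table, its columns and the `iter_rows` helper are all hypothetical): the client pulls rows in fixed-size chunks instead of loading the whole result at once, and the user changes a parameter rather than the SQL text.

```python
import sqlite3


def iter_rows(conn, sql, params=(), chunk_size=1000):
    """Yield result rows one at a time, fetching chunk_size rows per round trip."""
    cur = conn.execute(sql, params)
    while True:
        rows = cur.fetchmany(chunk_size)
        if not rows:
            break
        yield from rows


# Hypothetical data set: 100,000 rows in an in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north" if i % 2 else "south", float(i)) for i in range(100_000)],
)

# The threshold is a bound parameter, so a user-facing form can change it
# without anyone editing the query text.
totals = {}
query = "SELECT region, amount FROM sales WHERE amount >= ?"
for region, amount in iter_rows(conn, query, (50_000,)):
    totals[region] = totals.get(region, 0.0) + amount
print(totals)
```

With a server-based database the same pattern usually applies, although the exact cursor behaviour depends on the driver.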

Treb
+1 That's what I'm actually doing, and I couldn't have said it was data mining. Good explanation! Thanks!
Will Marcouiller
Clustering, Classification, Anomaly Detection, Similarity Measurement, etc aren't done by just "querying" the data and "adapting" those queries. I disagree.
colithium
@colithium: By which other means *are* they done, then? As stated in my response to ybakos' answer, my answer lacks any reference to data analysis methods, true. But I don't see how the first step in data mining can be anything else but accessing the data, which is usually done through queries. And this is where I see potential technical difficulties that the developer of a DM app should keep in mind.
Treb
Sure, to be used it must be accessed; I agree with you. But that's not the essence of data mining. That's like prefacing every answer on SO with "you need to access RAM to solve your problem". I'm not trying to be glib; data mining is about developing and/or choosing techniques to identify patterns in your vast data set. It's not about querying for summary stats or interesting joins.
colithium
+8  A: 

Why the hell is Treb's answer checked? It's completely (as in 100%) wrong.

Data Mining is the process of discovering interesting patterns in large amounts of data. It is not querying data, which is just what user Treb describes.
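As a small illustration of the difference, here is a sketch of k-means clustering in plain Python. Nobody tells the program which groups exist; it finds them from the points alone. The data is invented, the function names are mine, and the starting centroids are picked deterministically (evenly spaced input points) only to keep the sketch repeatable. Real implementations usually choose them randomly or with something like k-means++.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, k, iterations=20):
    """Group points into k clusters; returns (centroids, labels)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Simplification for this sketch: evenly spaced input points as starting centroids.
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its members; keep it if it has none.
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Made-up data: two visibly separate groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

In practice you would normally reach for a library implementation, but the loop above is essentially what such a library runs for you.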

To understand DM from a developer's perspective, you should read the book Programming Collective Intelligence by Toby Segaran.

ybakos
Can't say that I agree with you - How would you discover any pattern in your data without querying first? Querying is the first step, therefore it's the first thing a developer has to think about. I admit that I completely forgot to mention any data analysis - statistics are certainly a must for any data mining application, as well as visual representation of large data sets. But **performing** an analysis is done by a data miner, not the developer. The OP was asking about data mining from a dev's POV, so that's what I tried to answer.
Treb
"How would you discover any pattern in your data without querying first?" you ask. You discover patterns in your data by programmatic implementation, not by fishing with queries. This is the whole point -- getting the machine to detect the patterns in the data.
ybakos
And in order to detect patterns programmatically, you first need to look at the data. So in the end it comes down to queries, no matter who is doing the querying.
Treb