data-mining

Languages for implementing decision trees

What would be a good choice of programming language in which to implement a decision tree? The results of the implementation will be for personal use only, so there is no need to consider the ability to publish, etc. I have heard that Octave is a good option. Can anyone explain why a matrix-based language is recommended for implementing decision tree...

Development Platforms for Financial modeling (What do the Quants use?)

Quantitative Analysts or "Quants" predict the behavior of markets to maximize profits. I am interested in the software that they use to accomplish this. Are there development platforms, libraries, languages or Data Mining suites specifically tailored to Financial Modeling? ...

Ways to calculate similarity

Hi, I am building a community website that requires me to calculate the similarity between any two users. Each user is described by the following attributes: age, skin type (oily, dry), hair type (long, short, medium), lifestyle (active outdoor lover, TV junkie) and others. Can anyone tell me how to go about this problem or point me to s...

Are there any useful datasets available on the web for data mining?

Hi, does anyone know of a good resource where real example data can be downloaded for experimenting with statistics and machine learning techniques such as decision trees? I am currently studying machine learning techniques, and it would be very helpful to have real data for evaluating the accuracy of various tools. If anyone knows any...

Retrieving coordinates in this page

Hey guys, I'm trying to do some data mining and analyze data based on locations. For this site, http://www.dianping.com/shop/1898365 I am trying to figure out the latitude and longitude by crawling, but I can't seem to figure out where this information is stored. Can someone give me some pointers ...

Rare Event Detection

Is there any good reference on algorithms that people use for rare event detection? Also, how is the time factor taken into account? If I have a case where successive data points tell something (t_1 to t_n), how can one factor this into a normal machine learning scenario? Any pointers will be appreciated. ...

Data mining Google's web search results?

Currently, I have a Google web search. If a user searches for Starbucks, I only want to retrieve the company or product information, not other unrelated links such as blog pages. Is it possible to do this using JavaScript? If yes, how can I do it? I'm kind of a newbie in the data mining part. Thanks! Added my coding for download for c...

What is the difference between association rule mining and frequent itemset mining?

I am new to data mining and confused about association rules and frequent itemset mining. To me they seem to be the same, but I would like the views of the experts on this forum. My question is: what is the difference between association rule mining and frequent itemset mining? Thanks ...

SSAS data mining web viewer

Hi, I need to allow my end users to view a SQL Server Analysis Services data mining model (association finding, to be exact). I'm looking for a tool that can do the job. For the cubes I'm using Excel OWC, and I'm quite satisfied. So far I have found only DM Companion, and I'm struggling to find anything else. Can you recommend somethi...

How to calculate Mahalanobis distance between two time series of equal dimensions?

I am doing some data mining on time series data. I need to calculate the distance or similarity between two series of equal dimensions. It was suggested that I use Euclidean distance, cosine similarity or Mahalanobis distance. The first two didn't give any useful information. I cannot seem to understand the various tutorials on the web. So, Gi...

Clustering on a very large sparse matrix?

Hello again, I am trying to do some (k-means) clustering on a very large matrix. The matrix is approximately 500000 rows x 4000 cols yet very sparse (only a couple of "1" values per row). I want to get around 2000 clusters. I have two questions: - Can someone recommend an open source platform or tool for doing that (maybe using k-means...

Data mining? And how can I perform it on my website?

Hi, I’m preparing my graduation project in computer science. I made this website and it's running perfectly, but my supervisor asked me to apply data mining to the website, and I don’t understand what I should do. The website is a social network; each user will have a profile and blog and access to some e-books that required you to be...

How to use R Random forests to reduce attributes having no discrete classes?

I want to use random forests for attribute reduction. One problem with my data is that I don't have a discrete class, only a continuous one, which indicates how an example differs from 'normal'. This class attribute is a kind of distance from zero to infinity. Is there any way to use a random forest for such data? ...

Scalable Classifier For Finding Missing Attributes

I have a large sparse matrix representing attributes for millions of entities. For example, one record, representing an entity, might have attributes "has(fur)", "has(tail)", "makesSound(meow)", and "is(cat)". However, this data is incomplete. For example, another entity might have all the attributes of a typical "is(cat)" entity, but i...

Probabilistic Generation of Semantic Networks

I've studied some simple semantic network implementations and basic techniques for parsing natural language. However, I haven't seen many projects that try and bridge the gap between the two. For example, consider the dialog: "the man has a hat" "he has a coat" "what does he have?" => "a hat and coat" A simple semantic network, based...

Algorithms and methods for attribute/feature selection?

I have data with a continuous class and I'm searching for good methods to reduce the number of attributes. Right now I'm using correlation-based filters, random forests and the Gram–Schmidt algorithm. What I want to determine is which attributes are more important/relevant to the class attribute than others. By using methods that I mentioned befor...

Interactive Decision Tree Classifier

Can anyone recommend a decision tree classifier implementation, in either Python or Java, that can be used incrementally? All the implementations I've found require you to provide all the features to the classifier at once in order to get a classification. However, in my application, I have hundreds of features, and some of the features...

How to get HTML data with JavaScript

Hello, I have an HTML web page full of div and span tags, identified by class, that contain lots of data I need in another format. I was wondering what would be the best way to do this with JavaScript. Thank you for the help. ...

Asking for a method used in data mining (especially for blog webpages)

Hello there, I recently attended a talk on data mining and missed some of the lecturer's points about a technique used in data mining that is especially useful for blog webpages. I think the term was something like "td/tdf", but I'm really not sure. I googled it for a while and still have no result. I...

Can I reformulate this MDX query to use sets instead of an "And"?

with member [Measures].[BoughtDispenser] as Sum(Descendants([Customer].[Customer].CurrentMember, [Customer].[Customer]), Iif( (IsEmpty(([Item].[ItemNumber].&[011074], [Measures].[Sale Amount])) And IsEmpty(([Item].[ItemNumber].&[011069], [Measures].[Sale Amount])) ) Or IsEmpty([...