data-mining

Classification Tree Plug-in Algorithm for SQL Server 2008

I am facing a data mining problem and am required to use SQL Server with Analysis Services. I have to implement the CHAID algorithm so that, once it is done, it is included in the set of algorithms available in Analysis Services. I want to use the SQL Server Data Mining Managed Plug-in, so I'm programming in C#. The documentation I have ava...

Web mining - classification algorithms

Hi, my senior project is determining the dominant category of a web page. I crawled DMOZ, and now I am trying to build an ARFF file. After that I will use some feature extraction methods and classification algorithms. Do you know which feature extraction method performs well with any classification algorithm for web mining? ...

Intelligent Database - Capable of identifying out of the ordinary values

I am looking for a tool or system to take a look at the database and identify values that are out of the ordinary. I don't need anything that does real-time checks, just a system that does processing overnight or at scheduled points. I am looking for a system at two levels: Database-wide: e.g., compare salaries of all employees and identify...

Most representative instance of a cluster

After performing a cluster analysis on my dataset (a dataframe named data.matrix), I added a new column, named cluster, at the end (col 27) containing the name of the cluster that each instance belongs to. What I want now is a representative instance from each cluster. I tried to find the instance having the smallest Euclidean distance from t...
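A minimal sketch of the "smallest Euclidean distance" idea, in Python rather than the R dataframe the question uses. It assumes the distance is measured to each cluster's centroid (the per-feature mean); the excerpt is cut off before it says so. The function name `representative_instances` and the toy rows are hypothetical.

```python
import math
from collections import defaultdict

def representative_instances(rows, labels):
    """Return {cluster label: index of the row nearest that cluster's centroid}."""
    # Group row indices by their cluster label.
    members = defaultdict(list)
    for idx, label in enumerate(labels):
        members[label].append(idx)

    result = {}
    for label, idxs in members.items():
        dims = len(rows[idxs[0]])
        # Assumption: the centroid is the per-feature mean of the cluster's rows.
        centroid = [sum(rows[i][d] for i in idxs) / len(idxs) for d in range(dims)]
        # Pick the member with the smallest Euclidean distance to the centroid.
        result[label] = min(idxs, key=lambda i: math.dist(rows[i], centroid))
    return result

# Toy data, invented for illustration only.
rows = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0],
        [10.0, 10.0], [10.0, 12.0], [10.0, 17.0]]
labels = ["a", "a", "a", "b", "b", "b"]
reps = representative_instances(rows, labels)
```

Returning an actual row (a medoid-like choice) rather than the centroid itself guarantees the representative is a real instance from the dataset.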

Retail knowledge inference

I am researching how to infer knowledge from reports (with no specific format); after preprocessing, I should have some kind of formatted data. A fairly basic inference would be: "Retailer has X stock." and "X is sellable." -> "Retailer sells X". The knowledge I focus on is retail-domain oriented, and if possible I sho...

Text mining library or linguistic library?

I have a bunch of data harvested from a forum I own, and would like to do some text mining or use a linguistic library to extract useful information. Any text mining or data mining library in any language will do. Thank you. ...

Data Mining with SQL Server

I have two questions about data mining. I understand the basic concepts of this topic, but I want to know more about it. As I understand it, we can use data mining to find patterns in our database, as in the common example of sugar and tea (the majority of people buy sugar with tea). But what if I want to use a data mining technique from another angle: I mean I hav...

Machine learning issue for negative instances

Hello. I had to build a concept analyzer for the computer science field, and for this I used machine learning with the Orange library for Python. I have examples of concepts, where the features are lemma and part of speech, like algorithm|NN|concept. The problem is that any other word, which in fact is not a concept, is classified as a concept...

Why isn't my Data Mining Model Training destination accepting numeric input inside SSIS?

I'm trying to create a mining model for forecasting against some DW data. I'm using SSIS for my ETL, and trying to use the Data Mining Model Training destination. Unfortunately I'm receiving an error whenever the column I'm trying to predict is numeric or decimal format. I don't get the error when I create the model by hand in SSMS, a...

What technique would you use to find similar people with the same social profile as you? (computer science)

Let's take your Facebook social profile. There are interests, activities, movies, music, and TV shows. You have these 5 things, in text, of course. Given your social profile and 10 other people, we want to find overlaps, similarity, etc. What method would you use to do it? I'm guessing it would be best to use vectors and Euclidean/Pe...
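To make the vector idea concrete, here is a rough sketch under some assumptions: each profile is reduced to a set of lowercase terms, those sets are turned into binary vectors over a shared vocabulary, and people are ranked by Euclidean distance. The helper names and the sample profiles are hypothetical, and other distance measures could be swapped in for `math.dist`.

```python
import math

def to_vector(terms, vocabulary):
    """Binary vector: 1.0 where the profile contains the vocabulary term."""
    return [1.0 if term in terms else 0.0 for term in vocabulary]

def rank_by_similarity(me, others):
    """Return the names in `others`, closest profile first."""
    # Shared vocabulary: every term that appears in any profile.
    vocabulary = sorted(set(me).union(*others.values()))
    my_vec = to_vector(me, vocabulary)
    scored = [
        (math.dist(my_vec, to_vector(terms, vocabulary)), name)
        for name, terms in others.items()
    ]
    # A smaller Euclidean distance means a more similar profile.
    return [name for _, name in sorted(scored)]

# Invented sample profiles, for illustration only.
me = {"jazz", "hiking", "inception"}
others = {
    "ann": {"jazz", "hiking", "inception", "chess"},
    "bob": {"rock", "football"},
    "cy": {"jazz", "cooking"},
}
ranking = rank_by_similarity(me, others)
```

With binary vectors, the squared distance simply counts the terms that appear in exactly one of the two profiles, so long profiles are penalized for every extra item they list.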

Extracting useful data from arbitrary HTML pages?

Is there a library for Ruby or PHP that is able to parse HTML pages and extract unique data by comparing them with other similar pages? It should use some sort of text mining to identify which texts are more likely noise and repetitive, while other texts are more unique and useful... ...

How to sell collective intelligence/content mining services?

Hi, all. I've recently become interested in, and have been learning about, collective intelligence, semantic web programming, mashups, data mining (just the data warehousing aspect), and content mining approaches, and was wondering how to turn such a passion/interest into a startup. Are there examples of startups whose core business relies on such servic...

Data mining algorithms comparison

Are there any comparisons of data mining algorithms? Comparisons in terms of performance, accuracy, and the amount of data required to generate a robust model. It seems that ensemble learning algorithms like bagging and boosting are considered to be the most accurate at the moment. I don't have any specific problem to solve. It's just ...

Data browsing/querying tool for loosely structured data

I have a set of statistical data (about 100M in size), which is organized in key-value pairs. Some of the values are just numbers (e.g. a person's age or weight) and some are hierarchical (e.g. a person's employments, which can be a set of employment records, each again containing key/value pairs, etc.). The real data is not exactly t...

Amazon products information mining

Hello, I'm new to this "information mining", so I am wondering whether there is an API that will let me get the information I need about products from Amazon's web site. Or, if there isn't, how would you do that? Is there any suggestion/reference to some technology that can do this? Thanks in advance for sharing. ...

Monitor brands with common words

Let's say you should monitor the brand "ONE" online. What algorithms can be used to separate pages about the brand ONE from pages containing the common word ONE? I'm thinking maybe Bayes could work, but are there other ways to do this? ...
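As a rough illustration of the Bayes idea, here is a small multinomial naive Bayes classifier with add-one smoothing, written from scratch. The labels `brand` and `common` and the six training sentences are invented toy data, and the function names are hypothetical; a real monitor would need far more labeled pages and better tokenization.

```python
import math
from collections import Counter

def train(docs):
    """docs: list of (text, label). Returns the counts naive Bayes needs."""
    word_counts = {}          # label -> Counter of word frequencies
    doc_counts = Counter()    # label -> number of training documents
    vocab = set()
    for text, label in docs:
        words = text.lower().split()
        word_counts.setdefault(label, Counter()).update(words)
        doc_counts[label] += 1
        vocab.update(words)
    return word_counts, doc_counts, vocab

def classify(text, model):
    """Return the label with the highest log-posterior for `text`."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        counts = word_counts[label]
        total_words = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)  # log prior
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                # Add-one (Laplace) smoothing avoids zero probabilities.
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy training data, invented for illustration only.
training = [
    ("one launches new phone", "brand"),
    ("one store opens downtown", "brand"),
    ("buy the new one phone", "brand"),
    ("one of the best days", "common"),
    ("no one knows the answer", "common"),
    ("pick one of the options", "common"),
]
model = train(training)
label_brand = classify("one opens new store", model)
label_common = classify("one of the answer", model)
```

The word "one" itself carries almost no signal here since it appears in every document; the classifier separates the two uses through the surrounding words, which is the point of treating this as a context classification problem.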

Techniques to display related content or articles

Hi, I've been trying to learn text mining and other related things in the collective intelligence field. I am interested in making an app that will scan through a document and show related posts/articles on the page. What algorithm(s) would be helpful to retrieve the required info? Thanks /A ...

Front end applications for examining/digging through a SQL Analysis Services Mining Model.

I currently use Excel and SQL Server Business Intelligence Studio to browse my models, but I've been searching high and low for a decent, moderately user-friendly front-end application that can be used for trudging through an SSAS mining model. I understand how to use the predictions for specific purposes (such as integrating with call q...

WEKA Tutorials / Examples for a Newbie

In a follow-up to this answer, I want to ask if any of you know any good (and, more importantly, easy to understand) tutorials and/or examples of data mining with the Weka toolkit. I've been very interested in data mining ever since I first heard of it and the things it can do. I also have some experiments I'd like to do with some ...

Algorithms: data binarization

I have a huge dataset with words word_i and weights weight[i,j], where weight is the "connection strength" between words. I'd like to binarize this data, but I want to know if there is any existing algorithm to make binary code of each word in such a way that the Hamming distance between the codes of the words correlates with this weig...