I have a database in SQL Server 2005 that originally comes from an old mainframe.
All relations were defined in the surrounding software and there are none in the database.
I need to find the relations, not by field name but by the actual content of the records.
(As suggestions; I realize I'll have to check them.)
It would be nice with some ext...
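One way to generate such suggestions is to measure how many of the distinct values in one column also occur in another column; a high share hints at a possible foreign-key-like relation. The following is a minimal sketch, assuming the column values have already been read into Python lists. The function name `candidate_relations`, the `min_overlap` threshold, and the table and column names are all hypothetical, and the results are only candidates to be checked by hand.

```python
from itertools import permutations


def candidate_relations(columns, min_overlap=0.95):
    """Suggest (child, parent, overlap) triples based on value containment.

    columns: dict mapping a 'table.column' label to an iterable of values.
    overlap: share of the child's distinct non-null values found in the parent.
    """
    # Reduce every column to its set of distinct, non-null values.
    distinct = {
        name: {v for v in values if v is not None}
        for name, values in columns.items()
    }
    candidates = []
    # Check every ordered pair, because containment is directional.
    for child, parent in permutations(distinct, 2):
        child_vals, parent_vals = distinct[child], distinct[parent]
        if not child_vals:
            continue  # an empty column tells us nothing
        overlap = len(child_vals & parent_vals) / len(child_vals)
        if overlap >= min_overlap:
            candidates.append((child, parent, overlap))
    # Strongest suggestions first.
    return sorted(candidates, key=lambda c: -c[2])


# Hypothetical sample data standing in for real table contents.
sample = {
    "orders.cust_no": [1, 2, 2, 3, None],
    "customers.id": [1, 2, 3, 4],
    "orders.amount": [10.5, 20.25, 7.25, 99.9],
}
suggestions = candidate_relations(sample)
```

Comparing every pair of columns gets expensive on a large schema, so in practice it may help to compare only columns of compatible data types and to work on distinct values pulled per column.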
Hello,
I have a large database of resumes (CVs), and a table, skills, that groups all users' skills.
Inside that table there's a field skill_text that describes the skill in full text.
I'm looking for an algorithm/software/method to extract significant terms/phrases from that table in order to build a new table with standardized skill...
Hi,
We have an architecture where we provide each customer with Business Intelligence-like services for their website (internet merchant). Now I need to analyze that data internally (for algorithmic improvement, performance tracking, etc.), and it is potentially quite heavy: we have up to millions of rows per customer per day, and I may w...
If you followed a DM course, which textbook was used?
I know about Data Mining: Practical Machine Learning Tools and Techniques (Second Edition) and this poll. What did you actually use?
...
Hi,
I have an input file that contains a large number of transactions, like:
Transaction ID Items
T1 Bread, milk, coffee, juice
T2 Juice, milk, coffee
T3 Bread, juice
T4 Coffee, milk
T5 Bread, Milk
T6 Coffee, Bread
T7 Coffee, Bread, Juice
T8 Bread, Milk, Juice
T9 Milk, Bread, Coffee,
T10 Bread
T11 Milk
T12 Milk, Coffee, Bread, Juice
i wan...
We are looking at acquiring Data Mining software to primarily run predictive analysis processes.
How does the SQL Server Data Mining solution compare to other solutions like SPSS from IBM?
Since SQL Server DM is included in the SQL Server Enterprise license, what would be the justification to spend an extra couple of 100K to buy separate software ...
Short Desc:
I'm curious to see if I can use SQL Server Analysis Services or some other SQL Server service to mine some data for me that will show commonalities between SQL TEXT fields in a dataset.
Long Desc:
I am looking at a subset of data that consists of about 10,000 rows of TEXT blobs which are used as a notes column in an issue tracking ...
I'm downloading a long list of my email subject lines, with the intent of finding email lists that I was a member of years ago so that I can purge them from my Gmail account (which is getting pretty slow).
I'm specifically thinking of newsletters that often come from the same address, and repeat the product/service/group's name in...
I am trying to implement a Naive Bayesian approach to find the topic of a given document or stream of words. Is there a Naive Bayesian approach that I might be able to look up for this?
Also, I am trying to improve my dictionary as I go along. Initially, I have a bunch of words that map to topics (hard-coded). Depending on the occ...
Hi,
I'm doing a project for a college class I'm taking.
I'm using PHP to build a simple web app that classifies tweets as "positive" (or happy) and "negative" (or sad) based on a set of dictionaries. The algorithms I'm considering right now are a Naive Bayes classifier or a decision tree.
However, I can't find any PHP library that helps me do...
Trying to install rattle on a Windows Server 2008 R2 64-bit machine, using 64-bit R version 2.11, I got the following message:
install.packages("rattle", dependencies=TRUE)
Warning: dependencies ‘RGtk2’, ‘rggobi’, ‘RSvgDevice’, ‘Biobase’, ‘multicore’, ‘marray’, ‘affy’, ‘snowFT’, ‘Rmpi’, ‘rpvm’ are not available
When I tried to install one o...
I have a collection of binary strings of a given size, encoding effective solutions to a given problem.
By looking at them, I can spot obvious similarities and intuitively see patterns of symmetry and periodicity.
Are there mathematical/algorithmic tools I can "feed" this set of strings to and get results that might give me an idea of wh...
EDIT: The size of the wordlist is 10-20 times bigger than I wrote down. I simply forgot a zero.
EDIT2: I will have a look into SVDLIBC and also see how to reduce a matrix to its dense version so that might help too.
I have generated a huge CSV file as an output from my POS tagging and stemming. It looks like this:
word1, w...
I have over 1000 surveys, many of which contain open-ended replies.
I would like to be able to 'parse' all the words and get a ranking of the most used words (disregarding common words) to spot a trend.
How can I do this? Is there a program I can use?
EDIT: If a 3rd party solution is not available, it would be great if we can keep...
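For the survey question above, a ranking of the most used words can be sketched with nothing but the Python standard library. This is only an illustration under assumptions: the replies are taken to be available as a list of strings, the stop-word list is a deliberately tiny hypothetical one, and the name `rank_words` is made up for this example.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stop-word list; a real one would be longer.
STOP_WORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
    "was", "i", "for", "on", "with",
}


def rank_words(replies, top_n=10, stop_words=STOP_WORDS):
    """Return the top_n most frequent words across all replies."""
    counts = Counter()
    for reply in replies:
        # Lowercase, then keep runs of letters and apostrophes as words.
        words = re.findall(r"[a-z']+", reply.lower())
        counts.update(w for w in words if w not in stop_words)
    return counts.most_common(top_n)


# Made-up sample replies standing in for the real survey text.
sample_replies = [
    "The service was slow",
    "Slow service and rude staff",
    "Great staff",
]
ranking = rank_words(sample_replies, top_n=3)
```

Stemming or lemmatizing the words first (so that "slow" and "slowly" count together) would likely give a cleaner trend, at the cost of an extra library.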
I have a data set with multiple layers of annotation over the underlying text, such as part-of-speech tags, chunks from a shallow parser, named entities, and others from various natural language processing (NLP) tools. For a sentence like The man went to the store, the annotations might look like:
Word POS Chunk NER
==== === ===== ...
In the field of Data Mining, is there a specific sub-discipline called 'Similarity'? If yes, what does it deal with? Any examples, links, or references will be helpful.
Also, being new to the field, I would like the community's opinion on how closely related Data Mining and Artificial Intelligence are. Are they synonyms, is one the subset of...
Hello Stack Overflow people,
As a school assignment I'm required to implement the Naïve Bayes algorithm, which I intend to do in Java.
In trying to understand how it's done, I've read the book "Data Mining - Practical Machine Learning Tools and Techniques", which has a section on this topic, but I am still unsure on some primary points that are...
Hi,
I have a set of training data consisting of 20 multiple choice questions (A/B/C/D) answered by a hundred respondents. The answers are purely categorical and cannot be scaled to numerical values. 50 of these respondents were selected for a free product trial. The selection process is not known. What interesting knowledge can be mined f...
Hello!
I need to develop a tool for web log data mining.
Having many sequences of urls, requested in a particular user session (retrieved from web-application logs), I need to figure out the patterns of usage and groups (clusters) of users of the website.
I am new to Data Mining, and am now searching Google a lot.
Found some useful info,...
Hi,
I'm trying to understand Bayesian networks. I have a data file which has 10 attributes, and I want to obtain the confusion table for this data. I thought I need to calculate TP, FP, FN, and TN for all fields. Is that true? If so, what do I need to do for a Bayesian network?
Really need some guidance, I'm lost.
...