data-mining

How to design a star schema

I am confused about where to start when designing a star schema. For example, I have the following tables in my database: Branch(branchNo, bStreetAddress, bCity), LoanManager(empNo, empName, phone, branchNo), Customer(custNo, custName, profession, streetAddress, city, state), Account(accNo, accType, balance, accDate, custNo), LoanContract(contractNo,...

Monte Carlo simulation in forecasting?

I am a physicist and I know a little about Monte Carlo simulation. I want to learn financial forecasting with Monte Carlo. Do you have any suggestions? What do you think about programming for financial decisions? What is the future of financial software based on Monte Carlo simulation? ...
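As a starting point, here is a minimal sketch of what a Monte Carlo price forecast can look like. It assumes the price follows geometric Brownian motion, which is a common textbook model and not something stated in the question. The function name and the parameter values (drift, volatility, horizon) are illustrative assumptions only.

```python
import math
import random


def simulate_terminal_prices(s0, mu, sigma, years, steps, n_paths, seed=0):
    """Simulate end-of-horizon prices under geometric Brownian motion.

    s0: starting price, mu: annual drift, sigma: annual volatility.
    All values are assumptions supplied by the caller, not fitted to data.
    """
    rng = random.Random(seed)
    dt = years / steps
    drift = (mu - 0.5 * sigma ** 2) * dt   # drift of the log-price per step
    vol = sigma * math.sqrt(dt)            # std dev of the log-price per step
    finals = []
    for _ in range(n_paths):
        log_price = math.log(s0)
        for _ in range(steps):
            log_price += drift + vol * rng.gauss(0.0, 1.0)
        finals.append(math.exp(log_price))
    return finals


prices = simulate_terminal_prices(
    s0=100.0, mu=0.05, sigma=0.2, years=1.0, steps=12, n_paths=2000
)
mean_price = sum(prices) / len(prices)
print(f"mean simulated price after one year: {mean_price:.2f}")
```

The forecast here is the whole distribution of simulated prices rather than a single number, so in practice you would look at quantiles as well as the mean.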

Data Mining: Where to start?

Hi, I am developing a service for determining human movement from acceleration measurements, and I would like to start by researching data mining and then try to apply its techniques to the service. As I am not familiar with the field of data mining, I would really appreciate it if anyone could recommend any good literature to get some the...

Detecting Correlated Columns in Data

Suppose I have the following data:

OrderNumber | CustomerName | CustomerAddress | CustomerCode
1 | Chris | 1234 Test Drive | 123
2 | Chris | 1234 Test Drive | 123

How can I detect that the columns "CustomerName", "CustomerAddress", and "CustomerCode" all correlate perf...
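One way to approach this is sketched below. It reads "correlate" as "each column's value fully determines the other's" (a two-way functional dependency), which is an assumption about the truncated question. The helper names `determines` and `perfectly_correlated_pairs` are hypothetical.

```python
def determines(rows, a, b):
    """True if every value of column a maps to exactly one value of column b."""
    seen = {}
    for row in rows:
        # Remember the first b-value seen for this a-value; any later
        # mismatch means a does not determine b.
        if seen.setdefault(row[a], row[b]) != row[b]:
            return False
    return True


def perfectly_correlated_pairs(rows, columns):
    """Column pairs where each column determines the other."""
    pairs = []
    for i, a in enumerate(columns):
        for b in columns[i + 1:]:
            if determines(rows, a, b) and determines(rows, b, a):
                pairs.append((a, b))
    return pairs


rows = [
    {"OrderNumber": 1, "CustomerName": "Chris",
     "CustomerAddress": "1234 Test Drive", "CustomerCode": 123},
    {"OrderNumber": 2, "CustomerName": "Chris",
     "CustomerAddress": "1234 Test Drive", "CustomerCode": 123},
]
columns = ["OrderNumber", "CustomerName", "CustomerAddress", "CustomerCode"]
pairs = perfectly_correlated_pairs(rows, columns)
print(pairs)
```

On the two sample rows this reports the three customer columns as mutually dependent and leaves OrderNumber out, because the same customer name maps to two different order numbers.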

How to store many years' worth of 100 x 25 Hz time series: SQL Server or a time-series database?

I am trying to identify possible methods for storing 100 channels of 25 Hz floating-point data. This will result in 78,840,000,000 data points per year. Ideally, all this data would be efficiently available to websites and tools such as SQL Server Reporting Services. We are aware that relational databases are poor at handling time-ser...
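The yearly figure can be checked with a few lines of arithmetic. The sketch below assumes a 365-day year, and the 4 bytes per sample (a 32-bit float with no timestamp or index overhead) is an assumption used only to get a rough lower bound on raw storage.

```python
CHANNELS = 100
SAMPLE_RATE_HZ = 25
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a 365-day year

points_per_year = CHANNELS * SAMPLE_RATE_HZ * SECONDS_PER_YEAR
print(f"{points_per_year:,} data points per year")

BYTES_PER_SAMPLE = 4  # assumption: 32-bit float, no per-row overhead
raw_gb_per_year = points_per_year * BYTES_PER_SAMPLE / 1e9
print(f"about {raw_gb_per_year:.0f} GB of raw values per year")
```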

Extract small relevant bits of text (as Google does) from full-text search results.

I have implemented a full-text search in a discussion forum database and I want to display the search results the way Google does. Even for a very long HTML page, only two or three lines of text are displayed in the search result list. Usually these are the lines which contain the search terms. What would be a good algorithm for how to...
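One simple approach, sketched here under assumptions, is to slide a fixed-size window of words over the text and keep the window that contains the most search-term hits. The function name `best_snippet` and the default window size are hypothetical, the input is assumed to be plain text with HTML already stripped, and this is not claimed to be how Google builds its snippets.

```python
import re


def best_snippet(text, query, window=30):
    """Return the window of `window` words containing the most query terms."""
    words = text.split()
    if not words:
        return ""
    terms = {t.lower() for t in query.split()}
    # 1 where the word (punctuation stripped, lowercased) is a query term.
    hits = [1 if re.sub(r"\W+", "", w).lower() in terms else 0 for w in words]

    best_start = 0
    score = best_score = sum(hits[:window])
    # Slide the window one word at a time, updating the hit count.
    for start in range(1, max(1, len(words) - window + 1)):
        score += hits[start + window - 1] - hits[start - 1]
        if score > best_score:
            best_start, best_score = start, score

    snippet = " ".join(words[best_start:best_start + window])
    prefix = "... " if best_start > 0 else ""
    suffix = " ..." if best_start + window < len(words) else ""
    return prefix + snippet + suffix


print(best_snippet("a b c d e data mining f g", "mining data", window=3))
```

For two or three displayed lines you could run this once, blank out the chosen window, and run it again to pick the next-best passage.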

What do I do with this CSV dataset I just downloaded from DBpedia?

I just downloaded this CSV of Wikipedia infoboxes from DBpedia. However, I have no idea how to use it :-S I want to import all this data into a database but am not sure how to take it from here. I downloaded it from http://wiki.dbpedia.org/Downloads32#infoboxes. I'm working in PHP. Just for the record, this CSV file is around 1.8...

Wikipedia integration issue - need to finally sort this out 101

Sorry guys, I've been running amok asking questions on how to integrate Wikipedia data into my application, and frankly I don't think I've had any success on my end, as I've been trying all the ideas and kind of giving up whenever I reach a dead end or obstacle. I'll try to explain exactly what I am trying to do here. I have a simple directory...

Obtaining financial data from Google Finance which is outside the scope of the API

Google's finance API is incomplete -- many of the figures on a page such as: http://www.google.com/finance?fstype=ii&q=NYSE:GE are not available via the API. I need this data to rank companies on Canadian stock exchanges according to the formula of Greenblatt, available via google search for "greenblatt index scans". My questio...

Unable to find an internet page blocked by robots.txt

Problem: to find answers and exercises for lectures in Mathematics at Uni. Helsinki.

Practical problems:
1. to make a list of sites with .com which have Disallow in robots.txt
2. to make a list of sites at (1) which contain *.pdf files
3. to make a list of sites at (2) which contain the word "analyysi" in PDF files

Suggestions for practical...

What are the best resources for learning how to implement Naive Bayes Classifiers in SSAS?

After asking this question, I've decided to try and implement some Naive Bayes Classifiers using SQL Server Analysis Services. Can anyone point me to a decent book, website or any other resource on how to implement Naive Bayes Classifiers in SSAS? Similarly, I would be interested in learning about Decision Trees. ...

smoothing irregularly sampled time data

Given a table where the first column is seconds past a certain reference point and the second one is an arbitrary measurement:

6 0.738158581
21 0.801697222
39 1.797224596
49 2.77920469
54 2.839757536
79 3.832232283
91 4.676794376
97 5.18244704
100 5.521878863
118 6.316630137
131 6.778507504
147 7.020395216
157 7.331607129
176 7.63749222...
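Since the rest of the question is cut off, the following is only one possible reading of "smoothing": a centered moving average whose window is defined in seconds rather than in number of samples, so uneven spacing is respected. The function name `time_window_smooth` and the 20-second half-width are assumptions. The data is the first five rows of the table above.

```python
def time_window_smooth(samples, half_width):
    """Average each point with all samples within +/- half_width seconds.

    samples: list of (seconds, value) pairs. Quadratic in the number of
    samples, which is fine for a sketch but not for large series.
    """
    smoothed = []
    for t, _ in samples:
        neighbours = [v for (u, v) in samples if abs(u - t) <= half_width]
        smoothed.append((t, sum(neighbours) / len(neighbours)))
    return smoothed


samples = [
    (6, 0.738158581),
    (21, 0.801697222),
    (39, 1.797224596),
    (49, 2.77920469),
    (54, 2.839757536),
]
for t, v in time_window_smooth(samples, half_width=20):
    print(t, round(v, 4))
```

A time-based window keeps dense stretches from dominating the average the way a fixed sample count would.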

MySQL search: prepended "the" or "and/&" ambiguity

I'm trying to do a title search in MySQL across two different databases to match up data from separate sources. A title will sometimes start with "The first title" in one db and just "first title" in the other, or appear as "far and away" in one versus "far & away" in the other. MySQL fulltext search doesn't seem very effective at figuring th...
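A common workaround is to compare normalized keys instead of raw titles. The sketch below handles only the two cases mentioned above (a leading "the" and "&" versus "and"). The function name `normalize_title` is hypothetical, and doing the normalization in application code rather than in SQL is an assumption.

```python
import re


def normalize_title(title):
    """Reduce a title to a comparison key for matching across sources."""
    key = title.lower()
    key = key.replace("&", " and ")          # treat "&" and "and" alike
    key = re.sub(r"[^a-z0-9 ]+", " ", key)   # drop punctuation
    key = re.sub(r"^\s*the\s+", "", key)     # drop a leading "the"
    return " ".join(key.split())             # collapse whitespace


print(normalize_title("The first title") == normalize_title("first title"))
print(normalize_title("far and away") == normalize_title("Far & Away"))
```

If the keys are stored in an extra indexed column in each database, the match becomes a plain equality join rather than a fulltext search.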

The business of Artificial Intelligence

I'm putting together a presentation aimed at entrepreneurs on the present state of industrial AI development, titled "The business of AI"; however, the few resources I have found on Google seem awfully outdated. So I turn to the nice folks on Stack Overflow: Of the systems in use today, which products do you consider good b...

Datamining and Business Intelligence Technologies

I've noticed an increasing number of jobs that are asking for experience with datamining and business intelligence technologies. This sounds like an incredibly broad topic but where would one go if they wanted to develop at least a partial understanding of this stuff if it were to come up in an interview? ...

What is data mining from a developer's perspective?

I can find the technical explanation of what data mining is in a book or on Wikipedia, but I'm wondering what sort of development exactly it involves. Is it more about using tools or more about writing tools? Is it really much different from other domains when it comes to R&D? ...

How do I data mine various news sources?

I'm working on a free web application that will analyze top news stories throughout the day and provide stats. Most news websites offer RSS feeds, which works fine for knowing which stories to retrieve. However, the problems arise when attempting to get the full news story from the news website itself. At the moment, I have separate News...

What is Java Data Mining, JDM?

I am looking at JDM. Is this simply an API to interact with other tools that do the actual data mining? Or is this a set of packages that contain the actual data mining algorithms? ...

Is there a good method to get up-to-date financial data as a stream to feed an application?

I'm pretty sure no one has ever written an application to analyze financial data (sarcasm). Regardless, I'm considering writing one for fun and need a way to access (1) large amounts of historical data and (2) real-time fluctuations in stock prices etc... (my finance jargon is weak). Is there an API (free or pay) that I can hook into t...

Calculating item counters for a set of selected categories.

In our Ruby on Rails project we have a lot of categorization criteria for recipes, such as cook method, occasion etc. Every recipe belongs to one or several of these categories. When someone starts browsing for recipes, he/she can narrow down to a set of particular categories. Then we need to calculate the number of recipes in all catego...
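Because the question is truncated, the sketch below assumes the goal is the usual faceted-browsing count: for every category not yet selected, how many recipes would remain if it were added to the current selection. It is written in Python rather than Ruby purely for illustration, and the names `category_counts` and the sample data are hypothetical.

```python
from collections import Counter


def category_counts(recipes, selected):
    """Count, per unselected category, the recipes matching the selection.

    recipes: dict mapping recipe id -> set of category names.
    selected: categories the user has already narrowed down to.
    """
    selected = set(selected)
    counts = Counter()
    for categories in recipes.values():
        if selected <= categories:            # recipe matches every selected category
            counts.update(categories - selected)
    return counts


recipes = {
    "r1": {"grill", "summer"},
    "r2": {"grill", "winter"},
    "r3": {"bake", "winter"},
}
print(dict(category_counts(recipes, {"grill"})))
```

This does one pass over the recipes per request; with a large table you would likely push the same logic into a grouped SQL query or cache the counts.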