data-mining

How to use DataMining feature of SQL Server 2008 with ASP.Net

How to use DataMining feature of SQL Server 2008 with ASP.Net ...

What are some good resources for learning data mining?

I'd like to get fluent enough in the data mining domain to be able to use the features of the Orange framework to classify photos of fish. ...

What are some good resources on DataMining?

What are some good resources for grokking the uses and theories behind DataMining? ...

Well explained algorithms for indexing and searching in metric spaces

I need to implement some kind of metric space search in Postgres(*) (PL or PL/Python). So, I'm looking for good sources (or papers) with a very clear and crisp explanation of the machinery behind these ideas, in such a way that I can implement it myself. I would prefer clarity over efficiency. (*) The need for that is described better he...

Java HTML Parsing

Hello everyone. I'm working on an app which scrapes data from a website and I was wondering how I should go about getting the data. Specifically I need data contained in a number of div tags which use a specific CSS class - Currently (for testing purposes) I'm just checking for "div class = "classname"" in each line of HTML - This wor...

Datamining open source software alternatives

I am evaluating data mining packages. I have found these two so far: RapidMiner and Weka. Do you have any experience to share with these two products, or any other product to recommend? Thanks ...

Finding a pattern in a set

What algorithms could I use to determine common characters in a set of strings? To keep the example simple, I only care about 2+ characters in a row that show up in 2 or more of the samples. For instance:

0000abcde0000
0000abcd00000
000abc0000000
00abc000de000

I'd like to know:

00 was used in 1,2,3,4
000 was used in 1,2,3,...
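The counting described in this question can be sketched with a brute-force pass over every substring. This is only a minimal illustration, not an answer from the thread: the function name `common_substrings` and its parameters are hypothetical, and the approach enumerates all substrings, so it suits short strings only.

```python
from collections import defaultdict


def common_substrings(samples, min_len=2, min_samples=2):
    """Map each substring of at least min_len characters to the sorted
    1-based indices of the samples that contain it, keeping only
    substrings found in at least min_samples samples.

    Hypothetical helper for illustration; brute force over all substrings.
    """
    seen = defaultdict(set)
    for idx, s in enumerate(samples, start=1):
        for i in range(len(s)):
            for j in range(i + min_len, len(s) + 1):
                seen[s[i:j]].add(idx)
    return {sub: sorted(ids) for sub, ids in seen.items() if len(ids) >= min_samples}


samples = ["0000abcde0000", "0000abcd00000", "000abc0000000", "00abc000de000"]
result = common_substrings(samples)
print(result["00"])    # [1, 2, 3, 4]
print(result["abcd"])  # [1, 2]
print(result["de"])    # [1, 4]
```

For larger inputs, a suffix-based structure (for example a generalized suffix array) would likely be a better fit than enumerating every substring.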

Comparing multiple dictionaries in Python

I'm new to Python and am running into a problem I can't google my way out of. I've built a GUI using wxPython and ObjectiveListView. In its very center, the GUI has a list control displaying data in X rows (the data is loaded by the user) and in five columns. When the user selects multiple entries from the list control (pressing CTRL or s...

Is there any web service to get the weather data for the cities all round the world [over a period of time like one year]?

Any web service to get the monthly min/max temperatures for cities over a period of time? ...

Where can I get databases of cities/places around the world?

In dopplr [http://www.dopplr.com] there is an option to fill the city of travel and the site will automatically find the city around the world. Is there any web service or database for such a city lookup? ...

How to find "equivalent" texts?

I want to find (not generate) 2 text strings such that, after removing all non-letters and uppercasing, one string can be translated to the other by simple substitution. The motivation for this comes from a project I know of that is testing methods for attacking cyphers via probability distributions. I'd like to find a large, coherent plai...
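One way to check the equivalence described here is to reduce each text to a letter-pattern signature: drop non-letters, uppercase, and replace every letter with the order in which it first appeared. Two texts share a signature exactly when a one-to-one letter substitution maps one onto the other. The sketch below assumes that reading of "simple substitution"; the names `signature` and `equivalent` are hypothetical.

```python
def signature(text):
    """Return the first-occurrence pattern of the letters in text.

    Non-letters are dropped and case is ignored, so "A-b a" -> (0, 1, 0).
    Hypothetical helper for illustration.
    """
    first_seen = {}
    pattern = []
    for ch in text.upper():
        if not ch.isalpha():
            continue
        # Assign the next free number the first time a letter shows up.
        pattern.append(first_seen.setdefault(ch, len(first_seen)))
    return tuple(pattern)


def equivalent(a, b):
    """True if a one-to-one letter substitution turns a into b."""
    return signature(a) == signature(b)


print(equivalent("Hello, world!", "Xubbe mehbt"))  # True
print(equivalent("abc", "aba"))                    # False
```

To search a large corpus for matching pairs, one option is to group candidate passages by their signature in a dictionary, so that only passages landing in the same bucket need to be compared.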

Techniques for building recommendation engines?

The book Programming Collective Intelligence presents a technique for computing similar links/users based on the distance between the links/users in a huge metric space (user x bookmarked this link / link x was bookmarked by this user). What other techniques have been developed for recommendation engines? ...
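As a rough illustration of the distance-based idea mentioned in this question, the sketch below ranks users by how close their bookmark sets are. It swaps in Jaccard distance as the metric, which is an assumption made here for brevity and not necessarily the measure the book uses; the function names and the sample data are hypothetical.

```python
def jaccard_distance(a, b):
    """Distance between two collections of links: 0.0 if identical, 1.0 if disjoint."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def nearest_users(target, bookmarks, k=2):
    """Return the k users whose bookmark sets are closest to the target user's."""
    scored = [
        (jaccard_distance(bookmarks[target], links), user)
        for user, links in bookmarks.items()
        if user != target
    ]
    return [user for _, user in sorted(scored)[:k]]


# Made-up sample data: user -> set of bookmarked links.
bookmarks = {
    "alice": {"a", "b", "c"},
    "bob": {"b", "c", "d"},
    "carol": {"x", "y"},
    "dave": {"a", "b", "c", "d"},
}
print(nearest_users("alice", bookmarks))  # ['dave', 'bob']
```

Links bookmarked by the nearest users but not by the target user would then be natural candidates to recommend.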

Can someone please explain data mining, SSIS, BI, ETL and other related technologies?

I was talking with a co-worker yesterday regarding a situation where he used SSIS (or something like that) to do some really cool thing with an SSIS Package where he passed in a name like "Dr. Reginald Williams, PhD." and based on some weighting scheme the system was smart enough to figure out how to tokenize it and store it in the datab...

What data mining application to use?

The last tool I used was Weka. Last I heard, Java was coming up with an API (JDM) for it. Can anyone share their experiences with the tools? I am mostly interested in using the tools for classification/clustering (Weka does a decent job here) and the tool should have good API support. ...

Best XML format for log events in terms of tool support for data mining and visualization?

We want to be able to create log files from our Java application that are suited for later processing by tools to help investigate bugs and gather performance statistics. Currently we use the traditional "log stuff which may or may not be flattened into text form and appended to a log file", but this works best for small amounts o...

Processing web feed multiple times a day

OK, here is the deal in brief: I spider the web (all kinds of data, blogs/news/forums) as it appears on the internet. Then I process this feed and do analysis on the processed data. Spidering is not a big deal. I can get it pretty much in real time as the internet gets new data. Processing is a bottleneck, it involves some computationally heavy algor...

Using Datamining/Statistics for Log Monitoring

I have a large set of log files that I want to characterize, or possibly apply some kind of decision tree or other analytics to. But I don't know exactly what. What kind of analysis have you done with log files, a lot of log files? For example, so far I am collecting how many requests are made to a particular page for a given log fil...

How can you extract all 6 letter Latin words to a list?

I need to have all 6-letter Latin words in a list. I would also like to have words which follow the pattern Xyzzyx in a list. I have used a little Python. ...
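A minimal Python sketch of the two filters asked for here. It assumes the Latin words are already available as an iterable of strings (where that word list comes from is left open), and it reads "Xyzzyx" as a case-insensitive mirror pattern in which the first and sixth, second and fifth, and third and fourth letters match, without requiring x, y and z to differ. The function names are hypothetical.

```python
def six_letter_words(words):
    """Keep only the purely alphabetic words that are exactly six letters long."""
    return [w for w in words if len(w) == 6 and w.isalpha()]


def matches_xyzzyx(word):
    """True if the word has the shape x-y-z-z-y-x, ignoring case."""
    w = word.lower()
    # For six letters, reading the same backwards is exactly the Xyzzyx shape.
    return len(w) == 6 and w == w[::-1]


# Tiny hand-picked sample standing in for a real Latin word list.
sample = ["amicus", "succus", "tempus", "rosa", "aqua"]
six = six_letter_words(sample)
print(six)                                       # ['amicus', 'succus', 'tempus']
print([w for w in six if matches_xyzzyx(w)])     # ['succus']
```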

C# Parsing a webpage's source

Among the wall of text that is a page's source, I need to get the video_id, l and t values without the quotes. So for a section like this: "video_id": "lUoiKMxSUCw", "l": 105, "sk": "-2fL6AANk__E49CRzF6_Q8F7yBPWdb9QR", "fmt_map": "35/640000/9/0/115,34/0/9/0/115,5/0/7/0/0", "t": "vjVQa1PpcFMbYtdhqxUip5Vtm856lwh7lXZ6lH6nZAg=", I need the following...
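The question is about C#, but the extraction can be sketched compactly in Python, and the regular expression itself should carry over to other regex engines with little change. The sketch assumes each key appears in the source as `"key": value`, with the value either quoted or a bare token; the helper name `extract_fields` is hypothetical.

```python
import re


def extract_fields(source, keys=("video_id", "l", "t")):
    """Return {key: value} for each key found as "key": value in source.

    Values are returned as strings with surrounding quotes removed.
    Hypothetical helper for illustration.
    """
    found = {}
    for key in keys:
        # Match either a quoted value or a bare token ending at a comma or space.
        pattern = '"' + re.escape(key) + r'"\s*:\s*(?:"([^"]*)"|([^,\s]+))'
        match = re.search(pattern, source)
        if match:
            quoted, bare = match.groups()
            found[key] = quoted if quoted is not None else bare
    return found


snippet = (
    '"video_id": "lUoiKMxSUCw", "l": 105, '
    '"sk": "-2fL6AANk__E49CRzF6_Q8F7yBPWdb9QR", '
    '"fmt_map": "35/640000/9/0/115,34/0/9/0/115,5/0/7/0/0", '
    '"t": "vjVQa1PpcFMbYtdhqxUip5Vtm856lwh7lXZ6lH6nZAg=",'
)
fields = extract_fields(snippet)
print(fields["video_id"])  # lUoiKMxSUCw
print(fields["l"])         # 105
print(fields["t"])         # vjVQa1PpcFMbYtdhqxUip5Vtm856lwh7lXZ6lH6nZAg=
```

If the surrounding text turns out to be valid JSON, parsing it with a JSON library would be more robust than a regular expression.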

Screen-scraping of a proprietary website for academic use

A client of mine who is a social sciences researcher at a university is asking if I can write a spider to do statistical data mining from a subscription-only academic database. He would like to use the statistics for his academic research. (For those interested, this would involve downloading thousands of text documents and then doing l...