mining

Opinion Mining - What Database Type?

Hi guys, I am starting an opinion mining project (Data Mining -> Web Mining -> Opinion Mining) to get the semantic orientation of the words the pages contain. We will use a crawler to fetch the opinion pages. Now the question is: what type of database (OO, relational, hierarchical, etc.) is best to use in this type of project? ...

ad hoc query tool patterns

Hi, all. I'm looking for common patterns for implementing ad-hoc querying capabilities graphically. I've looked at the SQL query builders in Access and TOAD, but I'd like to know whether anyone is aware of products that have built such a tool against a domain-specific data warehouse (e.g. clinical databases). Thanks, ...

Web mining - classification algorithms

Hi, my senior project is determining the dominant category of a web page. I crawled DMOZ. Now I am trying to build an ARFF file. After that I will use some feature extraction methods and classification algorithms. Do you know which feature extraction method performs well with any classification algorithm for web mining? ...
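One common starting point for web-page classification is a bag-of-words representation. The sketch below is a minimal, assumed approach rather than a recommendation of the best feature extraction method: it turns (page text, category) pairs into ARFF text with one NUMERIC word-count attribute per frequent term plus a nominal class attribute. The function names, the `w_` attribute prefix and the 1000-term vocabulary cap are illustrative choices, and it assumes category names contain no spaces or commas (as with single-word top-level categories).

```python
import re
from collections import Counter

def tokenize(text):
    # Hypothetical tokenizer: lowercase alphanumeric runs only, so every token
    # is a safe ARFF attribute name without quoting.
    return re.findall(r"[a-z0-9]+", text.lower())

def to_arff(docs, relation="webpages", vocab_size=1000):
    """docs: list of (page_text, category) pairs.

    Returns ARFF text with one NUMERIC word-count attribute per vocabulary
    term and a nominal 'class' attribute holding the category.
    """
    doc_counts = [(Counter(tokenize(text)), label) for text, label in docs]
    # Vocabulary: the vocab_size most frequent terms across all pages.
    total = Counter()
    for counts, _ in doc_counts:
        total.update(counts)
    vocab = [term for term, _ in total.most_common(vocab_size)]
    classes = sorted({label for _, label in docs})

    lines = [f"@RELATION {relation}", ""]
    lines += [f"@ATTRIBUTE w_{term} NUMERIC" for term in vocab]
    lines.append("@ATTRIBUTE class {" + ",".join(classes) + "}")
    lines += ["", "@DATA"]
    for counts, label in doc_counts:
        # Missing terms count as 0 because Counter returns 0 for absent keys.
        row = [str(counts[term]) for term in vocab]
        lines.append(",".join(row + [label]))
    return "\n".join(lines) + "\n"
```

Raw counts are only one option; the same loop could emit binary presence or TF-IDF weights instead. Inside Weka itself, the `StringToWordVector` filter performs a similar text-to-attributes conversion.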

Good metadata image dump utilities?

I'm looking for the best tool out there to extract any and all metadata embedded in the most popular image file formats (JPEG and PNG specifically). Whatever's in there, I'd like to know about it (XMP, Exif, IPTC, IIM, etc.). Ideally I'm looking for an all-in-one solution that I can run from a command line, but I'm interested to hear...
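For PNG specifically, metadata lives in dedicated chunks (`tEXt`, `zTXt`, `iTXt`, where XMP is usually stored, and `eXIf`), so a dump utility essentially walks the chunk list. The following is a minimal standard-library sketch of that walk, not a complete tool: it decodes only `tEXt` and returns the other metadata chunks raw, and it does not handle JPEG, whose Exif, XMP and IPTC data sit in APPn marker segments instead. The function names are hypothetical.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Chunk types that commonly carry metadata: plain text, compressed text,
# international text (XMP usually lives here), and Exif.
METADATA_CHUNKS = {b"tEXt", b"zTXt", b"iTXt", b"eXIf"}

def iter_png_chunks(data):
    """Yield (chunk_type, payload) for each chunk in a PNG byte string."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, payload, 4-byte CRC.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        payload = data[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack(">I", data[pos + 8 + length:pos + 12 + length])
        # The CRC covers the type and payload, not the length field.
        if zlib.crc32(ctype + payload) != crc:
            raise ValueError(f"bad CRC in {ctype!r} chunk")
        yield ctype, payload
        pos += 12 + length
        if ctype == b"IEND":
            break

def png_metadata(data):
    """List metadata chunks; tEXt is decoded to (type, keyword, text)."""
    found = []
    for ctype, payload in iter_png_chunks(data):
        if ctype == b"tEXt":
            # tEXt is keyword, NUL separator, Latin-1 text.
            keyword, _, text = payload.partition(b"\x00")
            found.append((ctype.decode(), keyword.decode("latin-1"), text.decode("latin-1")))
        elif ctype in METADATA_CHUNKS:
            found.append((ctype.decode(), None, payload))
    return found
```

Usage: `png_metadata(open("photo.png", "rb").read())` returns one entry per metadata chunk, in file order.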

How to sell collective intelligence/content mining services?

Hi, all. I've recently become interested in, and have been learning about, collective intelligence, semantic web programming, mashups, data mining (just the data warehousing aspect), and content mining approaches, and I was wondering how to turn that passion into a startup. Are there examples of startups whose core business relies on such servic...

Techniques to display related content or articles

Hi, I've been trying to learn text mining and other related topics in the collective intelligence field. I am interested in making an app that will scan through a document and show related posts/articles on the page. What algorithm(s) would be helpful for retrieving the required info? Thanks /A ...
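A common baseline for "related articles" is to represent each document as a TF-IDF vector and rank the other documents by cosine similarity. This is a minimal pure-Python sketch of that baseline, assuming the whole collection fits in memory; the function names and the tokenizer regex are illustrative, and the weights use the plain tf * log(N / df) variant.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z0-9']+", text.lower())

def tfidf_vectors(docs):
    """One {term: weight} dict per document, weight = tf * log(N / df)."""
    counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for c in counts for term in c)
    return [{t: tf * math.log(n / df[t]) for t, tf in c.items()} for c in counts]

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def related(docs, index, k=3):
    """Indices of up to k documents most similar to docs[index], best first.

    Documents with zero similarity are left out.
    """
    vecs = tfidf_vectors(docs)
    scores = [(cosine(vecs[index], v), i) for i, v in enumerate(vecs) if i != index]
    scores.sort(reverse=True)
    return [i for s, i in scores[:k] if s > 0]
```

Terms that occur in every document get weight 0, which is what keeps very common words from dominating the match. For larger collections, a library implementation such as scikit-learn's `TfidfVectorizer` is usually a better fit than recomputing all vectors on each call as this sketch does.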

Information Extraction Toolkits

I'm looking for information extraction libraries that can handle semi-structured information that may have either hidden or incomplete data. I want to train some classifiers to pull out content based on the structure. I'm working on building a tool where I can select text in the browser, and it will generate (via some web service call)...

DMX Analysis Services question

Hi, I have two mining models, both of them time series. One is [Company_Inputs] and the other is [Booking_Projections]. What I want to do is use EXTEND_MODEL_CASES to join the results of [Company_Inputs] as the extended cases. So basically something like: Select Flattened PredictTimeSeries([Bookings], 1, 6, EXTEND_MODEL_CASES) FROM [B...

Data Mining in Excel 2007 using SQL Server 2008

Hi, I was working on reporting and analysis using the Data Mining Add-ins for Excel 2007 with SQL Server 2008. Does anyone know whether Oracle provides something comparable that can be used for data analysis and reporting? Let me know if you do; a link would be very helpful :) Thank you, J ...

Algorithm to calculate similarity between texts

Hello all, I am trying to score the similarity between posts from social networks, but I haven't found any good algorithms for it. Thoughts? I tried Levenshtein, Jaro-Winkler, and others, but those are meant for comparing texts without taking sentiment into account. In posts we can get one text saying "I really love dogs" and another saying "I really...
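Edit-distance measures such as Levenshtein compare characters, so two posts that use the same words with opposite sentiment come out as near-identical. One simple, assumed workaround is to score topic overlap and sentiment separately. The sketch below computes Jaccard overlap of the non-sentiment words and returns 0 when the two posts' polarities point in opposite directions. The tiny `POLARITY` lexicon, the `NEGATORS` set and the one-word negation rule are hypothetical placeholders, not a real sentiment resource.

```python
import re

# Hypothetical toy lexicon (word -> polarity); a real system would use a
# published sentiment lexicon or a trained classifier instead.
POLARITY = {"love": 1, "like": 1, "great": 1, "hate": -1, "dislike": -1, "awful": -1}
# Hypothetical negation cues: a negator flips the polarity of the next word only.
NEGATORS = {"not", "no", "never", "don't"}
SENTIMENT_WORDS = set(POLARITY) | NEGATORS

def tokens(text):
    return re.findall(r"[a-z']+", text.lower())

def polarity(words):
    """Sum of lexicon scores, with a simple one-word negation rule."""
    score = 0
    for i, word in enumerate(words):
        value = POLARITY.get(word, 0)
        if i > 0 and words[i - 1] in NEGATORS:
            value = -value
        score += value
    return score

def sentiment_aware_similarity(a, b):
    wa, wb = tokens(a), tokens(b)
    # Topic overlap: Jaccard similarity of the words that carry no sentiment.
    ca, cb = set(wa) - SENTIMENT_WORDS, set(wb) - SENTIMENT_WORDS
    union = ca | cb
    overlap = len(ca & cb) / len(union) if union else 0.0
    # One post positive and the other negative: treat them as not similar.
    # A neutral post (polarity 0) falls back to topic overlap alone.
    if polarity(wa) * polarity(wb) < 0:
        return 0.0
    return overlap
```

Returning a hard 0 for opposing polarity is the bluntest choice; a softer variant could multiply the overlap by a penalty factor instead.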

N-gram related question - C# algorithm

Hi, I intend to use the n-gram part of the algorithm in this code: http://www.codeproject.com/KB/cs/tfidf.aspx. For the input "the quick red", the algorithm produces these trigram results: t th the he e q qu qui uic ick ck k r re red ed d. However, this source: http://en.wikipedia.org/wiki/Trigram reckons it should be: the qui k_r he_ u...
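The two outputs appear to follow different conventions. The CodeProject result matches per-word character n-grams with padding: each word is padded with n-1 spaces on both sides, a window of width n slides over it, and the spaces are trimmed from each gram. The Wikipedia fragment looks like an overlapping window over the whole string with spaces written as `_` (its visible `he_` and `k_r` fit that reading). This sketch implements both readings for comparison; the function names are hypothetical.

```python
def padded_word_ngrams(text, n=3):
    """Per-word character n-grams with padding, e.g. 'the' -> t, th, the, he, e."""
    grams = []
    for word in text.split():
        padded = " " * (n - 1) + word + " " * (n - 1)
        # Slide a width-n window over the padded word and trim the padding.
        grams.extend(padded[i:i + n].strip() for i in range(len(padded) - n + 1))
    return grams

def sliding_ngrams(text, n=3):
    """Overlapping n-grams across the whole string, spaces shown as '_'."""
    s = text.replace(" ", "_")
    return [s[i:i + n] for i in range(len(s) - n + 1)]
```

The padded form emits shorter grams at word boundaries and never crosses words, while the sliding form always emits exactly n characters and captures cross-word sequences such as `k_r`. Which one suits a TF-IDF model depends on whether word-boundary context should count as a feature.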