Any examples, tips, or guidance for the following scenario?
I have retrieved updates from several different news websites. I then analyse that information to predict current trends in the world.
When searching for the above idea, I could only find information on data mining, but that is for database systems. While data mining is similar to...
Let's say I have a set of users, a set of songs, and a set of votes on each song:
=========== =========== =======
User        Song        Vote
=========== =========== =======
user1       song1       [score]
user1       song2       [score]
user1       song3       [score]
user2       song1       [score]
user2       song2       [score]
user...
Dear all, I am now using a web tool
http://fiddesktop.cs.northwestern.edu/mmp/scrape?url=
to parse a webpage.
For example, to parse the New York Times homepage, we enter:
http://fiddesktop.cs.northwestern.edu/mmp/scrape?url=http%3A//www.nytimes.com/pages/world/index.html
in the address bar of our browser, and it will parse things nicely for us.
...
I'm working on information extraction, and I need a tool to crawl data from web pages. Is there a popular one for Windows?
...
Hi,
I was wondering if anyone knows how to gather data from millions of people around the globe via these social networks in order to get the statistics. I need this for a project I'm trying to do and do not need to know the actual person posting such information (such as statuses, comments, information about them, etc) so as not to brea...
Hi,
Does anybody know an open-source/free library that does term clustering?
Thanks,
yaniv
...
Let's say I have 4 different values A, B, C, D with sets of identifiers attached.
A={1,2,3,4,5}
B={8,9,4}
C={3,4,5}
D={12,8}
Given a set S of identifiers {1,30,3,4,5,12,8}, I want it to return C and D, i.e. retrieve all sets from a group of sets for which S is a superset.
Are there any algorithms to perform this task efficiently (Prefer...
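One possible approach, sketched here under assumptions rather than offered as the most efficient option: build an inverted index from each identifier to the names of the sets containing it, then count hits per set while scanning S. A set is contained in S exactly when its hit count equals its size. The function names `build_index` and `subsets_of` are hypothetical.

```python
from collections import defaultdict

def build_index(sets):
    """Map each identifier to the names of the sets that contain it."""
    index = defaultdict(list)
    for name, members in sets.items():
        for ident in members:
            index[ident].append(name)
    return index

def subsets_of(s, sets, index):
    """Return the names of all non-empty sets whose members all appear in s."""
    hits = defaultdict(int)
    for ident in set(s):
        for name in index.get(ident, ()):
            hits[name] += 1
    # A set is a subset of s when every one of its members was hit.
    # Empty sets never receive a hit, so this sketch does not report them.
    return sorted(name for name, count in hits.items() if count == len(sets[name]))

# The sets from the question
sets = {"A": {1, 2, 3, 4, 5}, "B": {8, 9, 4}, "C": {3, 4, 5}, "D": {12, 8}}
index = build_index(sets)
print(subsets_of({1, 30, 3, 4, 5, 12, 8}, sets, index))
```

The index is built once, so each lookup only touches the sets that share at least one identifier with S.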
Hi all,
Can someone shed some light on how searching is done on web-sites like del.icio.us?
If I enter "js"(1), "javascript"(2) or "java script"(3) as my query on delicious, I'm pointed to resources about JavaScript. However, depending on the query, the returned result sets are different (the del.icio.us system returns a different set of boo...
I want to get related [things/questions] in my app, similar to what StackOverflow does when you tab out of the Title field.
I can think of only one way to do it that I think might be fast enough:
Do a search for the title in the corpus of titles of all [things], and return the first x matches. We can use whatever search is being used for ...
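The idea of searching the title corpus and returning the first x matches can be sketched as follows. This is only an illustration: a plain word-overlap (Jaccard) ranking stands in for a real search engine, and the function names `tokens` and `related_titles` are hypothetical.

```python
def tokens(title):
    """Lowercase a title and split it into a set of words."""
    return set(title.lower().split())

def related_titles(title, corpus, x=3):
    """Return up to x corpus titles ranked by Jaccard word overlap with title."""
    query = tokens(title)
    scored = []
    for candidate in corpus:
        words = tokens(candidate)
        union = query | words
        score = len(query & words) / len(union) if union else 0.0
        if score > 0:
            scored.append((score, candidate))
    # Highest overlap first; titles with no shared word were already skipped
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [candidate for _, candidate in scored[:x]]

corpus = [
    "How to parse a web page",
    "Best Java text mining framework",
    "Parse a web page",
]
print(related_titles("parse web page", corpus, x=2))
```

A linear scan like this would likely be too slow for a large corpus, which is where an existing search index would take over.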
I have two structs like so:
public struct KeyLog
{
    Keys key;
    DateTime time;
}
public struct MouseLog
{
    MouseEvents mouse;
    Point coordinates;
    DateTime time;
}
For every keyboard key press and mouse button click I wish to save this data, but I do not know the most efficient way to store/handle it. Ar...
As part of a research project I'm currently looking for open-source implementations of self-indexing algorithms, i.e. a compressed form of the traditional inverted index yielding nice characteristics such as faster lookup and/or less consumed space.
Do you know of any open-source implementations of self-indexing algorithms? Do you have ...
Can anyone explain to me what non-serializable means in a transactional DB? Please give me an example. Is r1(x) r2(x) w1(y) c2 c1 non-serializable?
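One common way to check a schedule is the conflict-serializability test: draw an edge Ti -> Tj whenever an operation of Ti conflicts with a later operation of Tj (different transactions, same item, at least one write), and look for a cycle. The sketch below assumes that test; the tuple encoding, the function names, and the second schedule are made up for illustration, and commits are left out because they do not create conflicts.

```python
def precedence_edges(schedule):
    """Return edges (Ti, Tj) where an operation of Ti conflicts with a later one of Tj."""
    edges = set()
    for i, (t1, op1, item1) in enumerate(schedule):
        for t2, op2, item2 in schedule[i + 1:]:
            if t1 != t2 and item1 == item2 and "w" in (op1, op2):
                edges.add((t1, t2))
    return edges

def has_cycle(edges):
    """Depth-first search for a cycle in the directed precedence graph."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True
        visiting.add(node)
        if any(visit(nxt) for nxt in graph.get(node, ())):
            return True
        visiting.discard(node)
        done.add(node)
        return False

    return any(visit(node) for node in list(graph))

def is_conflict_serializable(schedule):
    return not has_cycle(precedence_edges(schedule))

# r1(x) r2(x) w1(y) from the question: two reads never conflict,
# and only T1 touches y, so the graph has no edges.
question = [(1, "r", "x"), (2, "r", "x"), (1, "w", "y")]
# Hypothetical schedule r1(x) w2(x) w1(x): edges T1 -> T2 and T2 -> T1.
example = [(1, "r", "x"), (2, "w", "x"), (1, "w", "x")]
print(is_conflict_serializable(question), is_conflict_serializable(example))
```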
...
Hello Everyone,
I want to know what the best open-source, Java-based framework for text mining is, one that supports both machine learning and dictionary methods.
I'm using Mallet, but there is not much documentation and I do not know if it will fit all my requirements.
Thanks in advance.
Best Regards,
ukrania
...
I'm writing a piece of Java software that has to make the final judgement on the similarity of two documents encoded in UTF-8.
The two documents are very likely to be the same, or slightly different from each other, because they have many features in common like date, location, creator, etc., but their text is what decides if they reall...
Consider the following search results:
Google for 'David' - 591 million hits in 0.28 sec
Google for 'John' - 785 million hits in 0.18 sec
OK. Pages are indexed, so it only needs to look up the count and the first few items in the index table; the speed is understandable.
Now consider the following search with an AND operation:
Goog...
I have a StackOverflow-like system where content is organised into threads, each thread having content of its own (the question body / text), and posts / replies.
I'm building the ability to search this content via Lucene, and I have decided that, if possible, I would like to index individual posts (it makes the index easier to update, and m...
My problem is this: I am using SharePoint 2010 and have a form created in SharePoint Designer 2010. Above that form I have a Silverlight web part. Now I need to be able to access information from the Silverlight web part when I click on it and insert that information into the form below it.
Does anyone have any insight on how to do that?...
Hey everyone,
I am interested in doing some document clustering, and right now I am considering using TF-IDF for this.
If I am not mistaken, TF-IDF is mainly used for evaluating the relevance of a document given a query. If I do not have a particular query, how can I apply TF-IDF to clustering?
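TF-IDF weights do not themselves require a query: each document can be turned into a weighted term vector, and documents can then be compared with each other, for example by cosine similarity, which a clustering algorithm can use as its similarity measure. A minimal sketch, assuming whitespace tokenization and the weighting tf * log(N / df); the function names and sample documents are made up.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Turn each document into a {term: weight} dict using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: in how many documents each term occurs
    df = Counter(term for doc_tokens in tokenized for term in set(doc_tokens))
    n = len(docs)
    vectors = []
    for doc_tokens in tokenized:
        tf = Counter(doc_tokens)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

docs = ["apple banana apple", "banana apple fruit", "car engine wheel"]
vecs = tfidf_vectors(docs)
# Documents sharing weighted terms score higher than unrelated ones
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

The pairwise similarities (or the vectors themselves) can then be handed to whichever clustering method is chosen.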
...
It's part of an information retrieval thing I'm doing for school. The plan is to create a hashmap of words, using the first two letters of each word as the key and all words starting with those two letters, saved as a single string, as the value. So,
hashmap["ba"] = "bad barley base"
Once I'm done tokenizing a line I take that hashmap, serialize it, and append i...
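The two-letter-prefix map described above can be sketched like this for a single line of text. It only covers building the map, not the serialization step; the function name `prefix_map`, the whitespace tokenization, and the choice to skip one-letter words and duplicates are assumptions.

```python
from collections import defaultdict

def prefix_map(line):
    """Group the words of one line by their first two letters."""
    groups = defaultdict(list)
    for word in line.lower().split():
        # Assumption: one-letter words and repeated words are skipped
        if len(word) >= 2 and word not in groups[word[:2]]:
            groups[word[:2]].append(word)
    # Store each group as a space-separated string, as in the question
    return {prefix: " ".join(words) for prefix, words in groups.items()}

hashmap = prefix_map("bad barley base cat bad")
print(hashmap["ba"])
```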
In order to show a best-match ad each time, there are at least these things to do:
retrieve the main information of the current page
get an ad that's related to the information retrieved above
But the above is almost impossible for a non-search-engine company.
So what's the practical way for a non-Google company to approach a best ...