How do I write code that would find related (similar) articles to the one that the user is currently reading?
For example, suppose I have articles:
Python programming tips
Python programming for newbies
Programming in Python, ActionScript and Flash
Programming in the Jungle
Tarzan saves newbie Judy from using Fortran programming langua...
I'm trying to compute item-to-item similarity along the lines of Amazon's "Customers who viewed/purchased X have also viewed/purchased Y and Z". All of the examples and references I've seen are for either computing item similarity for ranked items, for finding user-user similarity, or for finding recommended items based on the current u...
Hi,
I have n documents and want to find the common words that appear in them.
For example, I want to be able to say that (n-3) documents include the word "web".
I can certainly do this with basic data structures, but there may be a more efficient algorithm, or a way to handle the same word with different suffixes.
Is there any algorithm for such purposes?...
Does anyone have experience with this?
...
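A minimal sketch of the "basic data structures" approach to the question above: count, for each word, how many documents contain it. The `naive_stem` helper is a hypothetical, deliberately crude suffix stripper used only for illustration; a real stemmer (Porter, Snowball) would be the usual choice for handling suffixes.

```python
import re
from collections import Counter


def naive_stem(word):
    # Hypothetical, very crude suffix stripper (illustration only).
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def document_frequency(documents):
    """Map each stemmed word to the number of documents containing it."""
    df = Counter()
    for text in documents:
        tokens = re.findall(r"[a-z]+", text.lower())
        # A set ensures each word counts once per document.
        df.update({naive_stem(token) for token in tokens})
    return df


docs = [
    "The web is big",
    "Webs of spiders",
    "Nothing relevant here",
]
df = document_frequency(docs)
print(df["web"])  # 2: "web" and "webs" fall under the same stem
```

Each document is processed once, so the cost is linear in the total number of tokens.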
The problem is described below:
Suppose I have a list of files in one version (say A, B, C, D). In the next version I have the following files (A, E, F, G). There are some similarities in their contents. The files in the later version come from the previous version by file name renaming, content addition, deletion or partial modification or wi...
I'm wondering if there is a built in function in R that can find the cosine similarity (or cosine distance) between two arrays?
Currently, I implemented my own function, but I can't help but think that R should already come with one :)
Thanks,
Derek
...
Hi all, I'm building an application with Lucene (I'm new to it) and I'm running into some problems.
My application uses the Lucene 2.4.0 library with a custom Similarity implementation (the jar is imported).
In my app I'm calculating docFreq and numDocs manually (I'm adding the values of all indexes and then I calculate a global value in order to u...
My program uses clustering to produce subsets of similar items and then uses the cosine similarity measure as a method of determining how similar the clusters are. For instance if user 1 has 3 clusters and user 2 has 3 clusters then every cluster is compared against each other, 9 results using the cosine similarity measure will be produc...
Hello,
Part of a process requires applying string similarity algorithms.
The results of this process will be stored and will produce, let's say, SS_Dataset.
Based on this dataset, further decisions will have to be made.
My questions are:
Should I apply one or more string similarity algorithms to produce SS_Dataset?
Any comparisons...
Hello,
I am trying to determine document similarity between a single document and each of a large number of documents (n ~= 1 million) as quickly as possible. More specifically, the documents I'm comparing are e-mails; they are grouped (i.e., there are folders or tags) and I'd like to determine which group is most appropriate for a new...
My question is about this topic I've been reading about a bit. Basically my understanding is that in higher dimensions all points end up being very close to each other.
The doubt I have is whether this means that calculating distances the usual way (euclidean for instance) is valid or not. If it were still valid, this would mean that wh...
Hi, I'm finding the cosine similarity between documents. I did it like this:
D1=(8,0,0,1) where 8,0,0,1 are the tf-idf scores of the terms t1, t2, t3 , t4
D2=(7,0,0,1)
cos(theta) = (56 + 0 + 0 + 1) / sqrt(64 + 49) sqrt(1 +1 )
which comes out to be
cos(theta)= 5
Now, what do I conclude from this value? I don't understand what cos(theta)=5 s...
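For reference, a small sketch of the standard cosine similarity applied to the two vectors in this question. The denominator is the product of each vector's own Euclidean norm, so for non-negative tf-idf scores the result always lies between 0 and 1. The function name `cosine_similarity` is just an illustrative choice.

```python
import math


def cosine_similarity(a, b):
    """Dot product divided by the product of the two Euclidean norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))  # uses only a's components
    norm_b = math.sqrt(sum(y * y for y in b))  # uses only b's components
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


d1 = (8, 0, 0, 1)
d2 = (7, 0, 0, 1)
similarity = cosine_similarity(d1, d2)
print(round(similarity, 4))  # 0.9998
```

Here the norms are sqrt(64 + 1) and sqrt(49 + 1), giving 57 / sqrt(65 * 50). A value above 1 indicates that components of the two vectors were mixed inside the square roots.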
Hi, I'm finding the similarity between documents, and to measure it I used the Jaccard coefficient. I did it like this:
D1=(8,0,0,1) where 8,0,0,1 are the tf-idf scores of the terms t1, t2, t3 , t4
D2=(7,0,0,0)
Jaccard coefficient = dotproduct(d1,d2) / (|d1| + |d2| - dotproduct(d1,d2))
and the answer comes out to be "-1.367931". What does i...
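For comparison, a sketch of the extended Jaccard (Tanimoto) coefficient for real-valued vectors, which uses the squared norms in the denominator. With squared norms the denominator is never smaller than the dot product, so for non-negative tf-idf scores the result stays between 0 and 1. The function name `tanimoto` is an illustrative choice.

```python
def tanimoto(a, b):
    """Extended Jaccard: a.b / (|a|^2 + |b|^2 - a.b)."""
    dot = sum(x * y for x, y in zip(a, b))
    sq_norm_a = sum(x * x for x in a)  # squared Euclidean norm of a
    sq_norm_b = sum(y * y for y in b)  # squared Euclidean norm of b
    denominator = sq_norm_a + sq_norm_b - dot
    if denominator == 0:
        return 0.0  # convention chosen here for two all-zero vectors
    return dot / denominator


d1 = (8, 0, 0, 1)
d2 = (7, 0, 0, 0)
score = tanimoto(d1, d2)
print(round(score, 4))  # 0.9655
```

For these vectors the computation is 56 / (65 + 49 - 56) = 56 / 58. Using the unsquared norms instead (about 8.06 and 7) makes the denominator negative, which is where a value such as -1.367931 comes from.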
Using Python, I'm computing cosine similarity across items.
Given event data that represents a purchase (user, item), I have a list of all items 'bought' by my users.
Given this input data
(user,item)
X,1
X,2
Y,1
Y,2
Z,2
Z,3
I build a Python dictionary:
{1: ['X','Y'], 2 : ['X','Y','Z'], 3 : ['Z']}
From that dictionary, I generate a...
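One possible way to go from such a dictionary to item-to-item scores, shown only as a sketch and not necessarily what the asker does next: treat each item as a binary vector over users, in which case cosine similarity reduces to the number of shared users divided by the square root of the product of the two user counts. The names `item_users` and `item_cosine` are illustrative.

```python
import math
from itertools import combinations

# Item -> users who bought it, as in the question.
item_users = {1: ['X', 'Y'], 2: ['X', 'Y', 'Z'], 3: ['Z']}


def item_cosine(users_a, users_b):
    """Cosine similarity of two items viewed as binary user vectors."""
    a, b = set(users_a), set(users_b)
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


# Score every unordered pair of items once.
similarities = {
    (i, j): item_cosine(item_users[i], item_users[j])
    for i, j in combinations(sorted(item_users), 2)
}

for pair, score in similarities.items():
    print(pair, round(score, 3))
```

Items 1 and 2 share two users and score 2 / sqrt(2 * 3), while items 1 and 3 share none and score 0.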
In the field of Data Mining, is there a specific sub-discipline called 'Similarity'? If so, what does it deal with? Any examples, links, or references would be helpful.
Also, being new to the field, I would like the community's opinion on how closely related Data Mining and Artificial Intelligence are. Are they synonyms, is one the subset of...
Hi,
Is it possible to configure Solr so that the document similarity score falls in a fixed range, for example from 0 (no match) to 1 (complete match between document and query)?
Thanks!
...
What methods are there to get JPQL to match similar strings?
By similar I mean:
Contains: the search string is found within the string of the matched entity
Case-insensitive
Small misspellings: e.g. "arow" matches "arrow"
I suspect the first two will be easy; however, I would appreciate help with the last one.
Thank you
...
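The third requirement is commonly expressed as an edit-distance threshold. The sketch below is plain Python rather than JPQL and only illustrates the three matching rules; `is_similar` and its `max_edits` parameter are hypothetical names, and how such a check would be wired into a JPA query layer is left open.

```python
def levenshtein(s, t):
    """Minimum number of single-character insertions, deletions or substitutions."""
    previous = list(range(len(t) + 1))
    for i, char_s in enumerate(s, start=1):
        current = [i]
        for j, char_t in enumerate(t, start=1):
            cost = 0 if char_s == char_t else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def is_similar(query, candidate, max_edits=1):
    """Case-insensitive 'contains' match, or within max_edits edits."""
    q, c = query.lower(), candidate.lower()
    return q in c or levenshtein(q, c) <= max_edits


print(levenshtein("arow", "arrow"))         # 1
print(is_similar("arow", "arrow"))          # True
print(is_similar("ARROW", "Broken arrow"))  # True
```

The threshold of one edit is an assumption; longer strings usually tolerate a larger value.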
I am developing a web application where users have a collection of tags. I need to create a suggestion list for users based on the similarity of their tags.
For example, when a user logs in to the system, the system gets his tags, searches for these tags in the DB of users, and shows users who have similar tags. For instance if User 1 has followi...
I have an object with a set of parameters like:
var obj = new { Param1 = 100, Param2 = 212, Param3 = 311, param4 = 11, Param5 = 290 };
On the other side I have a list of objects:
var obj1 = new { Param1 = 1221, Param2 = 212, Param3 = 311, param4 = 11, Param5 = 290 };
var obj3 = new { Param1 = 35, Param2 = 11, Param3 = 319, param4 = 211, Pa...
There is a function similar_text() in the PHP library. The documentation (http://php.net/manual/en/function.similar-text.php) tells me that "This calculates the similarity between two strings as described in Oliver [1993]."
Despite extensive searching, I can't find the paper that "Oliver [1993]" is referring to; nor any candidate for w...