Are there any easy ways to implement filtering a user's input (possibly a question) by extracting the meaningful data in the query?
I basically want to filter out any noise words so I can send a 'clean' query to Google's search api.
...
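One way to approach the noise-word filtering asked about above is to tokenize the input and drop every token found in a stop-word set. The sketch below is only an illustration: the `NOISE_WORDS` list and the `clean_query` name are hypothetical, and a real list would be much longer and tuned to the application.

```python
import re

# Hypothetical, deliberately tiny noise-word list (an assumption for this sketch).
NOISE_WORDS = frozenset({
    "a", "an", "the", "is", "are", "what", "how",
    "do", "i", "to", "of", "in", "for",
})

def clean_query(text: str) -> str:
    """Lowercase the input, split it into word tokens, and drop noise words."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    kept = [t for t in tokens if t not in NOISE_WORDS]
    return " ".join(kept)

print(clean_query("How do I filter the noise words in a query?"))
# prints: filter noise words query
```

The set lookup keeps the filter linear in the number of tokens; the cleaned string can then be passed to whatever search API is in use.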
Hi there,
Is there an out-of-the-box workflow that helps you review blog comments in Liferay?
E.g. Post blog entry -> User comments on blog entry -> Mail goes to moderator about/with new entry -> User writes s****S ;-) -> moderator declines to publish entry
And/or is there a way to check it against stopwords at submit? E.g. check f...
Hi,
I have two Xapian databases, let's call one "EN" and the other "DE", and let's say the former contains some documents in English, and the latter in German.
If I want users to be able to search both at once, I can easily load both of the databases. However, it seems like I can only use one stemmer and set of stop words?
There's no...
I have a SQL Server 2005 Full-text index with the language set to Neutral. I've edited the stop word list to remove single-digit numbers and rebuilt the index. If you search for any single digit (using contains), it ignores the number. 2 or more digit numbers work fine.
Any ideas?
...
Is it possible to get the list of Full Text Search noise/stop words from SQL Server 2005 by querying the database?
I am aware that the noise words are in a text file ~/FTData/noiseEng.txt but this file is not accessible to our application.
I've looked at the sys.fulltext_* tables but these don't seem to have the words.
...
Hi,
I have a database in SQL Server 2008 with Full Text Search indexes. I have defined the Stopword 'al' in the Stoplist. However, when I search for any phrase with the keyword 'al', the word 'al' is still used in ranking.
This might be related to the fact that I am breaking up search terms, and reconstructing them. I am then searching...
Hello. For my upcoming social network site, I would like to stop participants from contributing tasteless content (text, pictures, videos, audio). I am devising a mechanism of moderation but I believe soon the amount of content being contributed will outgrow my team's capacity to proof-read. I am looking for ways to automatically handle t...
I am currently trying to develop a basic fulltext search for my website, and I noticed that certain words like "regarding" are listed as stopwords for MySQL fulltext searches. This doesn't bother me too much right now since people searching for a given news item wouldn't necessarily search using the word "regarding" (but I certainly can...
Hi all,
I am building an "alphabetical order search" module for a project.
It will look like this:
A B C D E F ... Z
When I click on "A", the results should be sorted under "A", and the same goes for every other letter.
Now my problem is as follows:
For example, there is a film named "The Mummy".
What I do i...
Is there any way to add some custom stop words to SQL Server 2005?
...
What I would like to do (in Clojure):
For example, I have a vector of words that need to be removed:
(def forbidden-words [":)" "the" "." "," " " ...many more...])
... and a vector of strings:
(def strings ["the movie list" "this.is.a.string" "haha :)" ...many more...])
So, each forbidden word should be removed from each string, a...
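The question above asks for Clojure, and its expected output is cut off, so the following is only a sketch of the general idea in Python, assuming literal substring removal is what is wanted. The `remove_forbidden` name is hypothetical. All forbidden entries are escaped and combined into one regular expression so that each string is scanned once.

```python
import re

def remove_forbidden(strings, forbidden):
    """Delete every literal occurrence of each forbidden entry from each string."""
    # Longest entries first, so a longer entry wins over a shorter one it contains.
    ordered = sorted(set(forbidden), key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(w) for w in ordered))
    return [pattern.sub("", s) for s in strings]

forbidden_words = [":)", "the", ".", ",", " "]
strings = ["the movie list", "this.is.a.string", "haha :)"]
print(remove_forbidden(strings, forbidden_words))
# prints: ['movielist', 'thisisastring', 'haha']
```

Because the match is on substrings rather than whole words, "the" would also be cut out of a word such as "other"; whole-word matching would need word boundaries around the alphabetic entries.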
[Caveat] This is not directly a programming question, but it is something that comes up so often in language processing that I'm sure it's of some use to the community.
Does anyone have a good list of uninteresting (English) words that have been tested by more than a casual look? This would include all prepositions, conjunctions, etc... ...
I have written a query that performs a full-text search using a full-text index on a MySQL table.
But my problem is that when a user searches for "to go", it does not find anything because of the stopwords in MySQL.
So my question is, how can I write a full-text search query which will ignore the stopwords?
...
I have a client testing the full-text search (example below) on a new Oracle UCM site.
The random text string they chose to test was 'test only'. Which failed; from my testing it seems 'only' is a reserved word, as it is never returned from a full text search (it is returned from metadata searches).
I've spent the morning searching oracl...
Sorry if the question is a bit confusing. This is similar to this question
I think the above question is close to what I want, but in Clojure.
There is another question
I need something like this but instead of '[br]' in that question, there is a list of strings that need to be searched and removed.
Hope I made myself clear.
I...
Hello all ;)
Does dismax support all the features of the standard requestHandler? Stopwords? Synonyms? Stemming? Have you heard about "edismax"?
I'm using Solr 1.4.
In my first tests the stopwords don't work... well, I think so.
I configured my DisMax to match all terms if the term count is 1 or 2.
Example (in French) ...
I'm using C# to display a list of movie titles that I am calling from an SQLite database. Currently, I'm using a custom ListBox class that has a function to sort the text stripping the word 'The' from the beginning of every item. However, it doesn't exactly seem to be the simplest way to do it, since it calls from the SQLite database and...
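One option that avoids sorting in the UI layer is to let SQLite order the rows while skipping a leading "The ". The sketch below uses Python's `sqlite3` module purely for illustration; the `movies` table and `title` column are hypothetical names, and the same `ORDER BY` expression should carry over to a query issued from C#.

```python
import sqlite3

def titles_sorted_ignoring_the(conn):
    """Return titles ordered alphabetically, ignoring a leading 'The '."""
    rows = conn.execute(
        """
        SELECT title
        FROM movies
        ORDER BY (CASE
                    WHEN title LIKE 'The %' THEN substr(title, 5)
                    ELSE title
                  END) COLLATE NOCASE
        """
    )
    return [title for (title,) in rows]

# In-memory database with a hypothetical schema, just to exercise the query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (title TEXT)")
conn.executemany(
    "INSERT INTO movies (title) VALUES (?)",
    [("The Mummy",), ("Alien",), ("Zodiac",), ("The Abyss",)],
)
print(titles_sorted_ignoring_the(conn))
# prints: ['The Abyss', 'Alien', 'The Mummy', 'Zodiac']
```

The displayed title keeps its article; only the sort key drops it. `substr` is 1-indexed in SQLite, so position 5 is the first character after "The ".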
Hi,
I have a somewhat large document and want to do stop-word elimination and stemming on the words of this document with Python. Does anyone know of an off-the-shelf package for these?
If not, code that is fast enough for large documents is also welcome.
Thanks
...
My current method is to break the string into an array of words, load them into an NSSet, and subtract the set of stopwords. Is there a more efficient way?
...
Hello everyone,
I have a billion word corpus which I have collected in a scalar. I have a .regex file that contains all the stop words that I want to eliminate from my data (text).
I don't know how to use this .regex file, so I have made an array and stored all the stop words of the .regex file in my stop word array.
To remove the stop...
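For a corpus of that size, one common approach is to compile the whole stop-word array into a single alternation and stream the text line by line instead of holding it in one scalar. The sketch below is in Python rather than the asker's language, and both function names are hypothetical; it assumes the stop words are plain words, so `\b` word boundaries apply.

```python
import re
from typing import Iterable, Iterator, Pattern

def build_stop_pattern(stop_words: Iterable[str]) -> Pattern[str]:
    """Compile all stop words into one case-insensitive whole-word pattern."""
    ordered = sorted(set(stop_words), key=len, reverse=True)
    alternation = "|".join(re.escape(w) for w in ordered)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)

def strip_stop_words(lines: Iterable[str], pattern: Pattern[str]) -> Iterator[str]:
    """Yield each line with stop words removed and whitespace normalized."""
    for line in lines:
        yield " ".join(pattern.sub(" ", line).split())

pattern = build_stop_pattern(["the", "on"])
for cleaned in strip_stop_words(["The cat sat on the mat", "on and on"], pattern):
    print(cleaned)
# prints:
# cat sat mat
# and
```

Since `strip_stop_words` is a generator, it can be fed an open file object directly, so only one line is in memory at a time.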