stop-words

Filter out common words for search query

Are there any easy ways to filter a user's input (possibly a question) by extracting the meaningful data in the query? I basically want to filter out any noise words so I can send a 'clean' query to Google's search API. ...

liferay workflow to preview / review blog comments

Hi there, is there some out-of-the-box workflow that helps you review blog comments in Liferay? E.g. post blog entry -> user comments on blog entry -> mail goes to moderator about/with new entry -> user writes s****S ;-) -> moderator declines to publish the entry. And/or is there a way to check it against stopwords at submit? E.g. check f...

Xapian multiple-language searching with stop words?

Hi, I have two Xapian databases, let's call one "EN" and the other "DE", and let's say the former contains some documents in English, and the latter in German. If I want users to be able to search both at once, I can easily load both of the databases. However, it seems like I can only use one stemmer and set of stop words? There's no...

SQL Server Full-text Search Discarding Single Digit Numbers

I have a SQL Server 2005 Full-text index with the language set to Neutral. I've edited the stop word list to remove single-digit numbers and rebuilt the index. If you search for any single digit (using contains), it ignores the number. 2 or more digit numbers work fine. Any ideas? ...

Query SQL Server 2005 Full Text Search noise/stop words

Is it possible to get the list of Full Text Search noise/stop words from SQL Server 2005 by querying the database? I am aware that the noise words are in a text file ~/FTData/noiseEng.txt but this file is not accessible to our application. I've looked at the sys.fulltext_* tables but these don't seem to have the words. ...

Full Text Search: Noise words are being searched for

Hi, I have a database in SQL Server 2008 with Full Text Search indexes. I have defined the stopword 'al' in the stoplist. However, when I search for any phrase containing the keyword 'al', the word 'al' is still used in ranking. This might be related to the fact that I am breaking up search terms, and reconstructing them. I am then searching...

Looking for ways to prevent users from contributing tasteless content.

Hello. For my upcoming social network site, I would like to stop participants from contributing tasteless content (text, pictures, videos, audio). I am devising a moderation mechanism, but I believe the amount of content being contributed will soon outgrow my team's capacity to proofread. I am looking for ways to automatically handle t...

MySQL Fulltext Stopwords Rationale

I am currently trying to develop a basic fulltext search for my website, and I noticed that certain words like "regarding" are listed as stopwords for MySQL fulltext searches. This doesn't bother me too much right now since people searching for a given news item wouldn't necessarily search using the word "regarding" (but I certainly can...

How to omit "THE" in search using PHP and MYSQL

Hi all, I am building an "ALPHABETICAL ORDER SEARCH" module for a project. It will look like A B C D E F ... Z. When I click on "A", the results should be sorted by "A"; the same goes for every other letter. Now my problem is as follows: for example, there is a film named "The Mummy". What i do i...

Custom StopWord List In SQL Server 2005 Full-Text-Search

Is there any way to add some custom stop words to SQL Server 2005? ...

How to remove list of words from strings

What I would like to do (in Clojure): For example, I have a vector of words that need to be removed: (def forbidden-words [":)" "the" "." "," " " ...many more...]) ... and a vector of strings: (def strings ["the movie list" "this.is.a.string" "haha :)" ...many more...]) So, each forbidden word should be removed from each string, a...
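
The question asks for Clojure and its expected output is cut off above, so the following is only a minimal Python sketch of one possible reading: every forbidden entry is deleted wherever it occurs as a literal substring. The helper name `remove_words` is hypothetical, and the `...many more...` entries are left out.

```python
import re

def remove_words(strings, forbidden):
    """Delete every literal occurrence of each forbidden entry from each string.

    Assumption: plain substring removal, so "the" is also removed from
    inside longer words such as "other".
    """
    # Longest entries first, so a short entry cannot pre-empt a longer one
    # that starts with the same characters.
    ordered = sorted(forbidden, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(w) for w in ordered))
    return [pattern.sub("", s) for s in strings]

forbidden_words = [":)", "the", ".", ",", " "]
strings = ["the movie list", "this.is.a.string", "haha :)"]

cleaned = remove_words(strings, forbidden_words)
print(cleaned)  # ['movielist', 'thisisastring', 'haha']
```

If whole-word removal is wanted instead, the strings would need to be tokenized first rather than matched as raw substrings.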

List of uninteresting words

[Caveat] This is not directly a programming question, but it is something that comes up so often in language processing that I'm sure it's of some use to the community. Does anyone have a good list of uninteresting (English) words that have been tested by more than a casual look? This would include all prepositions, conjunctions, etc... ...

How can I write full search index query which will not consider any stopwords?

I have written a query that performs a full-text search using a full-text index on a MySQL table. But my problem is that when a user searches for "to go", it finds nothing because of the stopwords in MySQL. So my question is, how can I write a full-text search query that ignores the stopwords? ...

Where can I find a list of 'Stop' words for Oracle fulltext search?

I've a client testing the full text (example below) search on a new Oracle UCM site. The random text string they chose to test was 'test only'. Which failed; from my testing it seems 'only' is a reserved word, as it is never returned from a full text search (it is returned from metadata searches). I've spent the morning searching oracl...

How to remove list of words from a list of strings

Sorry if the question is a bit confusing. This is similar to this question; I think the above question is close to what I want, but in Clojure. There is another question; I need something like this, but instead of '[br]' in that question, there is a list of strings that need to be searched and removed. Hope I made myself clear. I...

dismax feat. stopwords, synonyms etc.

Hello all ;) Does dismax support all the features of the standard requestHandler? Stopwords? Synonyms? Stemming? Did you hear about "edismax"? I'm using Solr 1.4 for my first tests of the stopwords, and it doesn't work... well, I think so. I configured my DisMax to match all terms if count terms=[1,2]. Example (in French) ...

How can I sort an SQLite query ignoring articles ("the", "a", etc.)?

I'm using C# to display a list of movie titles that I am calling from an SQLite database. Currently, I'm using a custom ListBox class that has a function to sort the text stripping the word 'The' from the beginning of every item. However, it doesn't exactly seem to be the simplest way to do it, since it calls from the SQLite database and...
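
The question is about C#, and the excerpt is truncated, so this is only a rough Python sketch of one way to push the article-stripping into the query itself: register a custom collation with SQLite and use it in `ORDER BY`. The table name `movies`, the collation name `NOARTICLE`, and the article list are assumptions made for illustration.

```python
import re
import sqlite3

# Assumed article list; extend as needed.
LEADING_ARTICLE = re.compile(r"^(the|an|a)\s+", re.IGNORECASE)

def sort_key(title):
    """Lower-cased title with one leading article removed."""
    return LEADING_ARTICLE.sub("", title).lower()

def compare_titles(a, b):
    """Collation callback: negative, zero, or positive, like a comparator."""
    ka, kb = sort_key(a), sort_key(b)
    return (ka > kb) - (ka < kb)

conn = sqlite3.connect(":memory:")
conn.create_collation("NOARTICLE", compare_titles)

conn.execute("CREATE TABLE movies (title TEXT)")
titles = ["The Mummy", "Zodiac", "A Beautiful Mind", "Alien", "An Education"]
conn.executemany("INSERT INTO movies VALUES (?)", [(t,) for t in titles])

rows = [r[0] for r in conn.execute(
    "SELECT title FROM movies ORDER BY title COLLATE NOARTICLE")]
print(rows)
# ['Alien', 'A Beautiful Mind', 'An Education', 'The Mummy', 'Zodiac']
```

The stored titles stay untouched; only the comparison used for ordering ignores the article. Whether a given C# SQLite binding exposes custom collations is not covered here.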

Stop-word elimination and stemmer in python

Hi, I have a somewhat large document and want to do stop-word elimination and stemming on the words of this document with Python. Does anyone know of an off-the-shelf package for this? If not, code that is fast enough for large documents is also welcome. Thanks ...

Is there a better way to remove stopwords in Objective-C?

My current method is to break the string into an array of words, put them in an NSSet, and subtract the set of stopwords. Is there a more efficient way? ...
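
For comparison, the split-and-subtract approach described above can be sketched in Python (not Objective-C). The function name `remove_stopwords` and the sample stop-word list are hypothetical. Unlike a pure set difference, this version keeps the original word order and any repeated words, and uses the set only for constant-time membership checks.

```python
def remove_stopwords(text, stopwords):
    """Return the words of `text`, in order, that are not stop words."""
    # A set gives average O(1) lookups, so the whole pass is linear in the
    # number of words.
    stop = {w.lower() for w in stopwords}
    return [w for w in text.split() if w.lower() not in stop]

sample_stopwords = ["how", "do", "i", "the", "in", "a"]
result = remove_stopwords("How do I remove the stop words in a string",
                          sample_stopwords)
print(result)  # ['remove', 'stop', 'words', 'string']
```

Splitting on whitespace is an assumption; text with punctuation attached to words would need a tokenizer first.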

Perl: removing stop words from a big file

Hello everyone, I have a billion-word corpus which I have collected in a scalar. I have a .regex file that contains all the stop words that I want to eliminate from my data (text). I don't know how to use this .regex file, so I have made an array and stored all the stop words from the .regex file in my stop word array. To remove the stop...