full-text-indexing

SQL Server 2005 full text index query to help find noise words in content

Is there a way to query a full text index to help determine additional noise words? I would like to add some custom noise words and wondered if there's a way to analyse the index to help determine suggestions. ...

SQL Server CONTAINS with digits gives no results

Hi, I have a database table which is full-text indexed, and I use the CONTAINS function to perform a search query on it. When I do: SELECT * FROM Plants WHERE CONTAINS(Plants.Description, '"Plant*" AND "one*"'); I get back all the correct results matching a description with the words "Plant" and "one". Some plants are named like "Plant 1...

Save a binary file in SQL Server as BLOB and text (or get the text from Full-Text index)

Currently we are saving files (PDF, DOC) into the database as BLOB fields. I would like to be able to retrieve the raw text of the file so that I can manipulate it for hit-highlighting and other functions. Does anyone know of a simple way to parse the files and store the raw text on save, either via SQL or .NET code? I have f...

Couple o' quick questions on Apache Lucene

-- I don't want to start any religious wars, but a quick Google search indicates that Apache Lucene is the preferred open source tool for indexing and searching. Are there others? -- What file format does Lucene use to store its index file(s)? Thanks in advance. Doug ...

Indexing CSV file contents in Python

Hi, I have a very large CSV file containing only two fields (id,url). I want to do some indexing on the url field with Python. I know that there are tools like Whoosh or PyLucene, but I can't get the examples to work. Can someone help me with this? ...
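A minimal sketch of one way to index the url field using only the Python standard library, assuming a two-column id,url file as described in the question. The names tokenize, build_index, and search, as well as the sample rows, are hypothetical and chosen for illustration; this is not the Whoosh or PyLucene API.

```python
import csv
import io
import re
from collections import defaultdict

# Assumption: URL "words" are runs of lowercase letters and digits.
TOKEN_RE = re.compile(r"[a-z0-9]+")

def tokenize(text):
    """Split a URL or query into lowercase alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())

def build_index(csv_file):
    """Build an inverted index mapping each token to the set of row ids."""
    index = defaultdict(set)
    for row in csv.reader(csv_file):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        doc_id, url = row
        for token in tokenize(url):
            index[token].add(doc_id)
    return index

def search(index, query):
    """Return the ids whose URL contains every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result

# Hypothetical sample data standing in for the real CSV file.
sample = io.StringIO(
    "1,http://example.com/python/index\n"
    "2,http://example.org/lucene\n"
    "3,http://example.com/lucene/python\n"
)
index = build_index(sample)
print(sorted(search(index, "python")))
print(sorted(search(index, "lucene python")))
```

For a real file, the StringIO object could be replaced with open(path, newline=""). An in-memory dictionary like this only works while the index fits in RAM; beyond that, a dedicated library is the more realistic option.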

How to setup Lucene/Solr for a B2B web app?

Given: 1 database per client (business customer); 5000 clients; clients have between 2 and 2000 users (avg is ~100 users/client); 100k to 10 million records per database; users need to search those records often (it's the best way to navigate their data). Possibly relevant info: several new clients each week (any time during business h...

Having trouble using 'AND' in CONTAINSTABLE SQL SERVER FULL TEXT SEARCH

I've been using FULL-TEXT for a while, but sometimes I cannot seem to get the most relevant results. Suppose I have a field with something like "An Overview of Pain Medicine 5/12/2006" and a user types "An Overview 5/12/2006". So we create a search like: '"An" AND "Overview" AND "5/12/2006"' - 0 results (bad) '"Overview" AND "5/12/2006"' - 1 ...

Exception when indexing text documents with Lucene, using SnowballAnalyzer for cleaning up

Hello! I am indexing documents with Lucene and am trying to apply the SnowballAnalyzer to remove punctuation and stopwords from the text. I keep getting the following error: IllegalAccessError: tried to access method org.apache.lucene.analysis.Tokenizer.<init>(Ljava/io/Reader;)V from class org.apache.lucene.analysis.snowball.Snowba...

SQL Server Fulltext search yields no results

I have SQL Server 2005 Express Edition with Advanced Services. I enabled FullText and created a catalog as follows: create FullText catalog MyDatabase_FT in path 'mypath' as default I then created a FullText index as follows: create FullText index on Cell (CellName) key index PK_Cell with CHANGE_TRACKING AUTO I executed the fol...

How can I restore a SQL Server database using SMO with full text catalogues?

Hi, I want to programmatically (C#) restore a database (SQL Server 2008) which originally came from my live environment. My local environment is obviously quite different, with different drive mappings. I have followed this article (http://www.mssqltips.com/tip.asp?tip=1849) which has been a great help, but I am struggling to find out h...

SQL SERVER FULL-TEXT INDEX, CONTAINS return empty

Hi all, I have an issue with a full-text index; can anybody help me with this? 1) Set up the full-text index: CREATE FULLTEXT INDEX ON dbo.Companies(my table name) ( CompanyName(column of my table) Language 0X0 ) KEY INDEX IX_Companies_CompanyAlias ON QuestionsDB WITH CHANGE_TRACKING AUTO GO 2) Use CONTAINS to find the matching rows: SELECT Co...

Integrate Lucene or any other search product with SQL Server 2005

Hi, I need to use full text search with SQL Server 2005. I have explored its built-in approach (SQL Server full text indexing), but it seems less powerful. I have also looked at the features of Lucene. Now my questions: Is it possible to integrate Lucene and SQL Server in any way? Can my T-SQL queries use a Lucene index for returning ...

Adding more OR searches with CONTAINS Brings Query to Crawl

I have a simple query that relies on two full-text indexed tables, but it runs extremely slowly when I combine the CONTAINS with any additional OR search. As seen in the execution plan, the two full text searches crush the performance. If I query with just 1 of the CONTAINS, or neither, the query is sub-second, but the moment you a...

Tell me SQL Server Full-Text searcher is crazy, not me.

i have some customers with a particular address that the user is searching for: 123 generic way There are 5 rows in the database that match: ResidentialAddress1 ============================= 123 GENERIC WAY 123 GENERIC WAY 123 GENERIC WAY 123 GENERIC WAY 123 GENERIC WAY i run a FT query to look for these rows. i'll show you ea...

FULLTEXT search (MySQL) is slow the first time and then from the second time onwards it gets much faster.

Hi, I have a table with 4000 records (which is much easier to handle through full text search). When the search query is executed for the first time, it is much slower. It takes about 5 to 10 seconds. Then it gets faster. If the site remains inactive for 10 or 15 minutes and I then execute the query again, it gets slower. I am usi...

What is the difference between EdgeNGramTokenizerFactory and EdgeNGramFilterFactory in Solr?

What is the difference between these two filters? They seem to have the same effect? Can anyone supply an example of how they are applied to some text? Thanks ...
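A rough Python illustration of the usual distinction, not Solr code. It assumes the tokenizer treats the whole field value as a single token and emits edge n-grams of it, while the filter runs after another tokenizer (whitespace splitting here) and emits edge n-grams of each token separately. The function names are hypothetical.

```python
def edge_ngrams(text, min_gram, max_gram):
    """Return the prefixes of text whose lengths run from min_gram to max_gram."""
    upper = min(max_gram, len(text))
    return [text[:n] for n in range(min_gram, upper + 1)]

def tokenizer_style(text, min_gram, max_gram):
    """Assumed tokenizer behaviour: n-grams of the whole input string."""
    return edge_ngrams(text, min_gram, max_gram)

def filter_style(text, min_gram, max_gram):
    """Assumed filter behaviour: n-grams of each whitespace-separated token."""
    grams = []
    for token in text.split():
        grams.extend(edge_ngrams(token, min_gram, max_gram))
    return grams

print(tokenizer_style("hello world", 1, 3))  # ['h', 'he', 'hel']
print(filter_style("hello world", 1, 3))     # ['h', 'he', 'hel', 'w', 'wo', 'wor']
```

Under these assumptions the two give the same output for single-word input, which may be why they appear to have the same effect. They diverge once the text contains more than one word.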

MySQL Full Text Query Locking Table

Every once in a while we get a particularly long-running full text query in MySQL. The query will run for a very long time; currently I'm seeing one that has been running for 50,000 seconds (and is still going). Using KILL or KILL QUERY on the query seems to do nothing. Also, the command timeout on the client side is 30 seconds, so the client ...

How to get frequently occurring phrases with Lucene

Hi! As the question says, I would like to get some frequently occurring phrases with Lucene. I am extracting information from txt files and am losing a lot of context because phrases are not kept together, e.g. "information retrieval" is indexed as two separate words. What is the way to get phrases like this? I can not find anyth...

Full Text indexing Multiple languages

Hi all, my DB stores content in three languages (English, French & Arabic). I have full text indexing enabled for a few tables and would like to know a few best practices: 1. When should I use language-neutral indexing? 2. Can I index Arabic? I don't see Arabic among the indexable languages! 3. Should I have separate indexes for each language? (Each...

How does MyISAM scale compared to Solr for Django searching?

Imagine you have a web application written in Django and Python 2.65, with MySQL 5.1 as your database of choice. Now imagine you need to scale your app to handle searching hundreds of thousands of documents, with potentially hundreds of thousands of users. Reality: Haystack 1.0 with PySolr and Solr 1.4.0 is proving slow i...