search-engine

A book about writing a search engine. Recommend some...

I want to learn more about full-text search. Please recommend a few good books that describe the algorithms and data structures involved and explain how to write a simple search engine. I use C++. Thanks! ...

How to know whether an incoming request is from a search engine robot?

Does anyone know how to detect whether an incoming request is from a search engine robot? Do the HTTP headers contain any specific info for that? ...

How to use a stemmer in the Xapian Java bindings?

Hello, is there any documentation or sample program that shows how to use stemming in the Xapian Java bindings? ...

How to Search the Internet for '@'

I'm trying to figure out the context of an @enclosed@ variable in a .properties file, whether it refers to a variable in Java, UNIX, or something else. But try as I might I can't figure out how to search the internet for '@'. Google strips it from searches, even though its search suggestions will include it (and are currently more useful...

Problem with the description of my site in Google search

My HTML page: http://www.faressoft.com <meta name="description" content="فارس سوفت الاسم الرائد في عالم البرمجيات العربية" /> When I search for my site name "فارس سوفت" using Google, the result's description contains both my description and the error message of my contact form. Why? ...

Calling search gurus: Numeric range search performance with Lucene?

I'm working on a system that performs matching on large sets of records based on strings, numeric ranges, and date ranges. The string matches are mostly exact matches as far as I can tell, as opposed to the less exact full-text-search type of results that I understand Lucene is generally designed for. Numeric precision is important as the da...

Database search engine - Sort by relevance according to specific relevance rules

Hey, I just got a list of requirements from my employer for a new search function for our websites. They're book publishers' websites, so that's the basic field we're operating in here. The data is stored in a Microsoft SQL Server 2005 database (SP3) with full-text search enabled. Now, the requirements state that the search can be done in thre...

Social Search Algorithms

The question relates to research that I have been conducting for three months. What social search engines are currently available (if any)? What is the future of semantic + social (collective) intelligence collection/distribution engines? ...

How Do Search Engines See A Localized Django Site?

I have a Django site that uses the localization middleware in combination with gettext and the trans/blocktrans template tags to show visitors different pages depending on the preferred language in their user agent string (which seems to be the standard way of doing things in Django). This works great for supported languages (currently ...

Nutch and sitemap.xml

Does Apache Nutch support sitemaps? Or how can I implement it myself? How can I use the priority field: should it be multiplied with the boost field? ...

How to implement searching?

Hi, we are trying to add searching to our web site and want this search function to search only a few things: Files - obviously we need to parse the text of PDF, PPT and DOC files in our case. User comments - users will be able to comment on the content, and we want to catch when a user is searching for that relevant information. I w...

Using Solr CELL's ExtractingRequestHandler to index/extract files from package formats.

Can you use ExtractingRequestHandler and Tika with any of the compressed file formats (zip, tar, gz, etc) to extract the content out for indexing? I am sending Solr the archived.tar file using curl. curl "http://localhost:8983/solr/update/extract?literal.id=doc1&fmap.content=body_texts&commit=true" -H 'Content-type:application/...

Problem: Swish-e escapes the / in links to a subdirectory?

I'm running swish-e on a subdirectory htm/. On any search results page, the URLs have the directory separator escaped, giving unusable links like htm%2Fpage.htm. How do I get swish-e or search.cgi to not escape the URLs in the results? I'm using the standard search.tt in search.cgi that came with the distribution, and my swish.config fi...

How to obtain a log of web search queries?

It would help if I could do a search log analysis for my research. Is it possible to use a search API (Google, Yahoo, Bing) to create a log of web search queries over a specified time span, or is it available on request? ...

Options for implementing search in .Net web projects?

I come from a background in Ruby on Rails. Implementing search there is relatively trivial using some of the excellent search plugins available to that community (e.g., Sphinx, Solr, etc.). In .Net, what's a similar counterpart to the above strategies? I discovered DotLucene, but that project is now closed. Any others I should consider? T...

Searching Techniques Recommendations

Hi everyone. This is more of a theory question than a practical one. I'm working on a project which is quite a simple catalog of links. The whole model is similar to the Dmoz or Yahoo catalog, except that each entry has certain additional attributes. I have a hierarchical taxonomy working on all entries with a many-to-many relationship, all...

How to prevent search engines from indexing a single page of my website?

I don't want the search engines to index my imprint page. How could I do that? ...

Anyone know what Twitter uses for its search engine?

What search engine does Twitter use for search.twitter.com? ...

How to index a blog as a search engine?

I want to create a simple search engine for learning purposes. I want to know how to index a simple blog site. A blog site has many pages, and every page contains a blog post. But every page also has other stuff in common (header, footer, category block and so on). In your opinion, how can I index this blog? The ...