I want to learn more about full-text search. Please recommend a few good books that describe the algorithms and data structures involved and explain how to write a simple search engine. I use C++.
Thanks!
...
Does anyone know how to detect whether an incoming request is from a search engine robot? Do the HTTP headers contain any specific info for that?
...
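For the robot-detection question above, one common starting point is the User-Agent request header. The following is a minimal sketch, assuming the request headers are available as a plain dictionary; the function name and the list of crawler tokens are illustrative and not exhaustive. The header can be spoofed by any client, so this is only a hint, not proof.

```python
import re

# Assumption: a small, illustrative list of crawler tokens. Real crawlers
# document their own User-Agent strings, and this list is not exhaustive.
BOT_PATTERN = re.compile(
    r"googlebot|bingbot|slurp|duckduckbot|baiduspider|yandex",
    re.IGNORECASE,
)


def looks_like_search_bot(headers):
    """Return True if the User-Agent header contains a known crawler token."""
    # HTTP header names are case-insensitive, so normalise them first.
    normalised = {name.lower(): value for name, value in headers.items()}
    user_agent = normalised.get("user-agent", "")
    return bool(BOT_PATTERN.search(user_agent))
```

A request with no User-Agent header is treated as "not a bot" here; whether that is the right default depends on the site.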
Hello,
Is there any documentation or sample program showing how to use stemming in the Xapian Java bindings...
...
I'm trying to figure out the context of an @enclosed@ variable in a .properties file, whether it refers to a variable in Java, UNIX, or something else. But try as I might I can't figure out how to search the internet for '@'. Google strips it from searches, even though its search suggestions will include it (and are currently more useful...
My HTML page:
http://www.faressoft.com
<meta name="description" content="فارس سوفت الاسم الرائد في عالم البرمجيات العربية" />
When I search for my site name "فارس سوفت" using Google, the description shown contains both my description and the error message of my contact form.
Why?
...
I'm working on a system that performs matching on large sets of records based on strings, numeric ranges, and date ranges. The string matches are mostly exact matches as far as I can tell, as opposed to the less exact full-text-search-style results that I understand Lucene is generally designed for. Numeric precision is important as the da...
Hey,
I just got a list of requirements from my employer for a new search function for our websites. They're book publishers' websites, so that's the basic field we're operating in here. The data is stored in a Microsoft SQL Server 2005 database (SP3) with full-text search enabled.
Now, the requirements state that the search can be done in thre...
This question relates to research that I have been conducting for the past three months. Which social search engines are currently available (if any)? What is the future of semantic and social (collective) intelligence collection/distribution engines?
...
I have a Django site that uses the localization middleware in combination with gettext and the trans/blocktrans template tags to show visitors different pages depending on the preferred language in their user agent string (which seems to be the standard way of doing things in Django).
This works great for supported languages (currently ...
Does Apache Nutch support sitemaps?
If not, how can I implement it myself? How can I use the priority field? Should it be multiplied with the boost field?
...
Hi, we are trying to add searching to our web site and want this search function to search only a few things:
Files - Obviously we need to parse the text of PDF, PPT and DOC files in our case
User comments. Users will be able to comment on the content, and we want to match those comments when a user searches for related information
I w...
Can you use ExtractingRequestHandler and Tika with any of the compressed file formats (zip, tar, gz, etc.) to extract the content for indexing?
I am sending Solr the archived.tar file using curl:
curl "http://localhost:8983/solr/update/extract?literal.id=doc1&fmap.content=body_texts&commit=true" -H 'Content-type:application/...
I'm running swish-e on a subdirectory htm/. On any search results page, the URLs have the directory separator escaped, giving unusable links like htm%2Fpage.htm.
How do I get swish-e or search.cgi to not escape the URLs in the results? I'm using the standard search.tt in search.cgi that came with the distribution, and my swish.config fi...
It would help my research if I could do a search log analysis. Is it possible to use a search API (Google, Yahoo, Bing) to create a log of web search queries over a specified time span, or are such logs available on request?
...
I come from a background in Ruby on Rails. Implementing search there is relatively trivial using some of the excellent search plugins available to that community (e.g., Sphinx, Solr).
In .NET, what is the counterpart to the tools above? I discovered DotLucene, but that project is now closed. Are there any others I should consider?
T...
Hi everyone. This is more of a theoretical question than a practical one. I'm working on a project which is quite a simple catalog of links. The whole model is similar to the Dmoz or Yahoo catalog, except that each entry has certain additional attributes.
I have a hierarchical taxonomy working on all entries with a many-to-many relationship, all...
I don't want the search engines to index my imprint page. How could I do that?
...
What search engine does Twitter use for search.twitter.com?
...
I want to create a simple search engine for learning purposes.
I want to know how to index a simple blog site.
A blog site has many pages, and every page contains a blog post.
But every page also contains other content that is common to all pages (header, footer, category block, and so on).
In your opinion, how can I index this blog?
The ...
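For the blog-indexing question above, here is a minimal sketch of one possible approach, assuming each page has already been fetched and reduced to plain text, one block per line. It drops lines that appear on every page (a rough stand-in for header, footer and category block) and builds an inverted index from the rest. The function names and the line-based notion of "common content" are assumptions made for illustration, not an established method.

```python
from collections import Counter, defaultdict
import re


def strip_common_lines(pages):
    """Drop lines that occur on every page (e.g. header, footer)."""
    total = len(pages)
    if total < 2:
        # With a single page every line would count as "common"; keep it all.
        return dict(pages)
    line_counts = Counter()
    for text in pages.values():
        # Count each distinct line once per page.
        line_counts.update(set(text.splitlines()))
    return {
        url: "\n".join(
            line for line in text.splitlines() if line_counts[line] < total
        )
        for url, text in pages.items()
    }


def build_index(pages):
    """Map each lowercased word to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in strip_common_lines(pages).items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(url)
    return index


pages = {
    "/a": "My Blog\nFirst post about cats\nFooter",
    "/b": "My Blog\nSecond post about dogs\nFooter",
}
index = build_index(pages)
```

With these two sample pages, looking up `index["cats"]` gives only `/a`, while the shared "My Blog" and "Footer" lines are not indexed at all.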