In our desktop application, we have implemented a simple search engine using an inverted index.
Unfortunately, some of our users' datasets can get very large, e.g. taking up ~1GB of memory before the inverted index has been created. The inverted index itself takes up a lot of memory, almost as much as the data being indexed (another 1G...
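One common way to shrink an inverted index is to store each posting list as sorted, delta-encoded, variable-length bytes instead of lists of full-size integers. The minimal sketch below assumes non-negative integer document IDs. The function names are hypothetical, and the actual savings depend on how the data is distributed.

```python
from collections import defaultdict


def encode_varint(n: int, out: bytearray) -> None:
    # 7 data bits per byte; the high bit means "more bytes follow".
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)


def encode_postings(doc_ids: list[int]) -> bytes:
    # Sort and store gaps between consecutive IDs; small gaps fit in one byte.
    out = bytearray()
    prev = 0
    for doc_id in sorted(set(doc_ids)):
        encode_varint(doc_id - prev, out)
        prev = doc_id
    return bytes(out)


def decode_postings(data: bytes) -> list[int]:
    doc_ids, current, shift, value = [], 0, 0, 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            current += value  # undo the delta encoding
            doc_ids.append(current)
            value, shift = 0, 0
    return doc_ids


def build_index(docs: dict[int, str]) -> dict[str, bytes]:
    # Hypothetical whitespace tokenizer; term -> compressed posting list.
    raw = defaultdict(list)
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            raw[term].append(doc_id)
    return {term: encode_postings(ids) for term, ids in raw.items()}
```

Postings are decoded only for the query terms, so the compact form stays in memory and is expanded on demand.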
If I have a forum site with a large number of threads, will the search engine bot crawl the whole site every time? Say I have over 1,000,000 threads on my site: will they all get crawled every time the bot visits, or how does it work? I want my website to be indexed, but I don't want the bot to kill my website! In other words I don't...
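Crawlers that send an `If-Modified-Since` header can be given an empty `304 Not Modified` response for unchanged threads, which is much cheaper than rendering the full page. The minimal sketch below assumes a hypothetical per-thread last-modified timestamp from the forum database.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Dict, Optional, Tuple


def conditional_response(
    if_modified_since: Optional[str], last_modified: datetime
) -> Tuple[str, Dict[str, str]]:
    # HTTP dates have one-second resolution: normalise to UTC, drop microseconds.
    last_modified = last_modified.astimezone(timezone.utc).replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    since = None
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # malformed header: ignore it and serve the full page
    # Only compare timezone-aware dates; naive results are ignored.
    if since is not None and since.tzinfo is not None and last_modified <= since:
        return "304 Not Modified", headers
    return "200 OK", headers
```

Whether a given crawler sends conditional requests, and how often it returns, is up to that crawler.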
So our scenario is this: we have multiple SharePoint sites that are created dynamically on an "as-requested" basis. Basically, there's a new site for each new project. Now, for every site we want to add a search clause that says only content whose metadata tag value equals the site name should be found. Quick example:
There are 2 s...
I'm managing an established site which is currently in the process of being upgraded (completely replaced anew), but I'm worried that I'll lose all my Google indexing (that is, there will be a lot of pages in Google's index which won't exist in that place any more).
The last time I upgraded a (different) site, someone told me I should h...
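The conventional HTTP way to tell clients that a page has moved for good is a `301 Moved Permanently` response whose `Location` header points at the new URL. The minimal sketch below assumes a hypothetical old-to-new mapping table built while planning the upgrade.

```python
from typing import Optional

# Hypothetical old-path -> new-path table for the replaced site.
REDIRECTS = {
    "/products.asp?id=42": "/products/blue-widget",
    "/about.html": "/about",
}


def route_old_url(path_and_query: str) -> Optional[tuple[str, dict[str, str]]]:
    """Return (status, headers) for a known old URL, or None to fall through."""
    target = REDIRECTS.get(path_and_query)
    if target is None:
        return None  # not an old URL: let the new site handle it
    return "301 Moved Permanently", {"Location": target}
```

How quickly each search engine replaces the old entries with the new URLs is outside the site's control.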
I am working on a course leaflet system for the college I work at. Leaflets are stored in a database with primary key course_code. Ideally, I would like the leaflets to be indexed by Google. How would I achieve this, assuming I develop the system in ASP.NET 2.0?
I understand part of getting it indexed is to pass the variables around in t...
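One way to make every leaflet discoverable is to give each one a stable URL and list them all in a sitemap following the sitemaps.org protocol. The minimal sketch below assumes a hypothetical base URL and `/courses/<course_code>` URL scheme. It is written in Python for illustration, but the same XML can be produced from ASP.NET.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(base_url: str, course_codes: list[str]) -> str:
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for code in course_codes:
        url = ET.SubElement(urlset, "url")
        # Hypothetical scheme: one query-string-free URL per leaflet.
        ET.SubElement(url, "loc").text = f"{base_url}/courses/{quote(code)}"
    return ET.tostring(urlset, encoding="unicode")
```

The course codes would come from the `course_code` column, and the resulting XML would be served at a URL that is submitted to the search engine.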
I'm building a web portal where language content will generally depend on the "accept-language" sent by the browser. The same content-URI will thus serve different content to different users depending on their browser setting.
I'm very curious to know how this will affect search indexing. Does Google index using all languages, and is it...
My site will be down for the next few days. Is there any way to let search engines know about this, so that they don't take any negative action against the site's reputation and PageRank?
...
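HTTP's standard signal for temporary unavailability is `503 Service Unavailable`, optionally with a `Retry-After` header. The minimal WSGI sketch below answers every request that way. The three-day retry interval is an assumption matching "the next few days", and how each search engine reacts to a prolonged 503 is its own policy.

```python
RETRY_AFTER_SECONDS = 3 * 24 * 60 * 60  # assumed outage length, in seconds


def maintenance_app(environ, start_response):
    # Every URL gets the same temporary-outage response.
    body = b"Down for maintenance; please try again later.\n"
    start_response("503 Service Unavailable", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Retry-After", str(RETRY_AFTER_SECONDS)),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

It can be served with any WSGI server, for example `wsgiref.simple_server.make_server("", 8080, maintenance_app).serve_forever()`.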
Hi guys,
I am trying to build a prototype of a search engine. Can anyone please suggest C++ APIs for indexing and retrieving the data?
Thanks
...
I was wondering if search engines respect the HTTP header field Content-Location.
This could be useful, e.g., when you want to remove the session ID argument from the URL:
GET /foo/bar?sid=0123456789 HTTP/1.1
Host: example.com
…
HTTP/1.1 200 OK
Content-Location: http://example.com/foo/bar
…
Clarification:
I don't want to redirect...
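The session-free URL a server would place in `Content-Location` can be computed by dropping the session parameter from the query string. The minimal sketch below takes the parameter name `sid` from the example request. Whether crawlers honour `Content-Location` at all is exactly what the question asks, and the sketch does not answer that.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_session_id(url: str, param: str = "sid") -> str:
    parts = urlsplit(url)
    # Keep every query parameter except the session ID, preserving order.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

For the example above, `strip_session_id("http://example.com/foo/bar?sid=0123456789")` yields `http://example.com/foo/bar`.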
Dear Folks,
I am trying to order the files on my department's common file share, which contains thousands of documents of various file types. My idea was to sort them by content-related keywords. Only a few files contain valid info in the keywords file attribute provided by Windows. My idea was to let some desktop search engine index the files ...
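A crude way to derive content-related keywords is plain term frequency after removing common stop words. The minimal sketch below assumes the text has already been extracted from each file, which is out of scope here. The stop-word list and the cut-off are illustrative assumptions.

```python
import re
from collections import Counter

# Illustrative stop-word list; a real one would be longer and language-specific.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "for", "on", "with"}


def top_keywords(text: str, n: int = 5) -> list[str]:
    words = re.findall(r"[a-z]{3,}", text.lower())  # letters only, length >= 3
    counts = Counter(word for word in words if word not in STOP_WORDS)
    return [word for word, _ in counts.most_common(n)]
```

Weighting terms against the whole collection (TF-IDF) usually gives more distinctive keywords than raw counts.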
Hi,
I have developed a business index which combines e-commerce websites (in ASP.NET 2.0 + C#).
I'm looking for an in-site search engine that already handles issues like indexing, speed and quality.
Are there any well-known solutions that do this?
I need the search results to be customized to match my design, so Google's search engine isn't an option.
...
I run a small webserver, and lately it's been getting creamed by a search engine spider. What's the proper way to cool it down? Should I send it 5xx responses periodically? Is there a robots.txt setting I should be using? Or something else?
...
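A `robots.txt` can keep spiders off expensive pages and, through the non-standard `Crawl-delay` extension, ask them to slow down. Not every crawler honours `Crawl-delay`, and some major ones ignore it. The minimal sketch below checks such a file with Python's standard `urllib.robotparser`. The `/search` path and the `ExampleBot` agent name are hypothetical.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Seconds the crawler is asked to wait between requests (None if unset).
delay = parser.crawl_delay("ExampleBot")
# Search result pages are expensive to render, so they are disallowed.
search_allowed = parser.can_fetch("ExampleBot", "http://example.com/search?q=x")
thread_allowed = parser.can_fetch("ExampleBot", "http://example.com/threads/1")
```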
I want to make a multi-language site, such that all or almost all pages will be available in 2 or more translations. What are the best practices to follow?
For example, I am considering these language selection mechanisms:
Cookie-based selection of the preferred language.
Based on Accept-Language header if the cookie is not set.
Based on Ge...
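The first two mechanisms in the list above can be chained as a fallback: an explicit cookie wins, then the best `Accept-Language` match, then a site default. The minimal sketch below assumes a hypothetical set of supported languages and default, and it does not model the third, truncated option.

```python
from typing import Optional

SUPPORTED = ("en", "de", "fr")  # hypothetical set of available translations
DEFAULT = "en"                  # hypothetical site default


def parse_accept_language(header: str) -> list[tuple[str, float]]:
    """Return (primary language, q) pairs sorted by descending q."""
    prefs = []
    for item in header.split(","):
        piece = item.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        params = params.strip()
        q = 1.0
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        # Reduce e.g. "de-AT" to its primary subtag "de" for matching.
        prefs.append((tag.strip().split("-")[0].lower(), q))
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)


def choose_language(cookie_lang: Optional[str], accept_language: Optional[str]) -> str:
    if cookie_lang in SUPPORTED:
        return cookie_lang  # explicit user choice wins
    for lang, q in parse_accept_language(accept_language or ""):
        if q > 0 and lang in SUPPORTED:
            return lang  # best acceptable browser preference
    return DEFAULT
```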
We have OCRed thousands of pages of newspaper articles. The newspaper, issue, date, page number and OCRed text of each page have been put into a MySQL database.
We now want to build a Google-like search engine in PHP to find the pages given a query. It's got to be fast, and take no more than a second for any search.
How should we do i...
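The core of such an engine is an inverted index with relevance scoring. The minimal sketch below is a plain in-memory Python illustration with TF-IDF ranking. It stands in for the PHP/MySQL setup described, where the same job is often handed to a full-text engine such as a MySQL FULLTEXT index. The page IDs and texts are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


class PageIndex:
    def __init__(self, pages: dict[int, str]):
        self.n_pages = len(pages)
        # term -> {page_id: term frequency in that page}
        self.postings: dict[str, dict[int, int]] = defaultdict(dict)
        for page_id, text in pages.items():
            for term, tf in Counter(tokenize(text)).items():
                self.postings[term][page_id] = tf

    def search(self, query: str, limit: int = 10) -> list[int]:
        scores = Counter()
        for term in set(tokenize(query)):
            hits = self.postings.get(term)
            if not hits:
                continue
            idf = math.log(self.n_pages / len(hits)) + 1.0  # rarer terms weigh more
            for page_id, tf in hits.items():
                scores[page_id] += tf * idf
        return [page_id for page_id, _ in scores.most_common(limit)]
```

Only the posting lists for the query terms are touched, which is what keeps lookups fast as the number of pages grows.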
I need a site search engine to provide search for my members-only content. I've previously used Fluid Dynamics Search Engine but was wondering if there was anything that's been more recently updated.
Needs to index content via site crawling as opposed to filesystem crawling as all content is in a database. Also needs to run under FreeBS...
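The central step of site crawling, as opposed to filesystem crawling, is extracting same-site links from each fetched page and queuing them. The minimal sketch below uses the standard `html.parser` module and leaves out fetching, politeness delays, and the login needed for members-only pages. The URLs are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit


class LinkExtractor(HTMLParser):
    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links and drop #fragments so each page is queued once.
        absolute, _ = urldefrag(urljoin(self.base_url, href))
        # Stay on the same host; external links are not crawled.
        if urlsplit(absolute).netloc == urlsplit(self.base_url).netloc:
            self.links.add(absolute)


def same_site_links(base_url: str, html: str) -> set[str]:
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links
```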
I've just picked up a contract to sort out a vipers' nest of e-commerce websites that a previous 'developer' left for one of my clients. There are a couple of dozen of them using a custom shopping cart and CMS that is too embedded to dump and works well enough, but desperately needs cleaning up, refactoring, and bug fixing, so...
If I have a commercial site belonging to a Japanese company that wants good Google search results for a keyword written in Katakana or Kanji (non-ASCII characters), does it still matter to put the closest English word in the site's DNS name?
For example:
If the search word is "homepage" in Katakana: ホームページ
Will the DNS n...
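For reference, a Katakana word such as ホームページ can itself be used in a domain name as an internationalized domain name (IDN), which DNS stores in an ASCII `xn--` form. The minimal sketch below uses Python's built-in `idna` codec (IDNA 2003). The `.jp` name is a hypothetical example, and the sketch says nothing about how search engines weigh domain names for ranking.

```python
def to_ascii_domain(domain: str) -> str:
    # Each label is converted separately; pure-ASCII labels pass through.
    return domain.encode("idna").decode("ascii")


def to_unicode_domain(ascii_domain: str) -> str:
    # Reverse conversion, e.g. for displaying the name to users.
    return ascii_domain.encode("ascii").decode("idna")
```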
Situation: Google has indexed a page in a forum, and the thread has since been deleted. Can I make Google and other search engines delete the cached copy, and if so, how? I doubt they would object, since the linked page no longer exists and keeping the index up to date and valid should be in their best interests.
Is this possib...
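On the site's side, the usual precondition is that the deleted thread's URL stops returning a normal page and answers `410 Gone` (or `404 Not Found`) instead, so a crawler's next visit sees that the page no longer exists. The minimal sketch below assumes a hypothetical `/thread/<id>` URL scheme and set of deleted IDs. How quickly each engine drops its cached copy is up to that engine.

```python
import re

THREAD_URL = re.compile(r"/thread/(\d+)")  # hypothetical URL scheme
DELETED_THREADS = {1001, 1002}             # would come from the forum database


def thread_status(path: str) -> str:
    match = THREAD_URL.fullmatch(path)
    if match is None:
        return "404 Not Found"  # not a thread URL at all
    if int(match.group(1)) in DELETED_THREADS:
        return "410 Gone"  # existed once, deliberately removed
    return "200 OK"
```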
How do you programmatically retrieve the most popular search terms from Microsoft Search Server?
I have searched and not been able to come up with an answer.
...
With the myriad sites available for hosting open-source projects, like SourceForge, GitHub, BerliOS, RubyForge and many others, I've been wondering if there is a specialised search engine out there that catalogues all the projects available on these different sites.
I'm not talking about a search engine to search actual source code li...