search-engine

In-memory search index for application takes up too much memory - any suggestions?

In our desktop application, we have implemented a simple search engine using an inverted index. Unfortunately, some of our users' datasets can get very large, e.g. taking up ~1GB of memory before the inverted index has been created. The inverted index itself takes up a lot of memory, almost as much as the data being indexed (another 1G...
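As background on the structure the question mentions: an inverted index maps each term to the list ("postings") of documents that contain it. The following is a minimal Python sketch, not the asker's implementation. It assumes simple `\w+` tokenising and integer document ids; `build_index` and `search` are hypothetical names. Storing each postings list as a contiguous `array('I')` rather than a list or set of Python int objects is one common way to reduce per-posting memory overhead.

```python
from array import array
from collections import defaultdict
import re

TOKEN_RE = re.compile(r"\w+")

def build_index(docs):
    """Map each lowercase term to an array of the ids of documents containing it.

    array('I') packs unsigned ints contiguously (typically 4 bytes each)
    instead of keeping one Python int object per posting.
    """
    postings = defaultdict(lambda: array("I"))
    for doc_id, text in enumerate(docs):
        # set() so each term is recorded at most once per document
        for term in set(TOKEN_RE.findall(text.lower())):
            postings[term].append(doc_id)  # ids arrive in increasing order, so arrays stay sorted
    return dict(postings)

def search(index, query):
    """Return the sorted ids of documents containing every query term (AND semantics)."""
    terms = TOKEN_RE.findall(query.lower())
    if not terms:
        return []
    result = None
    for term in terms:
        ids = set(index.get(term, ()))
        result = ids if result is None else result & ids
    return sorted(result)
```

For example, `search(build_index(["the quick brown fox", "the lazy dog", "quick dog"]), "quick")` returns the ids of the first and third documents. Further savings (for instance delta-encoding the sorted ids) build on the same sorted-postings layout.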

How Do Search Engine Bots Crawl Forums?

If I have a forums site with a large number of threads, will the search engine bot crawl the whole site every time? Say I have over 1,000,000 threads in my site, will they get crawled every time the bot crawls my site? or how does it work? I want my website to be indexed but I don't want the bot to kill my website! In other words I don't...

How to add dynamic search parameters to SharePoint search?

So our scenario is this: We have multiple SharePoint sites that are created dynamically on an "as requested" basis. Basically, there's a new site for each new project. Now, for every site we want to add a search clause that says only content with a metadata tag value equal to the site name should be found. Quick example: There are 2 s...

Upgrading a site with SEO in mind

I'm managing an established site which is currently in the process of being upgraded (completely replaced anew), but I'm worried that I'll lose all my Google indexing (that is, there will be a lot of pages in Google's index which won't exist in that place any more). The last time I upgraded a (different) site, someone told me I should h...

Google (Search Engine) Indexing advice for asp.net pages

I am working on a course leaflet system for the college I work at; leaflets are stored in a database with primary key course_code. Ideally, I would like the leaflets to get indexed by Google. How would I achieve this, assuming I develop the system in asp.net 2.0? I understand part of getting it indexed is to pass the variables around in t...

Search engines and browser accept-language

I'm building a web portal where language content will generally depend on the "accept-language" sent by the browser. The same content-URI will thus serve different content to different users depending on their browser setting. I'm very curious to know how this will affect search indexing. Does Google index using all languages, and is it...

How to notify search engines that my site is down for some time?

My site will be down for the next few days. Is there any way to let search engines know about this so that they don't take any negative action against the website's reputation and PageRank? ...
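One commonly recommended approach is to answer every request with `503 Service Unavailable` plus a `Retry-After` header, which marks the outage as temporary instead of making the pages look permanently gone. The question names no server stack, so this minimal sketch assumes Python's standard `http.server`; `MaintenanceHandler` and the three-day `RETRY_AFTER_SECONDS` value are hypothetical. How long a given search engine tolerates a 503 before acting on it is up to that engine.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical expected downtime: three days, expressed in seconds.
RETRY_AFTER_SECONDS = 3 * 24 * 60 * 60

class MaintenanceHandler(BaseHTTPRequestHandler):
    """Answer every GET/HEAD with 503 Service Unavailable and a Retry-After hint."""

    def _send_maintenance(self, include_body=True):
        body = b"Down for maintenance. Please try again later.\n"
        self.send_response(503)
        # Retry-After may be a number of seconds or an HTTP-date; seconds used here.
        self.send_header("Retry-After", str(RETRY_AFTER_SECONDS))
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        if include_body:
            self.wfile.write(body)

    def do_GET(self):
        self._send_maintenance()

    def do_HEAD(self):
        # HEAD gets the same status and headers but no body.
        self._send_maintenance(include_body=False)
```

To serve it, one could run `HTTPServer(("", 8080), MaintenanceHandler).serve_forever()`; the port is arbitrary.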

Api for indexing and hashing

Hi guys, I am trying to build a prototype of a search engine. Can anyone please suggest C++ APIs for indexing and retrieving the data? Thanks ...

Do search engines respect the HTTP header field “Content-Location”?

I was wondering if search engines respect the HTTP header field Content-Location. This could be useful e.g. when you want to remove the session id argument from the URL:
    GET /foo/bar?sid=0123456789 HTTP/1.1
    Host: example.com
    …
    HTTP/1.1 200 OK
    Content-Location: http://example.com/foo/bar
    …
Clarification: I don’t want to redirect...
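The exchange in the question sets `Content-Location` to the request URL minus its session id. As a minimal sketch, that header value could be derived with Python's `urllib.parse`; the helper `canonical_location` and its `drop_params` parameter are hypothetical. This only builds the value. Whether search engines honour the header is exactly what the question asks, and the sketch does not answer that.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def canonical_location(url, drop_params=("sid",)):
    """Return url with session-style query parameters (hypothetical list) removed.

    Remaining parameters are re-encoded, so their escaping may differ
    slightly from the original request.
    """
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in drop_params
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )
```

With the URL from the question, `canonical_location("http://example.com/foo/bar?sid=0123456789")` yields the value shown in the response's `Content-Location` line.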

How to access keywords for files generated by desktop search engines like Windows Search or Copernic Desktop Search

Dear Folks, I am trying to order the files on a common file share of my department, containing thousands of documents of various file types. My idea was to sort them by content-related keywords. Only a few files contain valid info in the keywords file attribute provided by Windows. My idea was to let some desktop search engine index the files ...

Ready-made in-site search for an asp.net website

Hi, I have developed a business index that combines e-commerce websites (in asp.net 2.0 + C#). I'm looking for an in-site search engine that already handles issues like indexing, speed and quality. Are there any well-known solutions that do this? I need the search results to be customized to match my design, so the Google search engine isn't an option. ...

What do I do if a search engine spider is hammering my site?

I run a small webserver, and lately it's been getting creamed by a search engine spider. What's the proper way to cool it down? Should I send it 5xx responses periodically? Is there a robots.txt setting I should be using? Or something else? ...
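robots.txt is the usual first lever. The following minimal sketch assumes a hypothetical file that asks all crawlers for a 10-second `Crawl-delay` and disallows a `/search` path. Python's standard `urllib.robotparser` (`crawl_delay` needs Python 3.6+) is used only to show how a crawler that respects the file would read it; `ExampleBot` is a made-up user agent. `Crawl-delay` is a non-standard extension and not every crawler honours it, so it is a request rather than an enforcement mechanism.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ask every crawler to wait 10 seconds between
# requests and to stay out of the (expensive) search pages.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())
parser.modified()  # record a fetch time; explicit in case parse() does not set one

delay = parser.crawl_delay("ExampleBot")  # falls back to the "*" group
search_ok = parser.can_fetch("ExampleBot", "http://example.com/search?q=x")
thread_ok = parser.can_fetch("ExampleBot", "http://example.com/threads/1")
print(delay, search_ok, thread_ok)
```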

What are the best practices for multilanguage sites?

I want to make a multi-language site, such that all or almost all pages will be available in 2 or more translations. What are the best practices to follow? For example, I am considering these language selection mechanisms: Cookie-based selection of the preferred language. Based on the Accept-Language header if the cookie is not set. Based on Ge...
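The first two mechanisms listed (cookie first, then `Accept-Language`) can be sketched as a small fallback chain. This is a minimal Python illustration under stated assumptions: `parse_accept_language` and `choose_language` are hypothetical helpers, supported languages are given as lowercase tags, and a regional tag such as `fr-ch` falls back to its primary subtag `fr`. It is not a full RFC 7231/4647 matcher.

```python
def parse_accept_language(header):
    """Return language tags from an Accept-Language header, highest q-value first.

    Ties keep header order; entries with q=0 (explicitly unwanted) are dropped.
    """
    prefs = []
    for position, item in enumerate(header.split(",")):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        params = params.strip()
        q = 1.0
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed weight as "not acceptable"
        if q > 0:
            prefs.append((-q, position, tag.strip().lower()))
    return [tag for _, _, tag in sorted(prefs)]

def choose_language(cookie_lang, accept_language, supported, default):
    """Cookie wins if valid; otherwise the best supported Accept-Language tag; else default."""
    if cookie_lang in supported:
        return cookie_lang
    for tag in parse_accept_language(accept_language or ""):
        if tag in supported:
            return tag
        primary = tag.split("-")[0]  # e.g. "fr-ch" -> "fr"
        if primary in supported:
            return primary
    return default
```

Keeping the cookie check first means an explicit user choice always overrides the browser setting, which matches the order of the mechanisms in the question.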

Google-like Search Engine in PHP/MySQL

We have OCRed thousands of pages of newspaper articles. The newspaper, issue, date, page number and OCRed text of each page have been put into a MySQL database. We now want to build a Google-like search engine in PHP to find the pages matching a query. It's got to be fast, taking no more than a second for any search. How should we do i...

Recommend a Linux-based Site Search Engine?

I need a site search engine to provide search for my members-only content. I've previously used Fluid Dynamics Search Engine but was wondering if there was anything that's been more recently updated. Needs to index content via site crawling as opposed to filesystem crawling as all content is in a database. Also needs to run under FreeBS...

Search engine optimization - Developer guidance?

I've just picked up a contract to sort out a vipers' nest of e-commerce websites that a previous 'developer' left for one of my clients. There are a couple of dozen of them using a custom shopping cart and CMS system that's too embedded to dump and works well enough, but desperately needs cleaning up, refactoring, and bug fixing, so...

How relevant are URL names in non-English-speaking countries?

If I have a commercial site belonging to a Japanese company that will use Katakana or Kanji (non-ASCII characters) for the keywords it wants to rank well for in Google, does it still matter to put the closest English word in the site's DNS name? For example, if the search word is "homepage" in Katakana: ホームページ Will the DNS n...

How to force a page to be removed from the search engine index?

Situation: Google has indexed a page in a forum. The thread has since been deleted. Can I make Google and other search engines delete the cached copy, and if so, how? I doubt they would object, since the linked page no longer exists and keeping the index up to date and valid should be in their best interests. Is this possib...

Microsoft Search Server

How do you retrieve the most popular search terms programmatically from Microsoft Search Server? I have searched and have not been able to come up with an answer. ...

Hosted Projects Meta Search Engine

With the myriad of sites available like sourceforge, github, berlios, rubyforge and many others for hosting open-source projects, I've been wondering if there is a specialised search engine out there that catalogues all the projects available on these different sites. I'm not talking about a search engine to search actual source code li...