search-engine

PHP search engine problem

I'm using Sphider as a search engine for my website. It's really easy to work with, but I'm having some major issues with localized characters. All of my HTML/PHP pages have the charset defined as UTF-8, while the search and result pages from Sphider had charset=ISO-8859-1. When I first used the Sphider "spider" to crawl my website it made all ...

Search engine for CPAN modules

I find the extensive volume of modules available through CPAN to be somewhat at odds with its search capacities. I'm aware that there is a lot of data stored about modules, including the DSLIP tags. However, I'm not aware of a convenient interface to query this database. search.cpan.org seems to provide only a basic textual search, and...

badoo.com user search - how can this be done?

Badoo.com has 56,000,000 user profiles. Profiles can be searched by sex, age, hair color, zodiac, education and so on, plus distance from my hometown, online status and date of registration. So far, this seems doable: even if it's quite a query on huge tables (56m members...), it can be cached in a general way. The interesting part is...

Easiest way to implement a search engine over a set of URLs

I'm trying to implement a simple search engine over a list of HTML documents. I've built a script that generates the needed list of links. There is no need to crawl any other documents. So far I've tried Solr / Nutch (I'm still trying to get them working...), but they feel like overkill for such a simple task. I'm looking for somethi...

Recommendable Maven repository search engines?

mavensearch.net doesn't know current versions in many cases. mvnrepository.com is a bit more up to date but doesn't show the repositories from which a package can be downloaded, which I would find very useful. What Maven repository search engines do you use and like? ...

redirects and search engines

Hi, in my web application I am using secure cookies to remember users. It works like this: if the connection is insecure, redirect to an HTTPS URL; over the secure connection, transmit the cookies and identify the user; then redirect to the original (insecure) URL. So every client has to go through two redirects per session. Also the SSL ce...

Database for a search engine

I am thinking of developing a search engine, but have no idea which backend I could use efficiently. Please suggest a database in which I can store thousands of records and query them in a time-efficient manner. I am developing the search engine for my own interest, so please don't give me the critical comments usually found in ...

How can I track incoming search keywords?

Does anyone know how I could track what search terms people are using to arrive at my site? For instance, someone searches Google for 'giant inflatable house' and clicks through to my site. I want to be able to capture those keywords and which search engine they came from. ...
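One common way to approach this is to inspect the HTTP Referer header of the incoming request and pull the search terms out of its query string. The following is a minimal sketch in Python, not a definitive implementation: the function name `extract_search_terms` and the `ENGINE_QUERY_PARAMS` table are hypothetical, and the parameter names listed are assumptions to verify per engine. Note also that many search engines no longer include the query in the referrer, in which case nothing can be recovered this way.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical table: a fragment of the referrer host mapped to the query
# parameter assumed to carry the search terms. Extend or correct as needed.
ENGINE_QUERY_PARAMS = {
    "google.": "q",
    "bing.": "q",
    "yahoo.": "p",
}

def extract_search_terms(referrer):
    """Return (engine_host, keywords), or None if no search terms are found."""
    parsed = urlparse(referrer)
    host = parsed.netloc.lower()
    for fragment, param in ENGINE_QUERY_PARAMS.items():
        if fragment in host:
            # parse_qs decodes '+' and percent-escapes in the query string.
            values = parse_qs(parsed.query).get(param)
            if values:
                return host, values[0]
    return None

print(extract_search_terms("http://www.google.com/search?q=giant+inflatable+house"))
# prints ('www.google.com', 'giant inflatable house')
```

In a real application the referrer string would come from the request headers, and the result could be logged alongside the landing page.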

Trying to Search my Blog-Site or Search This site gives no results

I have a how-to company blog site that I post to so my clients can access it for help. For some reason it has stopped letting anyone search on it. I can search for My Sites or users, but when you drop down the tab to search This Site: "blog site name", you get the following reply: No results matching your search were found. Check your spel...

Possible site hacking problem.

My site opens when I enter its URL directly, but not when I click through from Google results. Has the site been hacked? Or is this a fault on Google's side? ...

How do I exclude everything but text/html from a Heritrix crawl?

On the Heritrix Usecases page there is a use case for "Only Store Successful HTML Pages". My problem: I don't know how to implement it in my cxml file. Especially: adding the ContentTypeRegExpFilter to the ARCWriterProcessor => set its regexp setting to text/html.*. ... There is no ContentTypeRegExpFilter in the sample cxml files. ...

How do I tell search engines not to index content via secondary domain names?

I have a website at a.com (for example). I also have a couple of other domain names which I am not using for anything: b.com and c.com. They currently forward to a.com. I have noticed that Google is indexing content from my site using b.com/stuff and c.com/stuff, not just a.com/stuff. What is the proper way to tell Google to only index c...

How to make HTML page a Search Engine?

I am developing an HTML page that I want to turn into a search engine. So far I have only added a text box and a search button. I am only allowed to use JavaScript. How do I turn it into a search engine? ...

Google crawler finds robots.txt, but can't download it

Can anyone tell me what's wrong with this robots.txt? http://bizup.cloudapp.net/robots.txt The following is the error I get in Google Webmaster Tools: Sitemap errors and warnings Line Status Details Errors - Network unreachable: robots.txt unreachable We were unable to crawl your Sitemap because we found a robots.txt file at t...

How do you find out what keywords your website is ranking for?

How do you find out what keywords your website is ranking for? ...

Is there any tutorial on how to create URL submitter script with PHP?

I have always wondered how those URL submitters work. They usually submit your URL to many search engines in a very short time. Can anyone please provide a tutorial or a link about it? Thanks ...

Generate tags based on a given sentence, for a search engine (C#)

Hi, I'm creating a search engine for one of my projects using Lucene and ASP.NET MVC (C#). I want to implement auto-tagging when the user enters a sentence. Is there an open-source API that can handle this? For example, the user enters this sentence: "We offer proofreading services & outsourcing." The API then generates tags like: "proofreadin...

Robots.txt in ASP.NET MVC

Hi guys, I am trying to figure out what to add to my robots.txt file. Specifically, what does the directive Allow: /$ do in the robots.txt file? Edit: Also, how do I allow a site to only have its /index page indexed when using ASP.NET MVC? ...
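To illustrate how a pattern such as /$ is usually interpreted, here is a small Python sketch of robots.txt path matching. It assumes the widely supported wildcard rules, where `*` matches any run of characters and a trailing `$` anchors the pattern at the end of the path; the function name `robots_pattern_matches` is hypothetical, and whether a given crawler honours these rules is something to check in that crawler's documentation.

```python
import re

def robots_pattern_matches(pattern, path):
    """Check whether a robots.txt path pattern matches a URL path.

    Assumption: '*' matches any run of characters and a trailing '$'
    anchors the pattern at the end of the path. Without '$', a pattern
    matches every path that starts with it.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for '*'.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += r"\Z"
    return re.match(regex, path) is not None

print(robots_pattern_matches("/$", "/"))       # True: only the root path matches
print(robots_pattern_matches("/$", "/index"))  # False
print(robots_pattern_matches("/", "/index"))   # True: plain prefix match
```

Under these rules, Allow: /$ applies to the root URL only, so it is typically paired with a broader Disallow rule to expose just the home page.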

How can CSS be used in SEO?

At first I thought CSS was used for absolutely nothing but styling the document when the user viewed it in a browser. But then I realized that CSS is also used by search engines when indexing pages. Search engines don't index content with display: none, I believe, and heavily penalize sites that use keyword stacking (text that is never se...

Do you get penalized by search engines when you let search engine crawlers pass through but add an additional step for users?

I am currently working on a project in which several parts of the website may be restricted depending on the area where the user resides. When such a user accesses the page, he gets redirected to a form he must complete in order to view the content. Wanting search engines to index the content, I am creating exceptions for the search engine crawl...