I want to build an in-site search engine with PHP. Users must log in to see the information, so I can't use the Google or Yahoo search engine code.

For now, I want the engine to search the text of the pages, not the tables in the MySQL database.

Has anyone ever done this? Could you give me some pointers to help me get started?

+3  A: 

You'll need a spider that harvests pages from your site (in a cron job, for example), strips the HTML, and saves the text in a database.
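
The "strips the HTML" step could look roughly like the sketch below. It is only an illustration, not a complete spider: `page_to_text` is a hypothetical helper name, and fetching the pages (with a logged-in session) and saving the result to the database are left out.

```php
<?php
// Hypothetical helper: reduce a fetched HTML page to plain text for indexing.
function page_to_text(string $html): string
{
    // Drop script and style blocks entirely; their contents are not page text.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', ' ', $html) ?? $html;

    // Put a space before every tag so words in adjacent elements are not glued together.
    $html = str_replace('<', ' <', $html);

    // Remove the remaining tags, then decode entities such as &amp;.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Collapse runs of whitespace into single spaces.
    return trim(preg_replace('/\s+/', ' ', $text) ?? $text);
}
```

The returned string is what you would store per page, next to its URL and title, each time the cron job runs.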

stereofrog
This is exactly what he doesn't want: using databases. But I also looked at the 20% accept rate and thought... bah! :)
Frankie
@Frankie Did you read the question, or only his accept rate? ;)
stereofrog
@Frankie, what do you recommend?
garcon1986
@garcon1986 A search engine has to use a database of some sort to get any kind of speed. Several will do just fine: search Google for Sphinx or Lucene, as Arnaud suggested. Doing it on plain files just won't do the trick.
Frankie
A: 

If the content and the titles of your pages are already managed by a database, you just need to write your search engine in PHP. There are plenty of ways to query your database, for example:

http://www.webreference.com/programming/php/search/
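
As a minimal sketch of the database approach, assuming a hypothetical `pages` table with `url`, `title` and `content` columns (these names are made up for illustration and are not taken from the linked tutorial), a keyword search can be built as a parameterised `LIKE` query:

```php
<?php
// Build a parameterised query that matches pages containing every search term.
// Assumes a hypothetical `pages` table with `url`, `title` and `content` columns.
function build_search_query(string $input): array
{
    // Split the user's input into words, ignoring extra whitespace.
    $terms = preg_split('/\s+/', trim($input), -1, PREG_SPLIT_NO_EMPTY);

    $clauses = [];
    $params = [];
    foreach ($terms as $term) {
        // Escape LIKE wildcards so a literal % or _ in the input is matched as-is.
        $like = '%' . addcslashes($term, '%_\\') . '%';
        $clauses[] = '(title LIKE ? OR content LIKE ?)';
        $params[] = $like;
        $params[] = $like;
    }

    // With no terms, match nothing rather than every page.
    $where = $clauses ? implode(' AND ', $clauses) : '0 = 1';

    return ["SELECT url, title FROM pages WHERE $where", $params];
}
```

The returned SQL and parameters can be passed to a PDO connection with `prepare()` and `execute()`, which keeps user input out of the SQL string. `LIKE '%term%'` cannot use an ordinary index, so this is only reasonable for a small site.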

If the content is only in HTML files and not in the database, you might want to write a spider, as suggested by stereofrog.

You may also want to cache the results to improve performance.

I would say that everything depends on the size and complexity of your website or web application.

Roberto Aloi
A: 

You can cheat a little bit, the way the much-hated Experts-Exchange web site does. It is a for-profit programmers' Q&A site, much like StackOverflow. In order to see answers you have to pay, but sometimes the answers come up in Google search results. It is rather clear that E-E presents one page to web crawlers and a different one to humans. You could use the same trick, then add Google Custom Search to your site. Users who are logged in would then see the results; otherwise they'd be bounced to the login screen.

Michał Rudnicki
To be clear, EE presents a different page to crawlers _and to visitors from Google_.
Paul McMillan
That is called cloaking and I wouldn't recommend it. (Even if EE does it, they have some special agreement with Google about it.)
dusoft
If that requires a special agreement, it's probably a no-go then.
Michał Rudnicki
When the solution appears on Google you can scroll down the page - to the very bottom of all that crap - and the solution will be visible to the user. That goes along with Google's guidelines - no need for special agreements.
Frankie
+2  A: 

You might want to have a look at Sphinx (http://sphinxsearch.com/). It is a search engine that can easily be accessed from PHP scripts.

Arnaud
A: 

Thanks all.

I just want to search the information in all the pages, so I think I should make a spider.

Can anyone provide a tutorial or a similar script?

Any advice is appreciated!!

garcon1986
A: 

Do you have control over your server? If so, I would recommend installing Solr/Lucene for indexing and SolPHP for interacting with it from PHP. That way you get facets and other nice full-text search features.

I would not spider the actual pages; instead I would spider versions of the pages without navigation and other elements that are not content-related.

Solr requires Java on the server.

Fontanka16
Thanks, but I think it's more complex for me to use Java. Do you have other recommendations?
garcon1986
It's not that difficult, and you will have a very powerful search engine. There is absolutely no Java programming involved. The simpler approach is the MySQL way: set up views that you query.
Fontanka16
Thank you Fontanka, I will try to use SolPHP.
garcon1986
A: 

Hello everyone,

In the end I used Sphider, which is a free tool, and it works well with PHP.

Thanks all.

garcon1986