I'm using Sphider as a search engine for my website. It's really easy to work with, but I'm having some major issues with localized characters.
All of my HTML/PHP pages have the charset defined as UTF-8, while the search and result pages from Sphider had charset=ISO-8859-1. When I first used the Sphider "spider" to crawl my website it made all ...
I find the extensive volume of modules available through CPAN to be somewhat at odds with its search capabilities. I'm aware that a lot of data is stored about modules, including the DSLIP tags. However, I'm not aware of a convenient interface to query this database. search.cpan.org seems to provide only a basic textual search, and...
Badoo.com has 56,000,000 user profiles. Profiles can be searched by sex, age, hair color, zodiac sign, education and so on, plus distance from my hometown, online status and date of registration. So far, this seems doable: even if it's quite a heavy query on huge tables (56m members...), it can be cached in a general way.
The interesting part is...
I'm trying to implement a simple search engine over a list of HTML documents. I've built a script that generates the needed list of links. There is no need to crawl any other documents.
So far I've tried Solr / Nutch (I'm still trying to get them working...), but they feel like overkill for such a simple task. I'm looking for somethi...
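For a fixed list of HTML documents like the one described above, a small in-memory inverted index may already be enough. The following is a minimal sketch in Python using only the standard library, not a replacement for Solr or Nutch. The function names (`build_index`, `search`) and the `documents` dictionary are illustrative assumptions; it does no ranking or stemming and simply returns the links that contain every query term.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text of an HTML document, skipping script/style content."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def tokenize(text):
    # Lowercased runs of word characters; no stemming or stop words.
    return re.findall(r"\w+", text.lower())


def build_index(documents):
    """documents: dict mapping a link (str) to its HTML source (str)."""
    index = defaultdict(set)
    for link, html in documents.items():
        for term in tokenize(html_to_text(html)):
            index[term].add(link)
    return index


def search(index, query):
    """Return the set of links containing every term of the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```

In practice the `documents` dictionary would be filled by fetching each link from the generated list; that step is left out here because it depends on where the documents live.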
mavensearch.net doesn't know current versions in many cases. mvnrepository.com is a bit more up to date but doesn't show the repositories from which a package can be downloaded, which I would find very useful.
What Maven repository search engines do you use and like?
...
Hi, in my web application I am using secure cookies to remember users.
It works like this:
If the connection is insecure redirect to a HTTPS url.
Over the secure connection transmit the cookies and identify the user.
Redirect to the original (insecure) url.
So every client has to go through two redirects per session. Also the SSL ce...
I am thinking of developing a search engine, but have no idea which backend I could use efficiently. Please suggest a database in which I can store thousands of records and query them in a time-efficient manner. I am developing this search engine for my own interest, so please don't give me the kind of critical comments usually found in ...
Does anyone know how I could track what search terms people are using to arrive at my site? For instance, someone searches Google for 'giant inflatable house' and clicks through to my site. I want to be able to capture those keywords and which search engine they came from.
...
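One common approach to the question above is to inspect the HTTP `Referer` header of the incoming request and pull the query parameter out of it. The sketch below is in Python and makes several assumptions: the `SEARCH_PARAMS` table and the function name are illustrative, the host matching is a crude substring test, and many search engines now omit the query from the referrer on HTTPS, in which case no keywords can be recovered this way.

```python
from urllib.parse import urlparse, parse_qs

# Illustrative table: hostname fragment -> query parameter holding the search terms.
SEARCH_PARAMS = {"google.": "q", "bing.": "q", "yahoo.": "p"}


def search_terms_from_referrer(referrer):
    """Return (search engine host, search terms or None), or None if the
    referrer does not look like one of the known search engines."""
    parsed = urlparse(referrer)
    host = parsed.netloc.lower()
    for fragment, param in SEARCH_PARAMS.items():
        if fragment in host:
            values = parse_qs(parsed.query).get(param)
            # parse_qs decodes '+' and %XX escapes, so the terms come back as plain text.
            return host, (values[0] if values else None)
    return None


if __name__ == "__main__":
    print(search_terms_from_referrer(
        "http://www.google.com/search?q=giant+inflatable+house&hl=en"))
```

The referrer string itself would come from whatever the web framework exposes for the `Referer` request header.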
I have a how-to company blog site that I post to so that my clients can access it for help. For some reason it has stopped letting anyone search on it. I can search for Mysites or users.
But when you drop down the tab to search: This Site: "blog site name" you get the following reply:
No results matching your search were found. Check your spel...
My site opens when I enter the URL directly, but not when I click through from Google results.
Has the site been hacked, or is this a fault on Google's side?
...
On the Heritrix Usecases page there is a use case for "Only Store Successful HTML Pages".
My problem: I don't know how to implement it in my cxml file. Especially:
Adding the ContentTypeRegExpFilter to the ARCWriterProcessor => set its regexp setting to text/html.*. ...
There is no ContentTypeRegExpFilter in the sample cxml files.
...
I have a website at a.com (for example). I also have a couple of other domain names which I am not using for anything: b.com and c.com. They currently forward to a.com. I have noticed that Google is indexing content from my site using b.com/stuff and c.com/stuff, not just a.com/stuff. What is the proper way to tell Google to only index c...
I am developing an HTML page that I want to turn into a search engine.
So far I have just added a textbox and a Search button. I am only allowed to use JavaScript. How do I turn it into a search engine?
...
Can anyone tell me what's wrong with this robots.txt?
http://bizup.cloudapp.net/robots.txt
The following is the error I get in Google Webmaster Tools:
Sitemap errors and warnings
Line Status Details
Errors -
Network unreachable: robots.txt unreachable
We were unable to crawl your Sitemap because we found a robots.txt file at t...
How do you find out what keywords your website is ranking for?
...
I have always wondered how URL submitters work. They usually submit your URL to many search engines in a very short time. Can anyone please provide a tutorial or a link about it? Thanks.
...
Hi,
I'm creating a search engine for one of my projects using Lucene and ASP.NET MVC (C#).
I want to implement auto-tagging when the user enters a sentence. Is there an open-source API that can handle this?
For example, the user enters this sentence:
"We offer proofreading services & outsourcing."
The API then generates tags like:
"proofreadin...
Hi Guys,
I am trying to figure out what to add to my robots.txt file. Specifically, what does the command
Allow: /$
do in the robots.txt file?
Edit: Also, how do I allow a site to have only its /index page indexed when using ASP.NET MVC?
...
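In the pattern syntax that Google documents for robots.txt, `*` matches any sequence of characters and a trailing `$` anchors the pattern to the end of the URL path, so `Allow: /$` would match only the root path `/`. The Python sketch below illustrates that matching rule under stated assumptions: the function names are illustrative, precedence follows the "longest matching pattern wins, allow wins ties" rule, and not every crawler necessarily supports these extensions. It is meant to show the semantics, not to reproduce any crawler's actual parser.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern ('*' wildcard, trailing '$' anchor)
    into a compiled regular expression matched from the start of the path."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(path, rules):
    """rules: list of ("allow" | "disallow", pattern) for one user agent.
    The longest matching pattern wins; on a tie, allow wins.
    A path that matches no rule is allowed."""
    best = None  # (pattern length, is_allow)
    for kind, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), kind == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


if __name__ == "__main__":
    rules = [("allow", "/$"), ("disallow", "/")]
    for path in ("/", "/index", "/about"):
        print(path, is_allowed(path, rules))
```

With the two rules in the usage example, only the bare root path is allowed; every other path falls under `Disallow: /`.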
At first I thought CSS was used for absolutely nothing but styling the document when the user viewed it in a browser. But then I realized that CSS is also used by search engines in indexing pages. Search engines don't index content with display: none, I believe, and heavily penalize sites that use keyword stacking (text that is never se...
I am currently working on a project in which several parts of the website may be restricted depending on the area where the user resides. When a user accesses such a page, he gets redirected to a form he must complete in order to view the content.
Wanting search engines to index the content, I am creating exceptions for the search engine crawl...