Imagine you have a web application written in Django and Python 2.6.5, and MySQL 5.1 is your database of choice.

Now imagine you need to scale your app to search hundreds of thousands of documents, with potentially hundreds of thousands of users.

Reality: Haystack 1.0 with PySolr and Solr 1.4.0 is proving slow in the above scenarios. Is MySQL full-text search on MyISAM tables a more workable alternative, or should I spend more time working with my current configuration and using Solr in a "smarter" way?

Suggestions? Tips?

Thank you for any help! Michaux

A:

I have no expertise with Haystack or PySolr, but just looking at Solr makes me think that MySQL might be a better choice. I know that MySQL can scale to very large applications if it is set up correctly.

Apache Solr just runs on Tomcat, and Tomcat can be a bit of a resource hog and can run slowly. MySQL runs from compiled binaries, which should provide a bit of a boost. The server you run this on will also make a large difference. If you have the ability, I would go ahead and set up the MySQL system and see whether you get any difference.

Badger
Not that it makes a lot of difference here, but Solr runs on several other servlet containers: Jetty, Resin, WebSphere, ... http://wiki.apache.org/solr/SolrInstall#Servlet_Container_or_Environment_Specific_Tips
Mauricio Scheffer
A: 

I assume you're talking about comparing Solr with MySQL full-text search; otherwise it would be comparing apples to oranges.
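For reference, full-text search in MySQL 5.1 only works on MyISAM tables with a FULLTEXT index, queried through `MATCH ... AGAINST`. A minimal sketch, assuming a hypothetical `documents` table with `title` and `body` columns and a DB-API driver that uses `%s` placeholders (as MySQLdb does); it only builds the SQL and does not connect to a server:

```python
# Hypothetical schema for illustration. In MySQL 5.1, FULLTEXT indexes are
# only supported by the MyISAM engine (InnoDB gained them in MySQL 5.6).
CREATE_DOCUMENTS_SQL = """
CREATE TABLE documents (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(255) NOT NULL,
    body TEXT NOT NULL,
    FULLTEXT KEY ft_title_body (title, body)
) ENGINE=MyISAM
"""


def build_fulltext_query(terms, page=1, page_size=20):
    """Return (sql, params) for one page of natural-language full-text hits.

    The MATCH() column list must be the same as the FULLTEXT index's columns.
    With MATCH() in the WHERE clause and no ORDER BY, MySQL returns rows
    sorted by relevance, highest first.
    """
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    sql = (
        "SELECT id, title FROM documents "
        "WHERE MATCH(title, body) AGAINST (%s) "
        "LIMIT %s OFFSET %s"
    )
    return sql, (terms, page_size, (page - 1) * page_size)


sql, params = build_fulltext_query("django solr", page=3, page_size=10)
# With a MySQLdb cursor this would run as: cursor.execute(sql, params)
```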

I don't know about Haystack or PySolr, but Solr itself should have no problems handling on the order of 100,000 documents with lots of users. Those two parameters alone are not enough to spec the problem, though; other factors matter too, such as frequency of updates, the real frequency of requests, document size, page size, sorting, and faceting.
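The request rate is usually a more useful sizing number than the user count. A back-of-the-envelope sketch; the user count, searches per user, and peak factor below are made-up assumptions, not figures from the question:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def estimate_qps(active_users, searches_per_user_per_day, peak_factor):
    """Return (average, peak) queries per second for a hypothetical profile.

    peak_factor is an assumed ratio between the busiest period and the
    daily average; real traffic logs are the better source for it.
    """
    searches_per_day = active_users * searches_per_user_per_day
    average_qps = searches_per_day / SECONDS_PER_DAY  # Python 3 true division
    return average_qps, average_qps * peak_factor


# Hypothetical profile: 300,000 active users, 5 searches each per day,
# and a peak period running at 10x the daily average.
average, peak = estimate_qps(300_000, 5, peak_factor=10)
print(f"average ~{average:.1f} qps, peak ~{peak:.0f} qps")
# prints: average ~17.4 qps, peak ~174 qps
```

Under those assumed numbers the load is roughly 17 queries per second on average and under 200 at the assumed peak; document size, update frequency, sorting, and faceting then decide how much hardware that takes.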

Solr scales easily, both vertically and horizontally, and is Apache-licensed, while the horizontal scaling solution for MySQL is dual-licensed under the GPL and a commercial license.
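In Solr 1.4, horizontal scaling across several indexes is done with distributed search: a query sent to one instance carries a `shards` parameter listing the instances as `host:port/path` entries, and that instance merges their results. A minimal sketch that only builds such a request URL; the host names are hypothetical, it is written for Python 3, and distributed search has its own requirements (such as a `uniqueKey` field) worth checking in the Solr documentation:

```python
from urllib.parse import urlencode  # Python 2.6: from urllib import urlencode


def distributed_select_url(coordinator, shards, query, rows=10):
    """Build a Solr /select URL that fans one query out over several shards.

    coordinator: base URL of the instance that receives the request and
                 merges results, e.g. "http://solr1:8983/solr".
    shards:      "host:port/path" entries (no "http://"), as the Solr
                 `shards` parameter expects; the coordinator may list itself.
    """
    params = {"q": query, "rows": rows, "shards": ",".join(shards)}
    return coordinator.rstrip("/") + "/select?" + urlencode(params)


# Hypothetical two-machine setup, each holding half of the documents.
url = distributed_select_url(
    "http://solr1:8983/solr",
    ["solr1:8983/solr", "solr2:8983/solr"],
    "title:django",
)
```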

I disagree with Badger's answer about Tomcat: it's a very polished, proven, stable server that has been around for over 10 years, and the Java performance myth has to be abolished once and for all.

Bottom line: it's very likely that you need to optimize your Solr instance (both the client-side querying and the server-side index and performance settings). Solr powers some of the biggest websites, so it's quite likely that it can handle your load as well.
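On the client side, a common first step is to ask Solr only for what a results page actually renders: a small `rows` window with `start` for paging, a trimmed `fl` field list, and `fq` filter queries (which Solr caches separately in its filterCache). A minimal Python 3 sketch that builds such a request with the standard library only; the base URL, field names, and filter are hypothetical, and whether Haystack/PySolr already send equivalent parameters in a given setup is worth checking:

```python
from urllib.parse import urlencode  # Python 2.6: from urllib import urlencode


def select_params(query, page=1, page_size=20,
                  fields=("id", "title"), filters=()):
    """Solr /select parameters for one page of results, trimmed to what is shown."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    params = {
        "q": query,
        "start": (page - 1) * page_size,  # offset of the first hit on this page
        "rows": page_size,                # fetch only the hits you will render
        "fl": ",".join(fields),           # return only the stored fields needed
        "wt": "json",
    }
    if filters:
        # Each fq is cached independently in Solr's filterCache,
        # so reusable restrictions belong here rather than in q.
        params["fq"] = list(filters)
    return params


def select_url(base_url, **kwargs):
    """Encode select_params(**kwargs) as a GET URL; doseq repeats fq per filter."""
    query_string = urlencode(select_params(**kwargs), doseq=True)
    return base_url.rstrip("/") + "/select?" + query_string


# Hypothetical core URL, field names, and filter.
url = select_url(
    "http://localhost:8983/solr",
    query="django",
    page=2,
    page_size=10,
    filters=["type:article"],
)
```

Keeping `rows` and `fl` small matters because Solr has to load and serialize every stored field it returns; deep paging (a large `start`) also gets more expensive, since Solr collects `start + rows` hits before returning the page.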

Mauricio Scheffer
Thank you, that's just the sort of answer I needed to move forward with more research on optimizing Solr!
mkelley33
-1 Since when does linking to Google search terms and generic Wikipedia articles constitute a solid argument?
prometheus
@prometheus: are you arguing about Tomcat's history (the only wikipedia link) or the Java performance myth (which has been debunked so many times I don't want to point to any particular article)?
Mauricio Scheffer
It's hard to argue with your "argument" when you just put up Wikipedia articles and Google search terms instead of actually laying out arguments, as your 5 paragraphs at first glance suggested. So do you have any actual facts to refute the claim that "Tomcat can be a bit of a resource hog" (which my real-world experience actually supports), and to show that in the particular case of the original question Java/Tomcat/Solr performance won't be an issue? Because I can't find said facts in any of those paragraphs. Unless you count Google search terms as facts.
prometheus
Let me cut to the chase: Which large-scale Tomcat installations have you done so far? Have you done Java profiling on your installation? Have you compared your Tomcat deployment to other specs not involving Tomcat? Which specific performance optimizations do you suggest for Solr? I can't find any of those facts in your answer, so up to this point I have to ponder that you are not actually "eating your own dog food", as they say.
prometheus
@prometheus: again, there is only **one** Wikipedia article (linking to Tomcat's history) and **one** Google link (which points to several articles debunking the Java performance myth).
Mauricio Scheffer
@prometheus: About Tomcat: first off, the OP never mentioned Tomcat. I **do** use Solr+Tomcat and I really don't appreciate you calling me a liar. I wouldn't call it "large-scale", but it serves 30M+ queries/month with no perf issues. Yes, I did some unscientific profiling of the JVM until I reached acceptable results. Yes, I also use Jetty as a dev server. Solr optimizations: search "performance settings" in my answer.
Mauricio Scheffer
@prometheus: if you have further questions about Java, Tomcat or Solr optimizations I recommend creating new questions here on stackoverflow or on serverfault.com
Mauricio Scheffer
I never called you a liar. I simply wanted to point out that, obviously, your answer was not based on real-world experience with Django+Solr deployments, but on guess-work and marketing spin - deliberate or not.
prometheus
@prometheus: you said I was not eating my own dog food. I replied that I do use Solr and Tomcat. I already said I don't use Django, but I do have extensive experience with Solr on real-world deployments. Take a look at my answers on the Solr tag and my blog. I wrote a Solr interface for .NET which is used successfully by several websites. You say Tomcat is a hog, so you seem to imply that all these companies that chose Tomcat (http://wiki.apache.org/tomcat/PoweredBy) don't know what they're doing. That's pretty arrogant of you, and bordering on trolling.
Mauricio Scheffer
@prometheus: this discussion isn't leading anywhere, I'm not wasting any more time on this.
Mauricio Scheffer
No, what I was saying is that in light of the specific question, "Does Solr scale with Django?", neither you nor anyone else came forth with a solid argument that this is the case. Whether or not Tomcat itself scales, or which websites use Tomcat, is absolutely irrelevant to the question. It's not an answer to the question, and it misleads visitors to SO to imply that Tomcat scales with Django when there is no evidence for that claim.
prometheus