I've been investigating App Engine to see if I can use it for a project, and while trying to choose between Python and Java I ran into a surprising difference in datastore query performance: medium to large datastore queries are more than 3 times slower in Python than in Java.

My question is: is this performance difference for datastore queries (Python 3x slower than Java) normal, or am I doing something wrong in my Python code that's messing with the numbers?

My entity looks like this:

Person

firstname (length 8)
lastname (length 8)
address (length 20)
city (length 10)
state (length 2)
zip (length 5)

I populate the datastore with 2000 Person records, with each field exactly the length noted here, all filled with random data and with no fields indexed (just so the inserts go faster).

I then query 1k Person records from Python (no filters, no ordering):

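from google.appengine.api import datastore
# Fetch up to 1000 Person entities (no filters, no ordering).
q = datastore.Query("Person")
objects = list(q.Get(1000))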

And 1k Person records from Java (likewise no filters, no ordering):

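import java.util.ArrayList;
import java.util.List;
import com.google.appengine.api.datastore.*;
import static com.google.appengine.api.datastore.FetchOptions.Builder.withLimit;
DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
Query q = new Query("Person");
PreparedQuery pq = ds.prepare(q);
// Force the query to run and return objects so we can be sure
// we've timed a full query.
List<Entity> entityList = new ArrayList<Entity>(pq.asList(withLimit(1000)));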

With this code, the Java code returns results in ~200ms; the Python code takes much longer, averaging >700ms. Both apps are on the same app id (with different versions), so they use the same datastore and should be on a level playing field.

All my code is available here, in case I've missed any details:

http://github.com/greensnark/appenginedatastoretest

+3  A: 

This would be an expected difference between Python and Java. Most likely you aren't seeing differences in the amount of time to make the query, but the amount of time it takes to parse the result and fill the receiving data structure.

You can test this by comparing the time it takes to query a single record. Remember that you'll need to test several times and average the total to get a true benchmark to account for possible fluctuations in latency on the backend.
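A minimal sketch of that kind of repeated measurement, written for current Python 3 and runnable outside App Engine. The helper name average_ms and the two fetch functions are hypothetical stand-ins, not part of the App Engine SDK; in a real test you would replace them with a one-record query and a 1000-record query.

```python
import time


def average_ms(fn, trials=10):
    """Call fn() `trials` times and return the mean wall-clock time in ms."""
    if trials < 1:
        raise ValueError("trials must be at least 1")
    total = 0.0
    for _ in range(trials):
        start = time.perf_counter()
        fn()
        total += time.perf_counter() - start
    return total / trials * 1000.0


# Hypothetical stand-ins for the real datastore calls (assumption:
# swap these for a query fetching 1 record and one fetching 1000).
def fetch_one():
    return [{"firstname": "x" * 8}]


def fetch_many():
    return [{"firstname": "x" * 8} for _ in range(1000)]


small_ms = average_ms(fetch_one)
large_ms = average_ms(fetch_many)
print("1 record: %.3f ms, 1000 records: %.3f ms" % (small_ms, large_ms))
```

If the single-record timings are close in both runtimes while the 1000-record timings diverge, that points at per-entity processing rather than the query round trip.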

In general, you can expect a compiled, statically typed language like Java or Scala to always be faster than an interpreted, dynamically typed language like Ruby or Python.

Joshua
Both Python and Java are compiled to byte codes which are interpreted on a virtual machine. Python and Ruby are not equivalent in that regard.
Adam Crossland
You're right, it looks like all the slowdown for the Python code happens when decoding the returned protocol buffer data into Entity objects in the SDK's datastore.py. Small datastore queries (10 objects) show no perceptible performance difference between Java and Python.
greensnark
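One way to confirm where the time goes is to run the fetch under a profiler. The sketch below uses cProfile from the Python 3 standard library; fetch_people and decode_entity are hypothetical stand-ins for the query call and the SDK's decoding step, not the real datastore.py functions.

```python
import cProfile
import io
import pstats


def decode_entity(raw):
    # Hypothetical stand-in for turning one raw record into an entity.
    keys = ("firstname", "lastname", "address", "city", "state", "zip")
    return dict(zip(keys, raw))


def fetch_people(count):
    # Hypothetical stand-in for a query: builds raw rows, then decodes each.
    raw_rows = [("a" * 8, "b" * 8, "c" * 20, "d" * 10, "e" * 2, "f" * 5)] * count
    return [decode_entity(row) for row in raw_rows]


def profile_call(fn, *args, top=10):
    """Run fn(*args) under cProfile; return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(fn, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


people, report = profile_call(fetch_people, 1000)
print(report)
```

In the report, the call count and cumulative time for the decoding function show how much of the total is spent per entity.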
@Joshua, the generalization you make at the end of your post is not quite accurate, as Adam pointed out. In addition, one needs to consider the specific context of App Engine, where Java apps with low traffic have to pay the startup cost of initializing the entire JVM quite frequently.
Peter Recore
From the Python documentation: "Python is an interpreted language, as opposed to a compiled one, though the distinction can be blurry because of the presence of the bytecode compiler. This means that source files can be run directly without explicitly creating an executable which is then run."
Joshua
This article explains the differences well. http://www.razorvine.net/python/PythonComparedToJava
Joshua
Really, we should talk about specific runtimes being interpreters or compilers rather than languages. Most languages can be run in more than one manner, which blurs the compiled/interpreted issue even further.
Peter Recore