Hi,
I was comparing an old PHP script of mine against the newer, fancier Django version, and the PHP one, even with its full spitting-out of raw HTML and all, was running faster. MUCH faster, to the point that something has to be wrong with the Django one.
First, some context: I have a page that spits out reports of sales data. The data can be filtered by a number of things, but is mostly filtered by date. This makes it hard to cache, since the possible result sets are nearly endless. There are a lot of numbers and calculations involved, but that was never much of a problem to handle within PHP.
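Roughly, the view boils down to this (a simplified sketch; the model, field, and template names here are stand-ins for my real code):

    from django.shortcuts import render_to_response
    from myapp.models import Sale   # hypothetical app/model

    def sales_report(request):
        # the date range is the main filter; the other optional
        # filters all work the same way
        sales = Sale.objects.filter(
            sale_date__range=(request.GET['start'], request.GET['end']))
        if 'customer' in request.GET:
            sales = sales.filter(customer__id=request.GET['customer'])
        return render_to_response('reports/sales.html', {'sales': sales})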
UPDATES:
After some additional testing, nothing within my view itself is causing the slowdown. If I'm simply number-crunching the data and spitting out 5 rows of rendered HTML, it's not that slow (still slower than PHP), but if I'm rendering a lot of data, it's VERY slow.
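I separated the two by timing them by hand, something like this (`build_report_rows` is a stand-in for my real number-crunching code):

    import time
    from django.http import HttpResponse
    from django.template import loader, Context

    def sales_report(request):
        t0 = time.time()
        rows = list(build_report_rows(request))   # force the queryset here, so
        t1 = time.time()                          # its cost isn't blamed on the template
        html = loader.get_template('reports/sales.html').render(
            Context({'rows': rows}))
        t2 = time.time()
        print 'crunch: %.2fs, render: %.2fs' % (t1 - t0, t2 - t1)
        return HttpResponse(html)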
Whenever I run a large report (e.g. all sales for the year), the CPU usage of the machine goes to 100%. I don't know if this means much. I am using mod_python and Apache; perhaps switching to WSGI would help?
My template tags that show the subtotals/totals take anywhere from 0.1 seconds to 1 second to process on really large sets. I call them about 6 times within the report, so they don't seem like the biggest issue.
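One idea I'm toying with is computing the subtotals/totals once in the view, in a single pass, instead of inside the tags. A sketch (the grouping key and field names are hypothetical):

    from decimal import Decimal

    def summarize(rows):
        # accumulate the grand total and per-group subtotals in one pass
        total = Decimal('0')
        subtotals = {}
        for row in rows:
            key = row.product_id    # hypothetical grouping key
            subtotals[key] = subtotals.get(key, Decimal('0')) + row.amount
            total += row.amount
        return total, subtotals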
Now, I ran a Python profiler and came back with these results:
    Ordered by: internal time
    List reduced from 3074 to 20 due to restriction

     ncalls    tottime  percall  cumtime  percall  filename:lineno(function)
     2939417    26.290    0.000   44.857    0.000  /usr/lib/python2.5/tokenize.py:212(generate_tokens)
     2822655    17.049    0.000   17.049    0.000  {built-in method match}
     1689928    15.418    0.000   23.297    0.000  /usr/lib/python2.5/decimal.py:515(__new__)
    12289605    11.464    0.000   11.464    0.000  {isinstance}
      882618     9.614    0.000   25.518    0.000  /usr/lib/python2.5/decimal.py:1447(_fix)
       17393     8.742    0.001   60.798    0.003  /usr/lib/python2.5/tokenize.py:158(tokenize_loop)
          11     7.886    0.717    7.886    0.717  {method 'accept' of '_socket.socket' objects}
      365577     7.854    0.000   30.233    0.000  /usr/lib/python2.5/decimal.py:954(__add__)
     2922024     7.199    0.000    7.199    0.000  /usr/lib/python2.5/inspect.py:571(tokeneater)
      438750     5.868    0.000   31.033    0.000  /usr/lib/python2.5/decimal.py:1064(__mul__)
       60799     5.666    0.000    9.377    0.000  /usr/lib/python2.5/site-packages/django/db/models/base.py:241(__init__)
       17393     4.734    0.000    4.734    0.000  {method 'query' of '_mysql.connection' objects}
     1124348     4.631    0.000    8.469    0.000  /usr/lib/python2.5/site-packages/django/utils/encoding.py:44(force_unicode)
      219076     4.139    0.000  156.618    0.001  /usr/lib/python2.5/site-packages/django/template/__init__.py:700(_resolve_lookup)
     1074478     3.690    0.000   11.096    0.000  /usr/lib/python2.5/decimal.py:5065(_convert_other)
     2973281     3.424    0.000    3.424    0.000  /usr/lib/python2.5/decimal.py:718(__nonzero__)
      759014     2.962    0.000    3.371    0.000  /usr/lib/python2.5/decimal.py:4675(__init__)
      381756     2.806    0.000  128.447    0.000  /usr/lib/python2.5/site-packages/django/db/models/fields/related.py:231(__get__)
      842130     2.764    0.000    3.557    0.000  /usr/lib/python2.5/decimal.py:3339(_dec_from_triple)
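(For anyone wanting to reproduce this kind of listing, wrapping the view call in cProfile produces it; a minimal sketch, with `sales_report` as the view being measured:)

    import cProfile, pstats

    def profiled_sales_report(request):
        prof = cProfile.Profile()
        response = prof.runcall(sales_report, request)
        prof.dump_stats('report.prof')   # keep the raw data around
        # "internal time", top 20 -- matches the listing above
        pstats.Stats(prof).sort_stats('time').print_stats(20)
        return response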
tokenize.py comes out on top, which can make some sense, as I am doing a lot of number formatting. decimal.py also makes sense, since the report is essentially 90% numbers. I have no clue what the built-in method match is, as I am not doing any regex or similar in my own code (something Django is doing?). The closest thing is that I am using itertools' ifilter.
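If it helps with diagnosis, pstats can show what is actually driving those calls; I'm going to try something like this against the dump from the profiling run:

    import pstats

    stats = pstats.Stats('report.prof')
    stats.print_callers('generate_tokens')   # what calls into tokenize.py?
    stats.print_callers('match')             # what drives the built-in match?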
It seems those are the main culprits, and if I could figure out how to reduce their processing time I would have a much, much faster page.
Does anyone have any suggestions on how I could start reducing this? I don't really know how I would fix the tokenize/decimal issues short of simply removing those calls.
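One thing I could try for the decimal side is pushing the big sums into MySQL instead of building millions of Decimals in Python; a sketch, with hypothetical table/column names:

    from django.db import connection

    def total_sales(start, end):
        # let MySQL do the summing instead of adding Decimals row by row
        cursor = connection.cursor()
        cursor.execute(
            "SELECT SUM(amount) FROM sales_sale "
            "WHERE sale_date BETWEEN %s AND %s", [start, end])
        return cursor.fetchone()[0]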
Update: I ran some tests with and without filters on most of the data, and the result times came back pretty much the same (the latter a bit faster, but not by enough to be the cause of the issue). What exactly is going on in tokenize.py?