I am attempting to profile a long running python script. The script does some spatial analysis on raster GIS data set using the gdal module. The script currently uses three files, the main script which loops over the raster pixels called find_pixel_pairs.py
, a simple cache in lrucache.py
and some misc classes in utils.py
. I have profiled the code on a moderate sized dataset. pstats
returns:
p.sort_stats('cumulative').print_stats(20)
Thu May 6 19:16:50 2010 phes.profile
355483738 function calls in 11644.421 CPU seconds
Ordered by: cumulative time
List reduced from 86 to 20 due to restriction <20>
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.008 0.008 11644.421 11644.421 <string>:1(<module>)
1 11064.926 11064.926 11644.413 11644.413 find_pixel_pairs.py:49(phes)
340135349 544.143 0.000 572.481 0.000 utils.py:173(extent_iterator)
8831020 18.492 0.000 18.492 0.000 {range}
231922 3.414 0.000 8.128 0.000 utils.py:152(get_block_in_bands)
142739 1.303 0.000 4.173 0.000 utils.py:97(search_extent_rect)
745181 1.936 0.000 2.500 0.000 find_pixel_pairs.py:40(is_no_data)
285478 1.801 0.000 2.271 0.000 utils.py:98(intify)
231922 1.198 0.000 2.013 0.000 utils.py:116(block_to_pixel_extent)
695766 1.990 0.000 1.990 0.000 lrucache.py:42(get)
1213166 1.265 0.000 1.265 0.000 {min}
1031737 1.034 0.000 1.034 0.000 {isinstance}
142740 0.563 0.000 0.909 0.000 utils.py:122(find_block_extent)
463844 0.611 0.000 0.611 0.000 utils.py:112(block_to_pixel_coord)
745274 0.565 0.000 0.565 0.000 {method 'append' of 'list' objects}
285478 0.346 0.000 0.346 0.000 {max}
285480 0.346 0.000 0.346 0.000 utils.py:109(pixel_coord_to_block_coord)
324 0.002 0.000 0.188 0.001 utils.py:27(__init__)
324 0.016 0.000 0.186 0.001 gdal.py:848(ReadAsArray)
1 0.000 0.000 0.160 0.160 utils.py:50(__init__)
The top two calls contain the main loop - the entire analyis. The remaining calls sum to less than 625 of the 11644 seconds. Where are the remaining 11,000 seconds spent? Is it all within the main loop of find_pixel_pairs.py
? If so, can I find out which lines of code are taking most of the time?