Don't waste time profiling. The time is always in the database operations. Do as few as possible. Just the minimum number of inserts.
Three Things.
One. Don't SELECT over and over again to conform the Date, Hostname and Person dimensions. Fetch all the data ONCE into a Python dictionary and use it in memory. Don't do repeated singleton selects. Use Python.
Two. Don't Update.
Specifically, Do not do this. It's bad code for two reasons.
cursor.execute("UPDATE people SET chats_count = chats_count + 1 WHERE id = '%s'" % person_id)
It be replaced with a simple SELECT COUNT(*) FROM ... . Never update to increment a count. Just count the rows that are there with a SELECT statement. [If you can't do this with a simple SELECT COUNT or SELECT COUNT(DISTINCT), you're missing some data -- your data model should always provide correct complete counts. Never update.]
And. Never build SQL using string substitution. Completely dumb.
If, for some reason the SELECT COUNT(*)
isn't fast enough (benchmark first, before doing anything lame) you can cache the result of the count in another table. AFTER all of the loads. Do a SELECT COUNT(*) FROM whatever GROUP BY whatever
and insert this into a table of counts. Don't Update. Ever.
Three. Use Bind Variables. Always.
cursor.execute( "INSERT INTO ... VALUES( :x, :y, :z )", {'x':person_id, 'y':time_to_string(time), 'z':channel,} )
The SQL never changes. The values bound in change, but the SQL never changes. This is MUCH faster. Never build SQL statements dynamically. Never.