views:

1043

answers:

3

I am somewhat new to transactional databases and have come across an issue I am trying to understand.

I have created a simple demonstration where a database connection is stored inside each of the 5 threads created by cherrypy. I have a method that displays a table of timestamps stored in the database and a button to add a new record of time stamps.

the table has 2 fields, one for the datetime.datetime.now() timestamp passed by python and one for the database timestamp set to default NOW().


CREATE TABLE test (given_time timestamp,
                   default_time timestamp DEFAULT NOW());
I have 2 methods that interact with the database. The first will create a new cursor, insert a new given_timestamp, commit the cursor, and return to the index page. The second method will create a new cursor, select the 10 most recent timestamps and return those to the caller.

import sys
import datetime
import psycopg2
import cherrypy

def connect(thread_index): 
    # Create a connection and store it in the current thread 
    cherrypy.thread_data.db = psycopg2.connect('dbname=timestamps')

# Tell CherryPy to call "connect" for each thread, when it starts up
cherrypy.engine.subscribe('start_thread', connect)

class Root:
    @cherrypy.expose
    def index(self): 
        html = []
        html.append("<html><body>")

        html.append("<table border=1><thead>")
        html.append("<tr><td>Given Time</td><td>Default Time</td></tr>")
        html.append("</thead><tbody>")

        for given, default in self.get_timestamps():
            html.append("<tr><td>%s<td>%s" % (given, default))

        html.append("</tbody>")
        html.append("</table>")

        html.append("<form action='add_timestamp' method='post'>")
        html.append("<input type='submit' value='Add Timestamp'/>")
        html.append("</form>")

        html.append("</body></html>")
        return "\n".join(html)

    @cherrypy.expose
    def add_timestamp(self):
        c = cherrypy.thread_data.db.cursor()
        now = datetime.datetime.now()
        c.execute("insert into test (given_time) values ('%s')" % now)
        c.connection.commit()
        c.close()
        raise cherrypy.HTTPRedirect('/')

    def get_timestamps(self):
        c = cherrypy.thread_data.db.cursor()
        c.execute("select * from test order by given_time desc limit 10")
        records = c.fetchall()
        c.close()
        return records

if __name__ == '__main__':

    cherrypy.config.update({'server.socket_host': '0.0.0.0',
                            'server.socket_port': 8081,
                            'server.thread_pool': 5,
                            'tools.log_headers.on': False,
                            })

    cherrypy.quickstart(Root())
I would expect the given_time and default_time timestamps to be only a few microseconds off from each other. However I am getting some strange behavior. If I add timestamps every few seconds, the default_time is not a few microseconds off from the given_time, but is usually a few microseconds off from the *previous* given_time.
Given Time                  Default Time
2009-03-18 09:31:30.725017  2009-03-18 09:31:25.218871
2009-03-18 09:31:25.198022  2009-03-18 09:31:17.642010
2009-03-18 09:31:17.622439  2009-03-18 09:31:08.266720
2009-03-18 09:31:08.246084  2009-03-18 09:31:01.970120
2009-03-18 09:31:01.950780  2009-03-18 09:30:53.571090
2009-03-18 09:30:53.550952  2009-03-18 09:30:47.260795
2009-03-18 09:30:47.239150  2009-03-18 09:30:41.177318
2009-03-18 09:30:41.151950  2009-03-18 09:30:36.005037
2009-03-18 09:30:35.983541  2009-03-18 09:30:31.666679
2009-03-18 09:30:31.649717  2009-03-18 09:30:28.319693

Yet, if I add a new timestamp about once a minute, both the given_time and default_time are only a few microseconds off as expected. However, after submitting the 6th timestamp (the number of threads + 1) the default_time is a few microseconds off from the first given_time timestamp.

Given Time                  Default Time
2009-03-18 09:38:15.906788  2009-03-18 09:33:58.839075
2009-03-18 09:37:19.520227  2009-03-18 09:37:19.520293
2009-03-18 09:36:04.744987  2009-03-18 09:36:04.745039
2009-03-18 09:35:05.958962  2009-03-18 09:35:05.959053
2009-03-18 09:34:10.961227  2009-03-18 09:34:10.961298
2009-03-18 09:33:58.822138  2009-03-18 09:33:55.423485

Even though I am explicitly closing the cursor after each use, it appears that the previous cursor is still being reused. How is that possible if I am closing the cursor after I'm done with it and creating a new cursor each time? Can someone please explain what is going on here?

Closer to an answer:

I've added a cursor.connection.commit() to the get_timestamps method and that now gives me accurate data with the timestamps. Can anyone explain why I could need to call cursor.connection.commit() when all I am doing is a select? I am guessing that every time I get a cursor, a transaction begins (or continues with an existing transaction unit it gets committed). Is there a better way to do this or am I stuck committing every time I get a cursor regardless of what I do with that cursor?

+2  A: 

Try calling c.close() as described in the module documentation: http://tools.cherrypy.org/wiki/Databases

def add_timestamp(self):
        c = cherrypy.thread_data.db.cursor()
        now = datetime.datetime.now()
        c.execute("insert into test (given_time) values ('%s')" % now)
        c.connection.commit()
        c.close()
        raise cherrypy.HTTPRedirect('/')

def get_timestamps(self):
        c = cherrypy.thread_data.db.cursor()
        c.execute("select * from test order by given_time desc limit 10")
        records = c.fetchall()
        c.close()
        return records
jcoon
I just made those changes and restarted the cherrypy server and I am still having the same issue.
adam
Update: we moved from postgres 8.0 to 8.3 and this answer now works as well.
adam
A: 

I have added a commit to the method that selects the timestamps and that has solved the problem.

 def get_timestamps(self):
    c = cherrypy.thread_data.db.cursor()
    c.execute("select * from test order by given_time desc limit 10")
    records = c.fetchall()
    c.connection.commit()  # Adding this line fixes the timestamp issue
    c.close()
    return records

Can anyone explain why I would need to call cursor.connection.commit() when all I'm doing is a select?

adam
+1  A: 

To address the question posed by your most recent edits:

In PostgreSQL, NOW() is not the current time, but the time at the start of the current transaction. Psycopg2 is probably starting a transaction implicitly for you, and since the transaction is never closed (by a commit or otherwise), the timestamp gets 'stuck' and becomes stale.

Possible fixes:

  • Commit frequently (silly if you're only doing SELECTs)
  • Set up Psycopg2 to use different behavior for automatically creating transactions (probably tricky to get right, and will affect other parts of your app)
  • Use a different timestamp function, like statement_timestamp() (not SQL-standard-compliant, but otherwise perfect for this scenario)

From the manual, section 9.9.4, emphasis added:

PostgreSQL provides a number of functions that return values related to the current date and time. These SQL-standard functions all return values based on the start time of the current transaction:

  • CURRENT_DATE
  • CURRENT_TIME
  • CURRENT_TIMESTAMP
  • CURRENT_TIME(precision)
  • CURRENT_TIMESTAMP(precision)
  • LOCALTIME LOCALTIMESTAMP
  • LOCALTIME(precision)
  • LOCALTIMESTAMP(precision)

CURRENT_TIME and CURRENT_TIMESTAMP deliver values with time zone; LOCALTIME and LOCALTIMESTAMP deliver values without time zone.

CURRENT_TIME, CURRENT_TIMESTAMP, LOCALTIME, and LOCALTIMESTAMP can optionally be given a precision parameter, which causes the result to be rounded to that many fractional digits in the seconds field. Without a precision parameter, the result is given to the full available precision.

...

Since these functions return the start time of the current transaction, their values do not change during the transaction. This is considered a feature: the intent is to allow a single transaction to have a consistent notion of the "current" time, so that multiple modifications within the same transaction bear the same time stamp.

Note: Other database systems might advance these values more frequently.

PostgreSQL also provides functions that return the start time of the current statement, as well as the actual current time at the instant the function is called. The complete list of non-SQL-standard time functions is:

  • now()
  • transaction_timestamp()
  • statement_timestamp()
  • clock_timestamp()
  • timeofday()

now() is a traditional PostgreSQL equivalent to CURRENT_TIMESTAMP. transaction_timestamp() is likewise equivalent to CURRENT_TIMESTAMP, but is named to clearly reflect what it returns. statement_timestamp() returns the start time of the current statement (more specifically, the time of receipt of the latest command message from the client). statement_timestamp() and transaction_timestamp() return the same value during the first command of a transaction, but might differ during subsequent commands. clock_timestamp() returns the actual current time, and therefore its value changes even within a single SQL command. timeofday() is a historical PostgreSQL function. Like clock_timestamp(), it returns the actual current time, but as a formatted text string rather than a timestamp with time zone value.

kquinn
Thank you for explaining this. I haven't tested your suggestions yet, but I've accepted your answer for doing the best job of explaining why the timestamp would be incorrect. However, now I am wondering if there is a way I can create a cursor without starting a transaction.
adam
You can set Psycopg2 to the transaction isolation level `ISOLATION_LEVEL_AUTOCOMMIT`, which will not start transactions when commands are issued. I don't know how wide-ranging that change would be, though; doing so might break other queries that use transactions.
kquinn