views:

832

answers:

4

Working on a script to collect users browser history with time stamps ( educational setting). Firefox 3 history is kept in a sqlite file, and stamps are in UNIX epoch time... getting them and converting to readable format via a SQL command in python is pretty straightforward:

sql_select = """ SELECT datetime(moz_historyvisits.visit_date/1000000,'unixepoch','localtime'), 
                        moz_places.url 
                 FROM moz_places, moz_historyvisits 
                 WHERE moz_places.id = moz_historyvisits.place_id
             """
get_hist = list(cursor.execute (sql_select))

Chrome also stores history in a sqlite file.. but it's history time stamp is apparently formatted as the number of microseconds since midnight UTC of 1 January 1601....

How can this timestamp be converted to a readable format as in the Firefox example (like 2010-01-23 11:22:09)? I am writing the script with python 2.5.x ( the version on OS X 10.5 ), and importing sqlite3 module....

+1  A: 

The sqlite module returns datetime objects for datetime fields, which have a format method for printing readable strings called strftime.

You can do something like this once you have the recordset:

for record in get_hist:
  date_string = record[0].strftime("%Y-%m-%d %H:%M:%S")
  url = record[1]
jcoon
I guess I am looking for the lazy man's way out -- hoping to just replace the 'unixepoch' bit in the code for Firefox so it'll work with Chrome... If Chrome used unixepoch I'd be all set and I could reuse the line ( first changing the moz stuff to Chrome's tables)..I'm just stuck on how to convert that time format ( I have also seen it referred to as 'WebKit' time format.)...
CaseyIT
So write a function to convert the date. Then you can do `date = get_webkit_date(record[0])`. That *is* the lazy man's way out.
jcdyer
There's also a strptime if you want to go the other way (string to datetime object). See the [python datetime docs](http://docs.python.org/library/datetime.html) for more details on how to do that.
jcdyer
Correction: You probably want the .fromtimestamp() method
jcdyer
I guess I have no idea what the function would look like to convert to WebKit time ... Is there not something built into sqlite that can be called like the unixepoch modifier to do this? I am not very familiar with sqlite3 -- It looks like .fromtimestamp() wouldn't work since it would only handle dates back to 1970?
CaseyIT
+1  A: 

This may not be the most Pythonic code in the world, but here's a solution: Cheated by adjusting for time zone (EST here) by doing this:

utctime = datetime.datetime(1601,1,1) + datetime.timedelta(microseconds = ms, hours =-5)

Here's the function : It assumes that the Chrome history file has been copied from another account into /Users/someuser/Documents/tmp/Chrome/History

def getcr():
    connection = sqlite3.connect('/Users/someuser/Documents/tmp/Chrome/History')
    cursor = connection.cursor()
    get_time = list(cursor.execute("""SELECT last_visit_time FROM urls"""))
    get_url = list(cursor.execute("""SELECT url from urls"""))
    stripped_time = []
    crf = open ('/Users/someuser/Documents/tmp/cr/cr_hist.txt','w' )
    itr = iter(get_time)
    itr2 = iter(get_url)

    while True:
        try:
            newdate = str(itr.next())
            stripped1 = newdate.strip(' (),L')
            ms = int(stripped1)
            utctime = datetime.datetime(1601,1,1) + datetime.timedelta(microseconds = ms, hours =-5)
            stripped_time.append(str(utctime))
            newurl = str(itr2.next())
            stripped_url = newurl.strip(' ()')
            stripped_time.append(str(stripped_url))
            crf.write('\n')
            crf.write(str(utctime))
            crf.write('\n')
            crf.write(str(newurl))
            crf.write('\n')
            crf.write('\n')
            crf.write('********* Next Entry *********') 
            crf.write('\n')
        except StopIteration:
            break

    crf.close()            

    shutil.copy('/Users/someuser/Documents/tmp/cr/cr_hist.txt' , '/Users/parent/Documents/Chrome_History_Logs')
    os.rename('/Users/someuser/Documents/Chrome_History_Logs/cr_hist.txt','/Users/someuser/Documents/Chrome_History_Logs/%s.txt' % formatdate)
CaseyIT
+1  A: 

This is a more pythonic and memory-friendly way to do what you described (by the way, thanks for the initial code!):

#!/usr/bin/env python

import os
import datetime
import sqlite3
import opster
from itertools import izip

SQL_TIME = 'SELECT time FROM info'
SQL_URL  = 'SELECT c0url FROM pages_content'

def date_from_webkit(webkit_timestamp):
    epoch_start = datetime.datetime(1601,1,1)
    delta = datetime.timedelta(microseconds=int(webkit_timestamp))
    return epoch_start + delta

@opster.command()
def import_history(*paths):
    for path in paths:
        assert os.path.exists(path)
        c = sqlite3.connect(path)
        times = (row[0] for row in c.execute(SQL_TIME))
        urls  = (row[0] for row in c.execute(SQL_URL))
        for timestamp, url in izip(times, urls):
            date_time = date_from_webkit(timestamp)
            print date_time, url
        c.close()

if __name__=='__main__':
    opster.dispatch()

The script can be used this way:

$ ./chrome-tools.py import-history ~/.config/chromium/Default/History* > history.txt

Of course Opster can be thrown out but seems handy to me :-)

Andy Mikhaylenko