
I have a CSV file with several entries, and each entry contains two dates formatted as Unix timestamps.

I have a function called convert(), which takes a timestamp and converts it to YYYYMMDD format.

Now, since I have two timestamps in each line, how would I replace each one with its converted value?

EDIT: Just to clarify, I would like to convert every occurrence of a timestamp in the line into YYYYMMDD format. This is what is bugging me: re.findall() only returns a list of the matches, not the line with them replaced.

+3  A: 

If you know the replacement:

import re

p = re.compile(r'(?<=,)\d{8}(?=,)')
csvstring = p.sub(someval, csvstring)

If it's a format change:

p = re.compile(r'(?<=,)(\d{4})(\d\d)(\d\d)(?=,)')
csvstring = p.sub(r'\3-\2-\1', csvstring)

EDIT: Sorry, I just realised you said Python; modified above.
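A quick illustration of why a comma-consuming pattern such as `,\d{8},` can miss fields: the match eats the comma after the first field, so an immediately following eight-digit field no longer starts with a comma and is skipped. Zero-width lookarounds `(?<=,)` and `(?=,)` check for the commas without consuming them. The sample row is made up for illustration, and note that neither pattern touches the first or last column, since those fields have no comma on one side.

```python
import re

row = "a,12345678,87654321,b"  # hypothetical sample row

# Consuming version: the comma after the first field is part of the match,
# so the second eight-digit field is never matched.
consuming = re.sub(r",\d{8},", ",X,", row)

# Lookaround version: commas are checked but not consumed,
# so both adjacent fields are replaced.
lookaround = re.sub(r"(?<=,)\d{8}(?=,)", "X", row)

print(consuming)   # a,X,87654321,b
print(lookaround)  # a,X,X,b
```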

Luke Schafer
My Python is a bit sketchy; I hope I got it right.
Luke Schafer
He said a Unix timestamp, which should be something like 1243326265 (current time). He wants the YYYYMMDD format as output.
ΤΖΩΤΖΙΟΥ
+1  A: 

I assume that by "unix timestamp formatted date" you mean a number of seconds since the epoch. This code assumes that every number in the file is a Unix timestamp; if that isn't the case, you'll need to adjust the regex:

import re, sys

# your convert function goes here

regex = re.compile(r'(\d+)')
for line in sys.stdin:
    sys.stdout.write(regex.sub(lambda m: convert(int(m.group(1))), line))

This reads from stdin and calls convert on each number found.

The "trick" here is that re.sub (and the sub method of a compiled pattern) can take a function that maps a match object to a replacement string. I'm assuming your convert function expects an int and returns a string, so I've used a lambda as an adapter: it grabs the first group of the match, converts it to an int, and passes that int to convert.
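A small self-contained demonstration of passing a function to sub. The `convert` here is a hypothetical stand-in that just wraps each number in angle brackets, so the effect of the callback is easy to see; the real convert would return a YYYYMMDD string instead.

```python
import re

def convert(n):
    # Hypothetical stand-in for the asker's convert(): tags the number
    return "<%d>" % n

regex = re.compile(r"(\d+)")
line = "foo,12,bar,345\n"

# sub calls the lambda once per match; whatever it returns
# replaces the matched text in the output string.
result = regex.sub(lambda m: convert(int(m.group(1))), line)
print(result, end="")  # foo,<12>,bar,<345>
```

A named function taking the match object would work just as well as the lambda; the lambda only keeps the adapter inline.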

Laurence Gonsalves
Thanks! I'm still a Python beginner, and this helps a lot.
aaront
I'm getting a "no such group" error.
aaront
Hmmm... What does the input line where you're hitting that error look like? (You might want to add a `sys.stdout.flush()` call right after the `sys.stdout.write` line while debugging.)
Laurence Gonsalves
All good now, thanks.
aaront
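The thread doesn't say what caused the "no such group" error. One way to reproduce it, offered only as a guess, is a pattern without a capturing group: with `r"\d+"` there is no group 1, so `m.group(1)` raises `IndexError: no such group`, while `m.group(0)` (the whole match) works either way.

```python
import re

m = re.search(r"\d+", "foo,1243310263,bar")  # pattern has no capturing group

try:
    m.group(1)
    error = None
except IndexError as exc:
    error = str(exc)

print(error)       # no such group
print(m.group(0))  # 1243310263
```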
A: 

I'd use something along these lines. It's a lot like Laurence's response, but it includes the timestamp conversion you requested and takes the filename as a parameter. This code assumes you are working with recent dates (on or after 9/9/2001, when Unix timestamps reached 10 digits). If you need earlier dates, lower the 10 in `{10,}` to 9 or less.

import re, sys, time

regex = re.compile(r'(\d{10,})')

def convert(unixtime):
    # seconds since the epoch -> YYYYMMDD, in UTC
    return time.strftime("%Y%m%d", time.gmtime(unixtime))

with open(sys.argv[1]) as f:
    for line in f:
        sys.stdout.write(regex.sub(lambda m: convert(int(m.group(0))), line))

EDIT: Cleaned up the code.

Sample Input

foo,1234567890,bar,1243310263
cat,1243310263,pants,1234567890
baz,987654321,raz,1

Output

foo,20090213,bar,20090526
cat,20090526,pants,20090213
baz,987654321,raz,1 # not converted (too short to be a recent timestamp)
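The 9/9/2001 cutoff follows from digit counts: the smallest 10-digit timestamp is 1,000,000,000 seconds. A quick check with `time.gmtime` (UTC, matching the convert above) shows where each digit count begins, and therefore what lowering `{10,}` to `{9,}` would bring in. The helper name `first_day` is just for illustration. Keep in mind that a lower threshold also catches any other 9-digit numbers in the file.

```python
import time

def first_day(seconds):
    # Calendar date (UTC) for a given number of seconds since the epoch
    return time.strftime("%Y-%m-%d", time.gmtime(seconds))

for digits in (9, 10):
    smallest = 10 ** (digits - 1)  # smallest number with this many digits
    print(digits, "digits start at", first_day(smallest))
# 9 digits start at 1973-03-03
# 10 digits start at 2001-09-09
```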
fearphage
+1  A: 

I'm not able to comment on your question, but have you taken a look at Python's csv module? http://docs.python.org/library/csv.html#module-csv
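A minimal sketch of the csv-module route, assuming the timestamps sit in known columns (columns 1 and 3, zero-based, as in fearphage's sample input). `TIMESTAMP_COLUMNS` and `convert_rows` are hypothetical names. Working column by column means other numeric fields are never touched, which the regex approaches can't guarantee.

```python
import csv
import io
import time

# Hypothetical: zero-based indexes of the columns holding Unix timestamps
TIMESTAMP_COLUMNS = (1, 3)

def convert(unixtime):
    # seconds since the epoch -> YYYYMMDD, in UTC
    return time.strftime("%Y%m%d", time.gmtime(unixtime))

def convert_rows(infile, outfile):
    reader = csv.reader(infile)
    writer = csv.writer(outfile, lineterminator="\n")
    for row in reader:
        # Replace only the timestamp columns; other fields pass through
        for i in TIMESTAMP_COLUMNS:
            row[i] = convert(int(row[i]))
        writer.writerow(row)

sample = "foo,1234567890,bar,1243310263\ncat,1243310263,pants,1234567890\n"
out = io.StringIO()
convert_rows(io.StringIO(sample), out)
print(out.getvalue(), end="")
# foo,20090213,bar,20090526
# cat,20090526,pants,20090213
```

For a real file, pass `open(path, newline="")` for reading and `sys.stdout` (or another opened file) for writing instead of the StringIO objects.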

buster