views:

244

answers:

3

I am learning Python (I have a C/C++ background).

I need to write something practical in Python though, whilst learning. I have the following pseudocode (my first attempt at writing a Python script, since reading about Python yesterday). Hopefully, the snippet details the logic of what I want to do. BTW I am using python 2.6 on Ubuntu Karmic.

Assume the script is invoked as: script_name.py directory_path

import csv, sys, os, glob

# Can I declare that the function accepts a dictionary as first arg?
def getItemValue(item, key, defval)
  return !item.haskey(key) ? defval : item[key]


dirname = sys.argv[1]

# declare some default values here
weight, is_male, default_city_id = 100, true, 1 

# fetch some data from a database table into a nested dictionary, indexed by a string
curr_dict = load_dict_from_db('foo')

#iterate through all the files matching *.csv in the specified folder
for infile in glob.glob( os.path.join(dirname, '*.csv') ):
  #get the file name (without the '.csv' extension)
  code = infile[0:-4]
  # open file, and iterate through the rows of the current file (a CSV file)
  f = open(infile, 'rt')
  try:
    reader = csv.reader(f)
    for row in reader:
      #lookup the id for the code in the dictionary
      id = curr_dict[code]['id']
      name = row['name']
      address1 = row['address1']
      address2 = row['address2']
      city_id = getItemValue(row, 'city_id', default_city_id)

      # insert row to database table

  finally:
    f.close()

I have the following questions:

  1. Is the code written in a Pythonic enough way (is there a better way of implementing it)?

  2. Given a table with a schema like shown below, how may I write a Python function that fetches data from the table and returns is in a dictionary indexed by string (name).

  3. How can I insert the row data into the table (actually I would like to use a transaction if possible, and commit just before the file is closed)

Table schema:

create table demo (id int, name varchar(32), weight float, city_id int);

BTW, my backend database is postgreSQL

[Edit]

Wayne et al:

To clarify, what I want is a set of rows. Each row can be indexed by a key (so that means the rows container is a dictionary (right)?. Ok, now once we have retrieved a row by using the key, I also want to be able to access the 'columns' in the row - meaning that the row data itself is a dictionary. I dont know if Python supports multidimensional array syntax when dealing with dictionaries - but the following statement will help explain how I intend to conceptually use the data returned from the db. A statement like dataset['joe']['weight'] will first fetch the row data indexed by the key 'joe' (which is a dictionary) and then index that dictionary for the key 'weight'. I want to know how to build such a dictionary of dictionaries from the retrieved data in a Pythonic way like you did before.

A simplistic way would be to write something like:

import pyodbc

mydict = {}
cnxn = pyodbc.connect(params)
cursor = cnxn.cursor()
cursor.execute("select user_id, user_name from users"):

for row in cursor:
   mydict[row.id] = row

Is this correct/can it be written in a more pythonic way?

+4  A: 

to get the value from the dictionary you need to use .get method of the dict:

>>> d = {1: 2}
>>> d.get(1, 3)
2
>>> d.get(5, 3)
3

This will remove the need for getItemValue function. I wont' comment on the existing syntax since it's clearly alien to Python. Correct syntax for the ternary in Python is:

true_val if true_false_check else false_val
>>> 'a' if False else 'b'
'b'

But as I'm saying below, you don't need it at all.

If you're using Python > 2.6, you should use with statement over the try-finally:

with open(infile) as f:
    reader = csv.reader(f)
    ... etc

Seeing that you want to have row as dictionary, you should be using csv.DictReader and not a simple csv. reader. However, it is unnecessary in your case. Your sql query could just be constructed to access the fields of the row dict. In this case you wouldn't need to create separate items city_id, name, etc. To add default city_id to row if it doesn't exist, you could use .setdefault method:

>>> d
{1: 2}
>>> d.setdefault(1, 3)
2
>>> d
{1: 2}
>>> d.setdefault(3, 3)
3
>>> d
{1: 2, 3: 3}

and for id, simply row[id] = curr_dict[code]['id']

When slicing, you could skip 0:

>>> 'abc.txt'[:-4]
'abc'

Generally, Python's library provide a fetchone, fetchmany, fetchall methods on cursor, which return Row object, that might support dict-like access or return a simple tuple. It will depend on the particular module you're using.

SilentGhost
@silent: feel free to comment on the syntax. This was my b=very first Python script (and I only started reading about Python yesterday), so feel free to correct whatever mistakes you can spot. The whole idea of the post (apart from getting up and running ASAP) is to learn from 'Pythonistas' ;)
morpheous
@silent: +1 for the helpful tips. Much appreciated
morpheous
+2  A: 

It looks mostly Pythonic enough for me.

The ternary operation should look like this though (I think this will return the result you expect):

return defval if not key in item else item[key]

Yeah, you can pass a dictionary (or any other value) in basically any order. The only difference is if you use the *args, **kwargs (named by convention. Technically you can use any name you want) which expect to be in that order and the last one or two arguments.

For inserting into a DB you can use the odbc module:

import odbc
conn = odbc.odbc('servernamehere')
cursor = conn.cursor()
cursor.execute("INSERT INTO mytable VALUES (42, 'Spam on Eggs', 'Spam on Wheat')")
conn.commit()

You can read up or find plenty of examples on the odbc module - I'm sure there are other modules as well, but that one should work fine for you.

For retrieval you would use

cursor.execute("SELECT * FROM demo")
#Reads one record - returns a tuple
print cursor.fetchone()
#Reads the rest of the records - a list of tuples
print cursor.fetchall()

to make one of those records into a dictionary:

record = cursor.fetchone()
# Removes the 2nd element (at index 1) from the record
mydict[record[1]] = record[:1] + record[2:]

Though that practically screams for a generator expression if you want the whole shebang at once

mydict = dict((record[1], record[:1] + record[2:] for record in cursor.fetchall())

which should give you all of the records packed up neatly in a dictionary, using the name as a key.

HTH

Wayne Werner
1. it's `has_key` 2. it's deprecated, use `key in d`
SilentGhost
@wayne: +1 for the db related code. Thanks
morpheous
key in d is new - hadn't heard, thanks for the heads up (and I fixed that part).@morpheous: n/p - as I'm currently working with databases in an unfamiliar language, I know how important code examples are!
Wayne Werner
A great post, but can I suggest `cursor.execute("INSERT INTO mytable VALUES (?, ?, ?)", [42, 'Spam on Eggs', 'Spam on Wheat'])` or something similar so that SQL injection attacks are avoided?
David Morrissey
+2  A: 

a colon required after defs:

def getItemValue(item, key, defval):
    ...

boolean operators: In python !->not; &&->and and ||->or (see http://docs.python.org/release/2.5.2/lib/boolean.html for boolean operators). There's no ? : operator in python, there is a return (x) if (x) else (x) expression although I personally rarely use it in favour of plain if's.

booleans/None: True, False and None have capitals before them.

checking types of arguments: In python, you generally don't declare types of function parameters. You could go e.g. assert isinstance(item, dict), "dicts must be passed as the first parameter!" in the function although this kind of "strict checking" is often discouraged as it's not always necessary in python.

python keywords: default isn't a reserved python keyword and is acceptable as arguments and variables (just for the reference.)

style guidelines: PEP 8 (the python style guideline) states that module imports should generally only be one per line, though there are some exceptions (I have to admit I often don't follow the import sys and os on separate lines, though I usually follow it otherwise.)

file open modes: rt isn't valid in python 2.x - it will work, though the t will be ignored. See also http://docs.python.org/tutorial/inputoutput.html#reading-and-writing-files. It is valid in python 3 though, so I don't think it it'd hurt if you want to force text mode, raising exceptions on binary characters (use rb if you want to read non-ASCII characters.)

working with dictionaries: Python used to use dict.has_key(key) but you should use key in dict now (which has largely replaced it, see http://docs.python.org/library/stdtypes.html#mapping-types-dict.)

split file extensions: code = infile[0:-4] could be replaced with code = os.path.splitext(infile)[0] (which returns e.g. ('root', '.ext') with the dot in the extension (see http://docs.python.org/library/os.path.html#os.path.splitext).

EDIT: removed multiple variable declarations on a single line stuff and added some formatting. Also corrected the rt isn't a valid mode in python when in python 3 it is.

David Morrissey
he's passing directory name, it has nothing to do with `sys.argv[0]`
SilentGhost
It's perfectly fine to declare variables in a single line
SilentGhost
@SilentGhost: Changed, thanks for the suggestions!
David Morrissey