views:

140

answers:

5

Dear Python Experts,

I am trying to setup a website in django which allows the user to send queries to a database containing information about their representatives in the European Parliament. I have the data in a comma seperated .txt file with the following format:

Parliament, Name, Country, Party_Group, National_Party, Position

7, Marta Andreasen, United Kingdom, Europe of freedom and democracy Group, United Kingdom Independence Party, Member

etc....

I want to populate a SQLite3 database with this data, but so far all the tutorials I have found only show how to do this by hand. Since I have 736 observations in the file I dont really want to do this.

I suspect this is a simple matter, but I would be very grateful if someone could show me how to do this.

Thomas

+1  A: 

You could read the data using the csv module. Then you can create an insert sql statement and use the method executemany:

  cursor.executemany(sql, rows)

or use add_all if you use sqlalchemy.

DiggyF
A: 

Something like the following should work: (not tested)

# Open database (will be created if not exists)
conn = sqlite3.connect('/path/to/your_file.db')

c = conn.cursor()

# Create table
c.execute('''create table representatives
(parliament text, name text, country text, party_group text, national_party text, position text)''')

f = open("thefile.txt")
for i in f.readlines():
    # Insert a row of data
    c.execute("""insert into representatives
                 values (?,?,?,?,?,?)""", *i.split(", ")) # *i.split(", ") does unpack the list as arguments

# Save (commit) the changes
conn.commit()

# We can also close the cursor if we are done with it
c.close()
Joschua
Hi Joschua, thanks for the reply! However I keep getting this error when using your example above:>Traceback (most recent call last): File "/Users/thomasjensen/Documents/sql_test.py", line 13, in <module> c.execute("""insert into MEP (Parliament, Name, Country, Party_Group, Home_Party, Position) values (?, ?, ?, ?, ?, ?)""", *i.split(","))TypeError: function takes at most 2 arguments (162 given)
Thomas Jensen
Simply remove the leading asterisk from `*i.split(", ")`. However, the fact that the error message says "162 [arguments] given" suggests that there will be more issues to follow. I would also strongly recommend using any of the other answers currently on this page, which all use the "csv" library, rather than relying on `split(", ")`, which will cause you a lot of grief.
Aram Dulyan
Ok, thanks for the advice.
Thomas Jensen
+2  A: 

So assuming your models.py looks something like this:

class Representative(models.Model):
    parliament = models.CharField(max_length=128)
    name = models.CharField(max_length=128)
    country = models.CharField(max_length=128)
    party_group = models.CharField(max_length=128)
    national_party = models.CharField(max_length=128)
    position = models.CharField(max_length=128)

You can then run python manage.py shell and execute the following:

import csv
from your_app.models import Representative
# If you're using different field names, change this list accordingly.
# The order must also match the column order in the CSV file.
fields = ['parliament', 'name', 'country', 'party_group', 'national_party', 'position']
for row in csv.reader(open('your_file.csv')):
    Representative.objects.create(**dict(zip(fields, row)))

And you're done.

Addendum (edit)

Per Thomas's request, here's an explanation of what **dict(zip(fields,row)) does:

So initially, fields contains a list of field names that we defined, and row contains a list of values that represents the current row in the CSV file.

fields = ['parliament', 'name', 'country', ...]
row = ['7', 'Marta Andreasen', 'United Kingdom', ...]

What zip() does is it combines two lists into one list of pairs of items from both lists (like a zipper); i.e. zip(['a','b,'c'], ['A','B','C']) will return [('a','A'), ('b','B'), ('c','C')]. So in our case:

>>> zip(fields, row)
[('parliament', '7'), ('name', 'Marta Andreasen'), ('country', 'United Kingdom'), ...]

The dict() function simply converts the list of pairs into a dictionary.

>>> dict(zip(fields, row))
{'parliament': '7', 'name': 'Marta Andreasen', 'country': 'United Kingdom', ...}

The ** is a way of converting a dictionary into a keyword argument list for a function. So function(**{'key': 'value'}) is the equivalent of function(key='value'). So in out example, calling create(**dict(zip(field, row))) is the equivalent of:

create(parliament='7', name='Marta Andreasen', country='United Kingdom', ...)

Hope this clears things up.

Aram Dulyan
If you're loading this into Django, definitely look into Aram's solution here for use. It's taking advantage of the django ORM bits for you, so you don't need to mess with the schema directly. Not a big deal if you're comfortable with SQL schema, but I found it made things much easier.
heckj
+1; this is much simpler!
Adam Bernier
Thanks for the answer Aram. I am still learning python, so could you you explain what the (**dict(zip(...)) part of the last line does?
Thomas Jensen
I've added an explanation to my answer.
Aram Dulyan
Thanks a lot Aram, this really helps! I am continually suprised at how frindly the stack overflow community is the newcomers :)P.S: the code worked perfectly :)
Thomas Jensen
+4  A: 

As SiggyF says and only slightly differently than Joschua:

Create a text file with your schema, e.g.:

CREATE TABLE politicians (
    Parliament text, 
    Name text, 
    Country text, 
    Party_Group text, 
    National_Party text, 
    Position text
);

Create table:

>>> import csv, sqlite3
>>> conn = sqlite3.connect('my.db')
>>> c = conn.cursor()
>>> with open('myschema.sql') as f:            # read in schema file 
...   schema = f.read()
... 
>>> c.execute(schema)                          # create table per schema 
<sqlite3.Cursor object at 0x1392f50>
>>> conn.commit()                              # commit table creation

Use csv module to read file with data to be inserted:

>>> csv_reader = csv.reader(open('myfile.txt'), skipinitialspace=True)
>>> csv_reader.next()                          # skip the first line in the file
['Parliament', 'Name', 'Country', ...

# put all data in a tuple
# edit: decoding from utf-8 file to unicode
>>> to_db = tuple([i.decode('utf-8') for i in line] for line in csv_reader)
>>> to_db                                      # this will be inserted into table
[(u'7', u'Marta Andreasen', u'United Kingdom', ...

Insert data:

>>> c.executemany("INSERT INTO politicians VALUES (?,?,?,?,?,?);", to_db)
<sqlite3.Cursor object at 0x1392f50>
>>> conn.commit()

Verify that all went as expected:

>>> c.execute('SELECT * FROM politicians').fetchall()
[(u'7', u'Marta Andreasen', u'United Kingdom', ...

Edit:
And since you've decoded (to unicode) on input, you need to be sure to encode on output.
For example:

with open('encoded_output.txt', 'w') as f:
  for row in c.execute('SELECT * FROM politicians').fetchall():
    for col in row:
      f.write(col.encode('utf-8'))
      f.write('\n')
Adam Bernier
Hi Adam, thanks for the elaborate answer! Every step works, except when i try:c.executemany("INSERT INTO politicians VALUES (?,?,?,?,?,?);", to_db)Then i get the following error:ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings.I have saved the text file with a utf8 encoding, so I have no idea what is happening here...
Thomas Jensen
@Thomas: You're welcome. I updated the example to handle decoding from utf-8, and also showed how to encode back to utf-8 on output. Best of luck to you.
Adam Bernier
Thanks Adam, for a beginner this whole encoding business can be quite confusing.
Thomas Jensen
+1  A: 

You asked what the create(**dict(zip(fields, row))) line did.

I don't know how to reply directly to your comment, so I'll try to answer it here.

zip takes multiple lists as args and returns a list of their correspond elements as tuples.

zip(list1, list2) => [(list1[0], list2[0]), (list1[1], list2[1]), .... ]

dict takes a list of 2-element tuples and returns a dictionary mapping each tuple's first element (key) to its second element (value).

create is a function that takes keyword arguments. You can use **some_dictionary to pass that dictionary into a function as keyword arguments.

create(**{'name':'john', 'age':5}) => create(name='john', age=5)

Steve
Thanks for the explanation Steve.
Thomas Jensen