ansaurus

Question

Populating a SQLite3 database from a .txt file with Python

Answer 1

+1 A:

You could read the data using the csv module. Then you can create an insert sql statement and use the method executemany:

  cursor.executemany(sql, rows)

or use add_all if you use sqlalchemy.

DiggyF 2010-07-17 09:42:05

Answer 2

A:

Something like the following should work: (not tested)

# Open database (will be created if not exists)
conn = sqlite3.connect('/path/to/your_file.db')

c = conn.cursor()

# Create table
c.execute('''create table representatives
(parliament text, name text, country text, party_group text, national_party text, position text)''')

f = open("thefile.txt")
for i in f.readlines():
    # Insert a row of data
    c.execute("""insert into representatives
                 values (?,?,?,?,?,?)""", *i.split(", ")) # *i.split(", ") does unpack the list as arguments

# Save (commit) the changes
conn.commit()

# We can also close the cursor if we are done with it
c.close()

Joschua 2010-07-17 10:26:49

Hi Joschua, thanks for the reply! However I keep getting this error when using your example above:>Traceback (most recent call last): File "/Users/thomasjensen/Documents/sql_test.py", line 13, in <module> c.execute("""insert into MEP (Parliament, Name, Country, Party_Group, Home_Party, Position) values (?, ?, ?, ?, ?, ?)""", *i.split(","))TypeError: function takes at most 2 arguments (162 given)

Thomas Jensen 2010-07-17 11:14:13

Simply remove the leading asterisk from `*i.split(", ")`. However, the fact that the error message says "162 [arguments] given" suggests that there will be more issues to follow. I would also strongly recommend using any of the other answers currently on this page, which all use the "csv" library, rather than relying on `split(", ")`, which will cause you a lot of grief.

Aram Dulyan 2010-07-17 12:15:43

Ok, thanks for the advice.

Thomas Jensen 2010-07-17 17:13:52

Answer 3

+2 A:

So assuming your models.py looks something like this:

class Representative(models.Model):
    parliament = models.CharField(max_length=128)
    name = models.CharField(max_length=128)
    country = models.CharField(max_length=128)
    party_group = models.CharField(max_length=128)
    national_party = models.CharField(max_length=128)
    position = models.CharField(max_length=128)

You can then run python manage.py shell and execute the following:

import csv
from your_app.models import Representative
# If you're using different field names, change this list accordingly.
# The order must also match the column order in the CSV file.
fields = ['parliament', 'name', 'country', 'party_group', 'national_party', 'position']
for row in csv.reader(open('your_file.csv')):
    Representative.objects.create(**dict(zip(fields, row)))

And you're done.

Addendum (edit)

Per Thomas's request, here's an explanation of what **dict(zip(fields,row)) does:

So initially, fields contains a list of field names that we defined, and row contains a list of values that represents the current row in the CSV file.

fields = ['parliament', 'name', 'country', ...]
row = ['7', 'Marta Andreasen', 'United Kingdom', ...]

What zip() does is it combines two lists into one list of pairs of items from both lists (like a zipper); i.e. zip(['a','b,'c'], ['A','B','C']) will return [('a','A'), ('b','B'), ('c','C')]. So in our case:

>>> zip(fields, row)
[('parliament', '7'), ('name', 'Marta Andreasen'), ('country', 'United Kingdom'), ...]

The dict() function simply converts the list of pairs into a dictionary.

>>> dict(zip(fields, row))
{'parliament': '7', 'name': 'Marta Andreasen', 'country': 'United Kingdom', ...}

The ** is a way of converting a dictionary into a keyword argument list for a function. So function(**{'key': 'value'}) is the equivalent of function(key='value'). So in out example, calling create(**dict(zip(field, row))) is the equivalent of:

create(parliament='7', name='Marta Andreasen', country='United Kingdom', ...)

Hope this clears things up.

Aram Dulyan 2010-07-17 10:39:26

If you're loading this into Django, definitely look into Aram's solution here for use. It's taking advantage of the django ORM bits for you, so you don't need to mess with the schema directly. Not a big deal if you're comfortable with SQL schema, but I found it made things much easier.

heckj 2010-07-17 23:16:00

+1; this is much simpler!

Adam Bernier 2010-07-17 23:27:10

Thanks for the answer Aram. I am still learning python, so could you you explain what the (**dict(zip(...)) part of the last line does?

Thomas Jensen 2010-07-18 09:22:22

I've added an explanation to my answer.

Aram Dulyan 2010-07-19 02:19:40

Thanks a lot Aram, this really helps! I am continually suprised at how frindly the stack overflow community is the newcomers :)P.S: the code worked perfectly :)

Thomas Jensen 2010-07-19 11:24:22

Answer 4

+4 A:

As SiggyF says and only slightly differently than Joschua:

Create a text file with your schema, e.g.:

CREATE TABLE politicians (
    Parliament text, 
    Name text, 
    Country text, 
    Party_Group text, 
    National_Party text, 
    Position text
);

Create table:

>>> import csv, sqlite3
>>> conn = sqlite3.connect('my.db')
>>> c = conn.cursor()
>>> with open('myschema.sql') as f:            # read in schema file 
...   schema = f.read()
... 
>>> c.execute(schema)                          # create table per schema 
<sqlite3.Cursor object at 0x1392f50>
>>> conn.commit()                              # commit table creation

Use csv module to read file with data to be inserted:

>>> csv_reader = csv.reader(open('myfile.txt'), skipinitialspace=True)
>>> csv_reader.next()                          # skip the first line in the file
['Parliament', 'Name', 'Country', ...

# put all data in a tuple
# edit: decoding from utf-8 file to unicode
>>> to_db = tuple([i.decode('utf-8') for i in line] for line in csv_reader)
>>> to_db                                      # this will be inserted into table
[(u'7', u'Marta Andreasen', u'United Kingdom', ...

Insert data:

>>> c.executemany("INSERT INTO politicians VALUES (?,?,?,?,?,?);", to_db)
<sqlite3.Cursor object at 0x1392f50>
>>> conn.commit()

Verify that all went as expected:

>>> c.execute('SELECT * FROM politicians').fetchall()
[(u'7', u'Marta Andreasen', u'United Kingdom', ...

Edit:
And since you've decoded (to unicode) on input, you need to be sure to encode on output.
For example:

with open('encoded_output.txt', 'w') as f:
  for row in c.execute('SELECT * FROM politicians').fetchall():
    for col in row:
      f.write(col.encode('utf-8'))
      f.write('\n')

Adam Bernier 2010-07-17 10:43:37

Hi Adam, thanks for the elaborate answer! Every step works, except when i try:c.executemany("INSERT INTO politicians VALUES (?,?,?,?,?,?);", to_db)Then i get the following error:ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings.I have saved the text file with a utf8 encoding, so I have no idea what is happening here...

Thomas Jensen 2010-07-17 17:08:01

@Thomas: You're welcome. I updated the example to handle decoding from utf-8, and also showed how to encode back to utf-8 on output. Best of luck to you.

Adam Bernier 2010-07-17 22:49:57

Thanks Adam, for a beginner this whole encoding business can be quite confusing.

Thomas Jensen 2010-07-19 12:12:41

Answer 5

+1 A:

You asked what the create(**dict(zip(fields, row))) line did.

I don't know how to reply directly to your comment, so I'll try to answer it here.

zip takes multiple lists as args and returns a list of their correspond elements as tuples.

zip(list1, list2) => [(list1[0], list2[0]), (list1[1], list2[1]), .... ]

dict takes a list of 2-element tuples and returns a dictionary mapping each tuple's first element (key) to its second element (value).

create is a function that takes keyword arguments. You can use **some_dictionary to pass that dictionary into a function as keyword arguments.

create(**{'name':'john', 'age':5}) => create(name='john', age=5)

Steve 2010-07-18 11:19:07

Thanks for the explanation Steve.

Thomas Jensen 2010-07-19 11:25:02

ansaurus

tags:

views:

answers:

Populating a SQLite3 database from a .txt file with Python

related questions