Hello, I am working with PostgreSQL and psycopg2. I am trying to fetch feed data, which is updated every 10 minutes, and keep the feed contents in a PostgreSQL database. My target is to retrieve and print that data from the table. But I am facing a problem: duplicate data is stored in the database every time I run the script, because of the insert operation on the table.

To get out of this problem, I made the column location_title the primary key of the table locations_musiq1, where I intend to store my feed data. But now I am facing an error.

Here is my code:

import psycopg2
import sys
import feedparser
import codecs
import psycopg2.extensions


# Parsing data from Geofeed location feeds

data = feedparser.parse("some URL")
psycopg2.extensions.register_type(psycopg2.extensions.UNICODE)



try:

    conn = psycopg2.connect("dbname='name' user='postgres' host='localhost' password='abcds'")
    conn.set_client_encoding('UNICODE')


except:
    print "I am unable to connect to the database, exiting."
    sys.exit()
cur=conn.cursor()


for i in range(len(data['entries'])):
    cur.execute("INSERT INTO locations_musiq1(location, location_title) VALUES (%s, %s)", (data.entries[i].title, data.entries[i].summary))
    conn.commit()

cur.execute("SELECT * FROM locations_musiq1;")
for row in cur:
    print '   '.join(row[1:])


cur.close()
conn.close()

My error, after changing the "locations_musiq1" table's column "location_title" to a primary key, is:

    Traceback (most recent call last):
      File "F:\JavaWorkspace\Test\src\postgr_example.py", line 28, in <module>
        cur.execute("INSERT INTO locations_musiq1(location, location_title) VALUES (%s, %s)", (data.entries[i].title, data.entries[i].summary))
    psycopg2.IntegrityError: duplicate key value violates unique constraint "locations_musiq1_pkey"

Does anybody have an idea how to get out of this problem? Thanks in advance.

+1  A: 

Your code only has INSERT, so what do you think is going to happen when you fetch the same data for a second time?

Your insert is failing because you're trying to insert a row with a field value identical to one that already exists in a column with a unique constraint.

You either need to match entries from the feed against your table and INSERT, UPDATE, or DELETE as appropriate (where "appropriate" is defined by the feed data and your reasons for synchronising), or you empty your table and repopulate it from the feed each time.
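The second option (empty and repopulate) can be sketched as follows. This is a minimal sketch, using Python's built-in sqlite3 module as a stand-in for psycopg2/PostgreSQL (so ? placeholders instead of %s), with a hypothetical fetch_entries() in place of the feedparser call:

```python
import sqlite3

# Hypothetical stand-in for the parsed feed: a list of (title, summary) pairs.
def fetch_entries():
    return [("London", "X was in London (at 2010-03-10 14:46:35.0)"),
            ("Paris", "X was in Paris (at 2010-03-10 15:30:35.0)")]

def refresh(conn):
    cur = conn.cursor()
    # Replace the table contents in one transaction: we only commit at the
    # end, so a failure mid-way leaves the old rows untouched.
    cur.execute("DELETE FROM locations_musiq1")
    for title, summary in fetch_entries():
        cur.execute("INSERT INTO locations_musiq1(location, location_title) VALUES (?, ?)",
                    (title, summary))
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE locations_musiq1(location TEXT, location_title TEXT PRIMARY KEY)")
refresh(conn)
refresh(conn)  # a second run neither raises nor duplicates rows
print(conn.execute("SELECT count(*) FROM locations_musiq1").fetchone()[0])  # 2
```

Because the delete and the inserts share a transaction, rerunning the script can never trip the primary-key constraint.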

What are you trying to achieve?

MattH
+1  A: 

You could try something like this:

cur.execute("""
  INSERT INTO locations_musiq1(location, location_title) 
  SELECT %s, %s WHERE NOT EXISTS 
      (SELECT location_title FROM locations_musiq1 WHERE location_title=%s);
  """, (data.entries[i].title, data.entries[i].summary, data.entries[i].summary))
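To show what this conditional insert buys you, here is the same pattern run twice with the same row; a small sketch using the stdlib sqlite3 module as a stand-in for psycopg2 (SQLite uses ? placeholders where psycopg2 uses %s):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE locations_musiq1(location TEXT, location_title TEXT PRIMARY KEY)")

def insert_once(location, location_title):
    # Same pattern as above: only insert when no row with this
    # location_title exists yet, so re-running the script is safe.
    conn.execute("""
        INSERT INTO locations_musiq1(location, location_title)
        SELECT ?, ? WHERE NOT EXISTS
            (SELECT location_title FROM locations_musiq1 WHERE location_title = ?)
        """, (location, location_title, location_title))

for _ in range(2):  # simulate two runs of the feed script
    insert_once("X", "X was in London (at 2010-03-10 14:46:35.0)")
conn.commit()
print(conn.execute("SELECT count(*) FROM locations_musiq1").fetchone()[0])  # 1
```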
ChristopheD
Thanks for the answer! But I am still facing a problem: the feed data is not printed in the order it is supposed to appear. Part of my script's output: X was in London (at 2010-03-10 14:46:35.0), X was in London (at 2010-03-10 15:30:35.0), X was in London (at 2010-03-10 15:19:35.0), X was in London (at 2010-03-10 15:08:35.0), X was in London (at 2010-03-10 14:57:38.0), X was in London (at 2010-03-10 14:24:35.0). So you can see they do not appear in order, but they are supposed to! Do you have any idea how to get them all in order?
rahman.bd
Once you have inserted them into the database, just use a `select * from locations_musiq1 order by mydatefield asc` (or desc).
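That suggestion assumes the timestamp lives in its own column (the comment's placeholder name `mydatefield`); the table as posted has no such column. A minimal sketch of the idea, again with the stdlib sqlite3 module standing in for PostgreSQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: the feed timestamp kept in its own column (placeholder
# name "mydatefield") so the rows can be sorted on it.
conn.execute("CREATE TABLE locations_musiq1(location TEXT, mydatefield TEXT)")
conn.executemany(
    "INSERT INTO locations_musiq1 VALUES (?, ?)",
    [("London", "2010-03-10 15:30:35"),
     ("London", "2010-03-10 14:46:35"),
     ("London", "2010-03-10 15:08:35")])

# ISO-style "YYYY-MM-DD HH:MM:SS" strings sort chronologically even as text.
for location, when in conn.execute(
        "SELECT location, mydatefield FROM locations_musiq1 ORDER BY mydatefield ASC"):
    print(location, when)
```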
ChristopheD
A: 

Rahman, you are asking a second question in your comment; it should probably be made into its own question instead.

Anyhow, to return the results in a specified order, you need an ORDER BY clause. I don't see a timestamp column here, but I assume your feed data is formatted as XML, so you could order by some XPath expression. But if you just want the rows in the order they were inserted, you can sort by the hidden system column xmin, which is the transaction ID of the inserting operation.

See the PostgreSQL documentation on system columns.

Scott Bailey