I have a large set of values V, some of which are likely to exist in a table T. I would like to insert into the table those which are not yet inserted. So far I have the code:

to_insert = []
for value in values:
    # One SELECT per value: a database round trip on every iteration
    s = self.conn.execute(mytable.__table__.select(mytable.value == value)).first()
    if not s:
        to_insert.append(value)

I feel like this is running slower than it should. I have a few related questions:

  1. Is there a way to construct a single select statement that takes a list (in this case, 'values') and has SQLAlchemy return the records that match any item in that list?
  2. Is this code overly expensive in constructing select objects? Is there a way to construct a single select statement, then parameterize it at execution time?
A: 

For the first question, if I understand it correctly, something like this:

mytable.__table__.select(mytable.value.in_(values))

For the second question, querying one row at a time is indeed overly expensive, although you might not have a choice in the matter. As far as I know there is no tuple select support in SQLAlchemy, so if there are multiple variables (think polymorphic keys) then SQLAlchemy can't help you.

Either way, if you select all matching rows and insert the difference you should be done :) Something like this should work:

results = self.conn.execute(mytable.__table__.select(mytable.value.in_(values)))
available_values = set(row.value for row in results)
to_insert = set(values) - available_values
WoLpH
Hmm, I like this idea; one problem is that I think values is too large and exceeds the maximum statement size. Do I need to make an intermediate table or something?
muckabout
If the data comes from the database you could do it completely in the database. But if not, you can easily batch this per 1000 items by slicing the list of values :) Creating a temporary table from something you already have available locally shouldn't be needed.
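A minimal sketch of that batching, assuming the values are hashable and fit in memory. To keep it runnable without a configured engine it uses the standard-library sqlite3 module as a stand-in for self.conn; with SQLAlchemy the query inside the loop would be the in_() select from the answer, applied to each chunk. The function name missing_values and the batch size of 500 are illustrative choices only; the usable batch size depends on the parameter limit of your database.

```python
import sqlite3


def missing_values(conn, values, batch_size=500):
    """Return the set of values not yet present in mytable.value.

    Runs one SELECT ... IN (...) per slice of at most batch_size items,
    so no single statement grows with the full size of values.
    """
    unique = list(set(values))  # drop duplicates before querying
    existing = set()
    for start in range(0, len(unique), batch_size):
        chunk = unique[start:start + batch_size]
        # One "?" placeholder per item in this slice, e.g. "?,?,?"
        placeholders = ",".join("?" * len(chunk))
        rows = conn.execute(
            f"SELECT value FROM mytable WHERE value IN ({placeholders})",
            chunk,
        )
        existing.update(row[0] for row in rows)
    return set(unique) - existing


# Demo with an in-memory table that already holds the even numbers below 5000.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (value INTEGER PRIMARY KEY)")
conn.executemany(
    "INSERT INTO mytable (value) VALUES (?)",
    [(v,) for v in range(0, 5000, 2)],
)

to_insert = missing_values(conn, range(5000))

# Insert only the values that were missing.
conn.executemany(
    "INSERT INTO mytable (value) VALUES (?)",
    [(v,) for v in to_insert],
)
row_count = conn.execute("SELECT COUNT(*) FROM mytable").fetchone()[0]
```

Each slice costs one round trip, so 5000 values take 10 queries here instead of 5000.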
WoLpH
good point, that works!
muckabout