views: 133

answers: 2

I am building an application that lets users submit a list of company/date pairs and find out whether there was a news event on that date. The news events are stored in a dictionary keyed by a company identifier and a date.

newsDict[('identifier','MM/DD/YYYY')] = [list of news events for that date]
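
So a single lookup would look something like this (the identifier and date values here are only placeholders):

events = newsDict.get(('IBM', '05/21/2009'), [])  # empty list if there was no news that day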

The dictionary turned out to be much larger than I expected, too big even to build in memory, so I broke it into three pieces, each limited to a particular range of company identifiers.

My plan was to take the user-submitted list and, using a dictionary, group the company identifiers by the particular newsDict in which their events would be expected to be found, then load the newsDicts one after another to get the values.
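
Roughly, that grouping step would look something like the sketch below; piece_for is a placeholder for however the identifier ranges are actually split across the three files, and userList is an assumed name for the user-submitted identifier/date pairs.

from collections import defaultdict

def piece_for(identifier):
    # Placeholder: map an identifier to the newsDict piece that covers its range
    if identifier < 'G':
        return 'newsDict_A'
    elif identifier < 'P':
        return 'newsDict_B'
    return 'newsDict_C'

byPiece = defaultdict(list)
for identifier, date in userList:
    byPiece[piece_for(identifier)].append((identifier, date))
# Then load each newsDict piece once and answer all of its queries together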

Now I am wondering whether it would be better to keep the news events in a list, with each item of the list being a sublist made up of a tuple and another list:

[('identifier','MM/DD/YYYY'),[list of news events for that date]]

My thought, then, is that I would have a dictionary holding the range of the list for each company identifier:

 companyDict['identifier']=(begofRangeinListforComp,endofRangeinListforComp)

I would use the user input to look up the ranges I needed and construct a list of the identifiers and ranges sorted by the ranges. Then I would just read the appropriate section of the list to get the data and construct the output.
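
In rough form, that lookup over the list might look like this; userIdentifiers and eventList are assumed names for the user's identifiers and the big list already loaded into memory.

# Sort the requested identifiers by where their block sits in the big list
wanted = sorted((companyDict[identifier], identifier) for identifier in userIdentifiers)

results = []
for (begin, end), identifier in wanted:
    # Read only the slice of the big list that belongs to this company
    for key, events in eventList[begin:end + 1]:
        results.append((key, events))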

The biggest reason I see for this is that, even with the dictionary broken into thirds, each section takes about two minutes to load on my machine, and the dictionary ends up taking about 600 to 750 MB of RAM.

I was surprised to note that a list of eight million lines took only about 15 seconds to load and used about one-third of the memory of a dictionary that had one-third as many entries.

Further, since I can discard lines as I work through the list, I will be freeing memory as I work down the user list.

I am surprised, as I thought a dictionary would be the most efficient way to do this, but my poking at it suggests that the dictionary requires significantly more memory than a list. My reading of other posts on SO and elsewhere suggests that any other structure is going to require pointer allocations that are more expensive than list pointers. Am I missing something here, and is there a better way to do this?

After reading Alberto's answer and his response to my comment, I spent some time trying to figure out how to write the function if I were to use a database. I might be hobbled here because I don't know much about database programming, but I think the code to implement this using a database would be much more complicated than:

outList = []
massiveFile = open('theFile', 'r')
# read all lines so they can be sliced by position
lines = massiveFile.readlines()
# sortedUserList is the user list, sorted by the key of the dictionary
for identifier in sortedUserList:
    begin = theDict[identifier]['beginPosit']
    end = theDict[identifier]['endPosit']
    for item in lines[begin:end + 1]:
        # "manipulation of the identifier" was pseudocode; a plain prefix check stands in here
        if item.startswith(identifier):
            outList.append(item)

I would have to wrap this in a function, but I didn't see anything comparably simple if I converted the list to a database.

Of course, simpler was not the reason that brought me to this forum. I still don't see that using another structure will cost less memory. I have 30,000 company identifiers and approximately 3,600 dates. Each item in my list is an object, in the parlance of OOD. That is where I am struggling: I spent six hours this morning organizing the data for a dictionary before I gave up. Spending that amount of time to implement a database, and then finding that I am using half a gig or more of someone else's memory to load it, seems problematic.

+5  A: 

With such a large amount of data, you should be using a database. This would be far better than searching through a list, and would be the most appropriate way of storing your data anyway. If you're using Python, it has SQLite built in, I believe.
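
A minimal sketch of what that might look like with the built-in sqlite3 module; the table layout, column names, and file name here are assumptions, not part of the answer:

import sqlite3

conn = sqlite3.connect('news.db')
conn.execute("CREATE TABLE IF NOT EXISTS news (identifier TEXT, date TEXT, event TEXT)")
conn.execute("CREATE INDEX IF NOT EXISTS idx_news ON news (identifier, date)")

def events_for(identifier, date):
    # Return the list of news events for one company/date pair
    cur = conn.execute("SELECT event FROM news WHERE identifier = ? AND date = ?",
                       (identifier, date))
    return [row[0] for row in cur]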

AlbertoPL
What is the benefit of using a database?
PyNEwbie
Basically, all of the functionality that you are coding already exists in the form of SQL calls, so you can create all of the different kinds of lists you want just by querying the tables in the database. A database also would not hold all of the data it manages in RAM, which is a huge plus.
AlbertoPL
SQLite is a standard module in Python (sqlite3), and I would recommend Elixir (SQLAlchemy) as the table/query manager.
monkut
+1  A: 

The dictionary will take more memory because it is effectively a hash table.

You don't have to go so far as using a database, since your lookup requirements are so simple. Just use the file system.

Create a directory structure based on the company name (or ticker), with subdirectories for each date. To find whether data exists and load it up, just form the name of the subdirectory where the data would be, and see if it exists.

E.g., IBM news for May 21 would be in C:\db\IBM\20090521\news.txt, if in fact there were news for that day. You just check if the file exists; no searches.
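
A sketch of that check in Python; the path layout follows the example above, and the YYYYMMDD date format is assumed:

import os

def news_for(symbol, datestr):
    # e.g. symbol='IBM', datestr='20090521'
    path = os.path.join(r'C:\db', symbol, datestr, 'news.txt')
    if os.path.isfile(path):
        with open(path) as f:
            return f.readlines()   # the news events for that day
    return None                    # no news for that company/date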

If you want to try and boost speed from there, come up with a scheme to cache a limited amount of results that are likely to be frequently requested (assuming you're operating a server). For that, you'd use a hash.
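
For example, a small dictionary-based cache in front of the lookup; the size limit and eviction policy are only illustrative, and news_for is the function from the sketch above:

cache = {}
MAX_CACHE = 10000  # illustrative limit

def cached_news_for(symbol, datestr):
    key = (symbol, datestr)
    if key not in cache:
        if len(cache) >= MAX_CACHE:
            cache.pop(next(iter(cache)))  # crude eviction: drop an arbitrary entry
        cache[key] = news_for(symbol, datestr)
    return cache[key]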

John Pirie
Clever, +1. I don't want to add a complex directory structure, though; 300K identifiers would make it very hard for them to walk their directory structure.
PyNEwbie
Certainly don't want thousands in a single directory. So you subdivide, and create C:\db\I\B\M\2009\05\21\news.txt.
John Pirie
And that's really easier than using SQLite?
Seun Osewa
What's hard?

symbol, datestr = "IBM", "20090521"
newsname = "C:/db/%s/%s/news.txt" % ("/".join(symbol), datestr)
if os.path.isfile(newsname): ...
John Pirie