Hello everyone,

I have been teaching myself the rudiments of Python and have reached a stage where I would like to tackle a project of increasing complexity in order to "scratch an itch".

I have a large eBook collection (several gigabytes, organized in folders according to topic, mostly .pdf files with the occasional .djvu, .html, and .chm). I have tried a number of eBook manager apps and have found all of them lacking in key areas, or simply unavailable for Linux.

Therefore, I would like to write an eBook manager application in Python that can perform the following tasks:

  • Recursively import all files in a directory and its sub-folders into application's database.
  • Link from entry in database to actual eBook file; open file from application.
  • Rename, relocate, and delete files from application.
  • Select default application to open arbitrary file types.
  • Fetch metadata in bulk from Amazon.com, the Library of Congress, or other repositories.
  • Allow the creation of reading lists.
  • Allow tags, notes, ratings.

Optional, but would be nice:

  • Create BibTeX references from arbitrarily selected items.
  • Display cover as image (à la Beagle).
  • Text snippets (also à la Beagle).

So, my questions are:

  1. How would you go about tackling this project?

  2. Which libraries would I have to learn in order to achieve the desired functionality?

  3. What other resources (tutorials, source code, etc.) do you recommend?

Thank you in advance!

A: 

For your 'basic' needs, no additional library is needed; the plain Python standard library is sufficient. For your optional needs, I was looking for answers too.

klez
+3  A: 

For the UI, I would suggest wxPython. Tkinter is more used, but I find it horrible. PyGTK and PyQt probably are good options too. Here are two good tutorials:
http://zetcode.com/wxpython/
http://wiki.wxpython.org/AnotherTutorial
There is also a very good book, “wxPython in Action”.

It's probably a good idea to store your objects in a relational database. Python ships with the sqlite3 module, which will probably be sufficient and lets you experiment easily in the shell. This SQL tutorial seems good: http://en.wikibooks.org/wiki/Structured_Query_Language.
If you find yourself spending too much time on storing and retrieving the objects, you'll probably want to use a framework like SQLAlchemy which automates the process.
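
To give an idea of how little sqlite3 needs to get started, here is a minimal sketch. The table layout and the names `create_catalog` and `add_book` are only assumptions for illustration, not a recommended schema:

```python
import sqlite3

def create_catalog(db_path=":memory:"):
    """Open (or create) the catalog and make sure the books table exists."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS books (
               id     INTEGER PRIMARY KEY,
               path   TEXT UNIQUE NOT NULL,  -- location of the file on disk
               title  TEXT,
               rating INTEGER
           )"""
    )
    return conn

def add_book(conn, path, title=None):
    """Insert a book; a path that is already stored is silently skipped."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO books (path, title) VALUES (?, ?)",
            (path, title),
        )

conn = create_catalog()
add_book(conn, "/books/python/intro.pdf", "Intro to Python")
add_book(conn, "/books/python/intro.pdf", "Intro to Python")  # duplicate path
rows = conn.execute("SELECT title, path FROM books").fetchall()
```

Passing `":memory:"` keeps everything in RAM, which is handy while experimenting; a file name instead would make the catalog persistent.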

Once you have learned SQL, it's possible that you won't be sure how to map objects to tables (it isn't as easy as it may look). I don't know any resource about that, though.

And don't forget to think about the design before coding. :-) I would also suggest learning how to write unit tests.

Bastien Léonard
+2  A: 

Look at CouchDB instead of traditional relational database models. It's document-centric, so it's a good fit for something like an e-book organizer. There are great Python libs for it, but CouchDB speaks HTTP and JSON, so it's pretty trivial to query/connect, etc.

A great solution would be to use the built-in functionality for attachments to actually store the PDFs themselves, and then dynamically assign metadata as desired. The CouchDB query process uses a two-pass Map/Reduce rather than looping through the data, so it can be much faster than traditional relational databases that might have multiple tables and joins between them.

Worth a gander.

FilmJ
Sounds interesting, will definitely check it out. Thank you.
Tiresias
+4  A: 

Firstly, I'm no pro; take everything I say with a pinch of salt.

The biggest decision I can see you're likely to need to take is the nature of the database. I'd be tempted to state that the project is sufficiently ambitious that you might well want to look into something SQL-shaped for the database.

Get to know and love the Global Module Index (1) (for your version of Python). It has probably got 99% of everything you ever need - but at the same time, a speculative search for specific bibliophile modules might be of use.

Also, this is a big project. Have an idea how the whole thing sits together, but focus on smaller, manageable chunks first.

  • Recursively import all files in a directory and its sub-folders into application's database.

You'll want to look at os.walk (2). Use the os module to maintain cross-platform compatibility if this interests you.
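
A rough sketch of the recursive scan with os.walk might look like this. The extension set comes from the formats mentioned in the question; the function name `find_ebooks` is made up for the example:

```python
import os

# Formats mentioned in the question; extend as needed.
EBOOK_EXTENSIONS = {".pdf", ".djvu", ".html", ".chm"}

def find_ebooks(root):
    """Yield the full path of every eBook file under root, recursively."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            # Compare extensions case-insensitively (".PDF" counts too).
            if os.path.splitext(name)[1].lower() in EBOOK_EXTENSIONS:
                yield os.path.join(dirpath, name)
```

Because it is a generator, you can feed the paths straight into whatever stores them in the database without building a huge list first.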

  • Link from entry in database to actual eBook file; open file from application.

I would assume that this would be done by referring to the path to the eBook; os.system (3) will allow you to launch the relevant reader.

  • Rename, relocate, and delete files from application.

The os module remains your friend.

  • Select default application to open arbitrary file types.

In Windows, you can run start filename.pdf to use Windows' built-in default applications, or simply have a lookup based on file type. (You probably want to use a dictionary (4) for this.)
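
The dictionary lookup could be sketched roughly like this. The viewer programs listed are just assumptions about what might be installed on a Linux desktop, and `xdg-open` is used as the fallback for unknown types:

```python
import os
import subprocess

# Hypothetical mapping from extension to viewer; adjust to what you have installed.
VIEWERS = {
    ".pdf": "evince",
    ".djvu": "djview",
    ".html": "firefox",
    ".chm": "xchm",
}
FALLBACK = "xdg-open"  # desktop default handler on most Linux systems

def viewer_command(path, viewers=VIEWERS):
    """Return the argument list that would open path."""
    ext = os.path.splitext(path)[1].lower()
    return [viewers.get(ext, FALLBACK), path]

def open_book(path):
    """Launch the viewer without waiting for it to exit."""
    return subprocess.Popen(viewer_command(path))
```

Passing an argument list to subprocess instead of building a string for os.system means file names with spaces or quotes don't need any escaping.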

  • Fetch metadata in bulk from Amazon.com, the Library of Congress, or other repositories.

Not sure what processing you want: urllib (5) lets you grab HTML in a convenient fashion, and HTMLParser may help extract the information you're interested in from the website.
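
As a small illustration of the parsing half, here is a sketch that pulls the text of the title element out of a page. It works on a hard-coded string so it runs offline; in a real run the HTML would come from urllib. Note that in Python 3 the class lives in `html.parser`. The names `TitleParser` and `extract_title` are invented for the example:

```python
from html.parser import HTMLParser  # the module is called HTMLParser in Python 2

class TitleParser(HTMLParser):
    """Collect the text found inside <title> elements."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def extract_title(html):
    """Return the page title with surrounding whitespace removed."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip()

sample = "<html><head><title> Dive Into Python </title></head><body>...</body></html>"
title = extract_title(sample)
```

Real book pages will need more specific rules than this, and scraping is fragile, so if the repository offers a proper API it is probably worth using that instead.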

  • Allow the creation of reading lists.

Not sure what you're doing here. I'm guessing this would mean creating a list of eBooks and is therefore a database issue. You might want to consider how you are referring to these books: by file name? ISBN? Your own unique IDs? Amazon URLs?

  • Allow tags, notes, ratings.

These are probably elements of your database.
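
If you go the SQL route, tags (and reading lists, which have the same shape) are usually modelled with a link table between books and tags. This is a standalone sketch with made-up table and function names, only meant to show the idea:

```python
import sqlite3

SCHEMA = """
CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE tags  (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE book_tags (
    book_id INTEGER NOT NULL REFERENCES books(id),
    tag_id  INTEGER NOT NULL REFERENCES tags(id),
    PRIMARY KEY (book_id, tag_id)
);
"""

def tag_book(conn, book_id, tag_name):
    """Attach a tag to a book, creating the tag on first use."""
    with conn:
        conn.execute("INSERT OR IGNORE INTO tags (name) VALUES (?)", (tag_name,))
        tag_id = conn.execute(
            "SELECT id FROM tags WHERE name = ?", (tag_name,)
        ).fetchone()[0]
        conn.execute(
            "INSERT OR IGNORE INTO book_tags (book_id, tag_id) VALUES (?, ?)",
            (book_id, tag_id),
        )

def books_with_tag(conn, tag_name):
    """Return the titles of all books carrying the given tag, sorted."""
    rows = conn.execute(
        """SELECT b.title FROM books b
           JOIN book_tags bt ON bt.book_id = b.id
           JOIN tags t ON t.id = bt.tag_id
           WHERE t.name = ? ORDER BY b.title""",
        (tag_name,),
    )
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO books (id, title) VALUES (?, ?)",
                 [(1, "Book A"), (2, "Book B")])
tag_book(conn, 1, "math")
tag_book(conn, 1, "to-read")
tag_book(conn, 2, "to-read")
to_read = books_with_tag(conn, "to-read")
```

Notes and ratings are simpler: one value per book, so they can just be extra columns on the books table.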

Optional, but would be nice:

  • Create BibTeX references from arbitrarily selected items.

If you've got the metadata, it should just be a case of concatenating the relevant strings.
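
Something along these lines, for instance. The field selection and the function name `bibtex_entry` are assumptions, the sample values are placeholders, and no escaping of special characters is attempted:

```python
def bibtex_entry(key, title, author, year, publisher):
    """Format one @book entry; values are assumed to contain no stray braces."""
    fields = [
        ("author", author),
        ("title", title),
        ("publisher", publisher),
        ("year", str(year)),
    ]
    # Doubled braces in the format strings produce literal { and }.
    body = ",\n".join("  {} = {{{}}}".format(name, value) for name, value in fields)
    return "@book{{{},\n{}\n}}".format(key, body)

entry = bibtex_entry("doe2009", "Example Title", "Doe, Jane", 2009, "Example Press")
```

Other entry types (@article, @misc, ...) would follow the same pattern with a different field list.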

  • Display cover as image (à la Beagle).

You might want to build a whole GUI in wxPython (6) or similar. This is a significant task.

  • Text snippets (also à la Beagle).

Might be near impossible for DRM-encumbered forms; you might also need to write different handlers for each format (although some will have modules available, I am sure.)

(1): docs.python.org/modindex.html

(2): docs.python.org/library/os.html#os.walk

(3): docs.python.org/library/os.html#os.system

(4): docs.python.org/tutorial/datastructures.html#dictionaries

(5): docs.python.org/library/urllib.html

(6): www.wxpython.org/

Dragon
+1  A: 

"How would you go about tackling this project?"

Slowly.

First, define your use cases. You have a list of features, but no user interaction even hinted at. Define what interactions you will make with your application and what value the application creates for you. This can be hard.

Then, prioritize those use cases. This is important.

Don't fetishize over the final technology stack. Picking too much technology creates all new problems. Once you've mastered the problem domain, you can add technologies.

Also, don't spend too much time on a GUI. GUIs are hard -- remarkably hard -- and it helps to have everything else understood before starting down the road of GUI development. Learning Python, learning to build GUIs, and learning to catalog eBooks all at once is too much.

Pick one use case. Just one. Build a small command-line application that does just this.

Then move to the next use case. You may want to build a separate command-line application, or extend the previous one.

Eventually, you'll have a clunky command-line collection of tools that actually works, doing the various things you need it to do.

Now, you can tackle the GUI.

S.Lott
Very sensible advice, thank you.
Tiresias