I'm building a database to store my eBook collection.
Most of the books have their ISBN within the text of the book itself.
How can I access this content?
Is there any source code or a DLL for doing that?

+3  A: 

I did this for an eBook library app. First of all, you need to extract the text from the CHM or PDF file. There are a lot of utilities and libraries that can do it. There is an article on CodeProject on how to extract content from CHM files. For PDF files I used the pdftotext utility. Once you have plain text from the eBook, parse it with a regular expression to find the ISBN-10 or ISBN-13 code.
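For the PDF half, here is a minimal sketch of driving pdftotext from Python. It is not the code from the app mentioned above, just an illustration. It assumes pdftotext (from Xpdf or Poppler) is installed and on PATH, that the ISBN sits in the first few pages (the `last_page=10` default is a guess), and that the output is UTF-8, which may not hold for every build. The function names are made up for this example.

```python
import shutil
import subprocess


def pdftotext_command(pdf_path, last_page=10):
    """Build the pdftotext argument list for the first `last_page` pages."""
    # "-l N" stops after page N; the trailing "-" sends the text to stdout.
    return ["pdftotext", "-l", str(last_page), pdf_path, "-"]


def extract_pdf_text(pdf_path, last_page=10):
    """Return plain text from the opening pages of a PDF via pdftotext."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext was not found on PATH")
    result = subprocess.run(
        pdftotext_command(pdf_path, last_page),
        capture_output=True,
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    # Assumption: UTF-8 output; undecodable bytes are replaced, not fatal.
    return result.stdout.decode("utf-8", errors="replace")
```

Limiting the page range keeps the run fast on large books, since the copyright page is normally near the front. If nothing is found there, you could retry without the limit.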

aku
+1  A: 

Extracting the text from the CHM and PDF files is the first step. Once you have it, you can find the ISBN with a regular expression.
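A rough sketch of the regular expression step, in Python. The pattern and helper names are my own, not from any library. It assumes the ISBN is printed with digits separated only by single spaces or hyphens, and it uses the standard ISBN-10 and ISBN-13 check digits to throw away other long numbers that happen to match.

```python
import re

# ISBN-13 is tried first so its last ten digits are not re-read as an ISBN-10.
ISBN_CANDIDATE = re.compile(
    r"(?<!\d)"
    r"(?:97[89](?:[ -]?\d){10}"        # ISBN-13: 978/979 prefix + 10 digits
    r"|\d(?:[ -]?\d){8}[ -]?[\dXx])"   # ISBN-10: 9 digits + digit or X
    r"(?!\d)"
)


def is_valid_isbn10(code):
    """Weighted sum (weights 10 down to 1) must be divisible by 11."""
    total = 0
    for position, char in enumerate(code):
        value = 10 if char == "X" else int(char)
        total += (10 - position) * value
    return total % 11 == 0


def is_valid_isbn13(code):
    """Sum with alternating weights 1 and 3 must be divisible by 10."""
    total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(code))
    return total % 10 == 0


def find_isbns(text):
    """Return checksum-valid ISBNs from text, normalized, in order found."""
    found = []
    for match in ISBN_CANDIDATE.finditer(text):
        # Strip separators and uppercase a trailing "x" check digit.
        code = re.sub(r"[ -]", "", match.group()).upper()
        valid = is_valid_isbn13(code) if len(code) == 13 else is_valid_isbn10(code)
        if valid and code not in found:
            found.append(code)
    return found
```

Call `find_isbns(text)` on whatever the extraction step produced. A book often prints both forms, so expect more than one result per file and decide which one your database should key on.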

Darin Dimitrov
A: 

Did you ever make much progress with this? I'm looking to do something similar, though with PowerShell just to organize and rename some books based on subject (folder) and title, year (filename).

Michael Houston Moore