Hi everyone,

My application needs to retrieve information about any published book based on a provided ISBN, title, or author. This is hardly a unique requirement: sites like Amazon.com, Chegg.com, and even software like Book Collector seem to be able to do this easily. But I have not been able to replicate it.

To clarify, I do not need to search the entire database of books, only a limited subset that has been entered, as in a book collection. The database would simply let me tag the entered books with the metadata needed to search that subset. So scale is not the issue here; getting the metadata is.

The options I have tried are:

  1. Scrape Amazon. Scraping the regular Amazon pages was not very robust to things like missing authors, and while scraping the smaller mobile pages was faster, they shared the same issues with robustness of extraction. Plus, building this into an application is a clear violation of Amazon's Terms of Service.
  2. Scrape the Library of Congress. While this seems to have fewer legal ramifications, ease and robustness were again issues.
  3. ISBNdb.com API. While the service is free up to a point, and does a good job of returning the necessary metadata, I need to do this for over 500 books on a daily basis, at which point this service costs money proportional to use. I'd prefer a free or one-time payment solution that allows me to do the same.
  4. Google Book Data API. While this seems to provide the information I need, I cannot display the book preview as their terms of service require.
  5. Buy a license to a database of books. For example, companies like Ingram or Baker & Taylor provide these catalogs to retailers and libraries. This solution is obviously expensive, so I'm hoping that there's a more elegant solution I've missed. But if not, and someone on SO has had a good experience with a particular database, I'm willing to go with that.

I've tried to describe my approach in detail so others with fewer books can take advantage of the above solutions. But given my requirements, I'm at my wits' end for retrieving book metadata, so any pointers are greatly appreciated.

A: 

Since it is unlikely that you have to retrieve the same 500 books every day: store the data retrieved from isbndb.com in a database and fill it up book by book.
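A minimal sketch of that idea, assuming a local SQLite file as the store. The `fetch_remote` callable is a hypothetical stand-in for whatever client you write against the metadata service (it is not part of any real library), and the table layout is only an illustration:

```python
import sqlite3
from typing import Callable, Dict, Optional

# Hypothetical signature: takes a normalized ISBN, returns a dict with
# "title" and "author" keys, or None if the remote service has no match.
Fetcher = Callable[[str], Optional[Dict[str, str]]]


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the local store that fills up book by book."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS books ("
        "isbn TEXT PRIMARY KEY, title TEXT, author TEXT)"
    )
    return conn


def lookup(conn: sqlite3.Connection, isbn: str,
           fetch_remote: Fetcher) -> Optional[Dict[str, str]]:
    """Return metadata from the local store, calling the remote service
    only when the ISBN has not been seen before."""
    key = isbn.replace("-", "").strip().upper()
    row = conn.execute(
        "SELECT title, author FROM books WHERE isbn = ?", (key,)
    ).fetchone()
    if row is not None:
        return {"isbn": key, "title": row[0], "author": row[1]}

    record = fetch_remote(key)  # counts against the daily quota
    if record is None:
        return None
    conn.execute(
        "INSERT OR REPLACE INTO books (isbn, title, author) VALUES (?, ?, ?)",
        (key, record["title"], record["author"]),
    )
    conn.commit()
    return {"isbn": key, "title": record["title"], "author": record["author"]}
```

Only the first lookup of a given ISBN hits the remote service; every later lookup is served locally and costs nothing against the quota.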

akira
I'd like to do this, but the limit of 500 books per day is a significant constraint whenever I load large (~30,000) inventories into the database. Ideally I would either hack together an API for an existing database or purchase access to one, which I could then use without limits on the number of lookups.
Saketh
With that many items, it seems you are going the professional route. I doubt that any service will let you basically clone their database without paying them (serious) money.
akira
The issue is that the input is staggered (e.g. 10,000 books at once, then none for some time), and each batch must be loaded all at once.
Saketh
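To put the numbers from this comment thread side by side, here is a rough sketch of how a backlog would spread over a 500-lookups-per-day quota. The function names are made up for illustration, and it assumes every book needs exactly one remote lookup:

```python
import math
from typing import Iterator, List, Sequence


def days_needed(backlog: int, daily_quota: int) -> int:
    """Number of days to clear a backlog at a fixed daily lookup quota."""
    if daily_quota <= 0:
        raise ValueError("daily_quota must be positive")
    return math.ceil(backlog / daily_quota)


def daily_batches(isbns: Sequence[str], daily_quota: int) -> Iterator[List[str]]:
    """Split a list of ISBNs into chunks that each fit one day's quota."""
    if daily_quota <= 0:
        raise ValueError("daily_quota must be positive")
    for start in range(0, len(isbns), daily_quota):
        yield list(isbns[start:start + daily_quota])


print(days_needed(30_000, 500))  # 60
print(days_needed(10_000, 500))  # 20
```

So a 30,000-book inventory would take about two months at the free tier, and a single 10,000-book batch about 20 days, which is why a store-and-fill approach alone does not cover loads that must happen at once.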
A: 

It seems that a lot of libraries and other organisations make information such as the ISBN available through MAchine-Readable Cataloging (MARC). You can find more information about it here as well.

Now that I knew the "right" term to search for, I discovered WorldCat.org.

Maybe this whole MARC thing gives you a new kind of an idea :)

akira
There are no reasonable open or paid but easy-to-use ways of resolving the issue using MARC records, as sites like WorldCat generally require that one is a library in order to access their search API. I've been surprised, because one would think that a public catalog of books would be easy to find!
Saketh
So you can't use the search API (http://worldcat.org/devnet/wiki/SearchAPIDetails)?
akira
The WorldCat API uses an access key -- I have requested one, but if I could find an independent solution that would be great.
Saketh