As I recently bought a Sony PRS-600 (a fairly good reader with a nice touch screen; if you read Polish, see my review), I became interested in ebook management. For a Linux user, the only reasonable option seems to be Calibre, a useful application which not only lets me manage my reader but also provides a well-designed ebook database.
One of Calibre's nice features is that once you enter a book's ISBN, plenty of useful information (canonical versions of the author name and book title, description, cover, even tags) can be downloaded automatically. But, for some reason, the application does not detect the ISBN itself. I repeated the sequence (open a book, go down a page or a few, copy the ISBN, go back to Calibre, open the book data, paste the ISBN) a few times and decided it was boring and could be automated.
So I wrote a short script which performs this very action.
Purpose
The script analyses the Calibre database (it assumes Calibre is already installed and properly configured), looks for books without an ISBN, then tries to find their ISBN by scanning the leading pages. If an ISBN is found, the script saves it (updates the given book's Calibre metadata). No other metadata changes are performed.
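To illustrate the "find the ISBN in the leading pages" step, here is a minimal sketch of the idea. It is not the script's actual code: the pattern, the function names and the checksum validation are illustrative assumptions, and it is written for Python 3 rather than the Python 2.6 the script targets. It expects the text of the leading pages as a plain string.

```python
import re

# Assumed pattern: the word "ISBN" (optionally "ISBN-10"/"ISBN-13" and a colon),
# followed by 10 or 13 digits that may be separated by hyphens or spaces.
ISBN_CANDIDATE = re.compile(
    r"ISBN(?:-1[03])?:?\s*((?:97[89][-\s]?)?(?:\d[-\s]?){9}[\dXx])"
)


def is_valid_isbn10(digits):
    """Check the ISBN-10 checksum (weights 10..1, sum divisible by 11)."""
    if len(digits) != 10:
        return False
    total = 0
    for position, char in enumerate(digits):
        if char == "X":
            if position != 9:  # "X" (value 10) is allowed only as check digit
                return False
            value = 10
        elif char.isdigit():
            value = int(char)
        else:
            return False
        total += (10 - position) * value
    return total % 11 == 0


def is_valid_isbn13(digits):
    """Check the ISBN-13 checksum (alternating weights 1 and 3, sum divisible by 10)."""
    if len(digits) != 13 or not digits.isdigit():
        return False
    total = sum(
        int(char) * (1 if position % 2 == 0 else 3)
        for position, char in enumerate(digits)
    )
    return total % 10 == 0


def find_isbn(text):
    """Return the first checksum-valid ISBN found in text, or None."""
    for match in ISBN_CANDIDATE.finditer(text):
        digits = re.sub(r"[-\s]", "", match.group(1)).upper()
        if is_valid_isbn10(digits) or is_valid_isbn13(digits):
            return digits
    return None
```

Validating the checksum is a design choice of this sketch: it rejects numbers that merely look like an ISBN (or were mangled by text extraction), at the cost of skipping books whose ISBN is printed with a typo.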
Later on, the ISBN can be used to grab the book metadata and/or book cover inside the Calibre GUI. Just start Calibre and look for books with an ISBN set and missing metadata, for example using a query like:
isbn:~[0-9] not publisher:~[a-z]
(The above means: the isbn contains some digit and the publisher does not contain any letter.) Then select the appropriate books (I prefer to handle them in batches of no more than 10-20 so I can review the changes easily), right-click, expand the Edit Metadata Information submenu and pick Download Metadata (or some other Download option).
Prerequisites
The script has been developed and used on Ubuntu Linux. It should work on other platforms (if the necessary tools are installed), including Windows and Mac, but I haven't tested it.
Calibre must be installed, properly configured, and have some books in the database (otherwise it does not make sense to run the script).
The calibredb command must be in PATH (alternatively, the CALIBREDB variable at the beginning of the script can be modified to contain the full path to calibredb).
Tools providing the following commands: pdftotext, catdoc, djvutxt must be installed and present in PATH. On Ubuntu Linux or Debian Linux those can be installed from the standard repositories; just install the following packages: poppler-utils, catdoc, djvulibre-bin, either using the GUI or by running:
$ sudo apt-get install poppler-utils catdoc djvulibre-bin
Python 2.6 is required (the script uses features of tempfile and subprocess introduced in 2.6). The lxml library must also be installed. On Debian or Ubuntu just install the following packages: python2.6 and python-lxml, for example by:
$ sudo apt-get install python2.6 python-lxml
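Before running the script, it may be worth checking that all the commands listed above can really be found in PATH. The following is a small standalone helper, not part of the script itself; the function name is made up for illustration, and it assumes Python 3.3 or newer because it relies on shutil.which.

```python
import shutil

# Commands named in the prerequisites above.
REQUIRED_COMMANDS = ["calibredb", "pdftotext", "catdoc", "djvutxt"]


def missing_commands(commands=REQUIRED_COMMANDS):
    """Return the commands from the list that cannot be found in PATH."""
    return [name for name in commands if shutil.which(name) is None]
```

Calling missing_commands() returns an empty list when everything is in place; otherwise it returns the names of the commands that still need to be installed or added to PATH.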
Download and Installation
The script is available here (to download, just click raw and save the file as guess_and_add_isbn.py in any folder of your choice).
Usage
Open a terminal or console, check whether PATH is set properly, then run:
$ python guess_and_add_isbn.py
and wait for the script to finish.
Note: it may take some time, especially on bigger databases.
The script can be run while Calibre is running (it will notify the running Calibre about data changes). There is a minor annoyance in that case (every time some book is updated, Calibre refreshes the book list and forgets which books were selected), so I do not recommend searching or editing books while the script is running.
The script can safely be re-run (for example after new books are added).
Source code
Official repository: http://bitbucket.org/Mekk/calibre_utils