The project followed these steps:
- take a copy of the book and guillotine off the spine
- scan: feed the loose pages into an office multifunction scanner via the ADF, set for double-sided scanning, 300dpi, TIF, b/w optimized for text, A5; the machine processed the pages in under 1 hour, in several batches
- pages with photos were individually scanned as color JPG; this was slightly more fiddly than the above
- edit the photo pages in Photoshop: crop, optimize contrast, clean up, save as greyscale or color
- OCR: I used Tesseract from Google; once I got the process down pat I made a custom shell script containing one tesseract command per page/TIF, repeated for every file to be processed; this took maybe half an hour on my slow PC
- assemble the text files and photos into a document; this process took about one morning, not counting fiddling
- edit: make the new text presentable, insert an automated table of contents based on headings, correct OCR errors, change indents and quotes for consistency; this took about 2 weeks part-time
- save as PDF; I used OpenOffice Writer for the editing step above, and its PDF export lets you dial the photo compression up or down, which turned a native 20 MB file into 3.5 MB
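
For anyone wanting to reproduce the OCR step, here is a minimal sketch of the "one tesseract command per TIF" idea, written in Python rather than as a shell script. It is not the script used for this project. The folder name `scans`, the `.tif` extension, the zero-padded file names (so that sorting gives page order) and the `eng` language setting are all assumptions for illustration; it also assumes the `tesseract` program is installed and on the PATH.

```python
import subprocess
from pathlib import Path


def build_ocr_commands(scan_dir, lang="eng"):
    """Return one tesseract command per .tif file in scan_dir, sorted by file name.

    Assumes file names are zero-padded (page_001.tif, page_002.tif, ...)
    so that alphabetical order matches page order.
    """
    commands = []
    for tif in sorted(Path(scan_dir).glob("*.tif")):
        # tesseract adds ".txt" to the output base name by itself,
        # so page_001.tif becomes page_001.txt in the same folder.
        output_base = tif.with_suffix("")
        commands.append(["tesseract", str(tif), str(output_base), "-l", lang])
    return commands


def run_ocr(scan_dir, lang="eng"):
    """Run the commands one after another; stop if tesseract reports an error."""
    for command in build_ocr_commands(scan_dir, lang):
        subprocess.run(command, check=True)


# Example call (hypothetical folder name):
# run_ocr("scans")
```

Building the command list separately from running it makes it easy to print the commands first and check them before letting the whole batch run.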
Incidentally, this project began after I started reading my autographed copy of the book.
Green Fire is a first-hand account of activism in the Australian green movement, which now spans decades. The chapter about The Politics of Poo was especially humorous. I am a total noob to all of it, for reasons outside the scope of this blog entry, but the book is probably a must-read for any Australian interested in protest actions.
Download the book via iancohennsw.blogspot.com