• Tesseract OCR
    Programming

    Adding OCR to a multipage pdf

    PDF of images. Nothing is more frustrating than a PDF book or course consisting of scanned images: you cannot search the text or copy parts of it. Luckily, there is something called OCR (Optical Character Recognition), a technology that makes it possible to extract text from images. One of the most popular applications for this on Linux is tesseract. But a look at its man page immediately reveals one big problem: it does not work on PDF files! Get rid of the PDF format. So the first thing we have to do is drop the PDF format and switch to the TIFF format. This is very simply done…

  • Book-binding script
    Linux, Programming

    Bash for Book-binding

    Book-binding using bash. Three years ago I decided I wanted to make a big change in my life. I wanted to do what I always seemed to do best: work with computers. More specifically, I wanted to learn more about Linux and programming. There was only one problem: I hate reading long texts on a screen, let alone entire books… So the first thing I did was write some scripts to transform PDF files so that they could be printed and later bound as a real book, the book-binding way. In this post I will go over my script for people who are interested in doing something similar…