ocrmypdf

add an OCR text layer to PDF files
  https://github.com/jbarlow83/OCRmyPDF
  0
  4 reviews



OCRmyPDF takes a regular PDF that contains only images and generates a searchable PDF/A file from it.

It uses the Tesseract OCR engine and so supports all the languages that Tesseract does.

Some other main features:

* Places OCR text accurately below the image to ease copy / paste
* Keeps the exact resolution of the original embedded images
* When possible, inserts OCR information as a lossless operation without rendering vector information
* Keeps file size about the same
* If requested, deskews and/or cleans the image before performing OCR
* Validates input and output files
* Provides a debug mode to enable easy verification of the OCR results
* Processes pages in parallel when more than one CPU core is available
* Battle-tested on thousands of PDFs, with a test suite and continuous integration
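
OCRmyPDF is a command-line tool, so the features above are selected with options when it is run from a terminal. The following is a minimal Python sketch of how such a command line might be assembled. The helper names `build_ocrmypdf_command` and `run_ocr` and the file names are hypothetical and not part of OCRmyPDF. The flags `--language`, `--deskew`, `--clean` and `--jobs` are, to the best of my knowledge, standard `ocrmypdf` options; `--clean` is assumed to need the separate `unpaper` program, and each language is assumed to need its Tesseract language pack.

```python
import shutil
import subprocess
from typing import List, Optional, Sequence


def build_ocrmypdf_command(
    input_pdf: str,
    output_pdf: str,
    languages: Sequence[str] = ("eng",),
    deskew: bool = False,
    clean: bool = False,
    jobs: Optional[int] = None,
) -> List[str]:
    """Assemble an ocrmypdf argument list (hypothetical helper)."""
    # Tesseract language codes are joined with "+", e.g. "eng+deu".
    cmd = ["ocrmypdf", "--language", "+".join(languages)]
    if deskew:
        cmd.append("--deskew")  # straighten crooked pages before OCR
    if clean:
        cmd.append("--clean")  # clean pages before OCR (assumed to need unpaper)
    if jobs is not None:
        cmd += ["--jobs", str(jobs)]  # number of pages processed in parallel
    cmd += [input_pdf, output_pdf]
    return cmd


def run_ocr(cmd: List[str]) -> None:
    """Run the assembled command, failing early if ocrmypdf is not on PATH."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} is not installed or not on PATH")
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only prints the command; call run_ocr(command) to actually execute it.
    command = build_ocrmypdf_command(
        "scan.pdf", "scan_ocr.pdf", languages=["eng", "deu"], deskew=True, jobs=4
    )
    print(" ".join(command))
```

The same command can be typed directly into a terminal. The sketch only makes explicit which option corresponds to which feature.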
Latest reviews
1
zuzu 3 months ago

Not starting. Installed from software manager.

1
kezerd 7 months ago

Mint 21.2. Installed from software manager. Would not start. Not anywhere in menu.

5
SkidMark 1 year ago

Mint 19.3. Worked amazingly well. Had a PDF with fine technical details that was stored as an image with encryption. Printed to PDF to remove encryption, then used this to OCR to a new file to make the text searchable. Worked like a champ!

5
advolex 2 years ago

I needed this app for the German lawyers' electronic postal service (beA) and it worked PERFECTLY with Tesseract, which I installed before. Really a great relief!!