tesseract-ocr

Tesseract command line OCR tool
  https://github.com/tesseract-ocr/
  16 reviews



Tesseract is an open source Optical Character Recognition (OCR) engine. It can be used directly or, for programmers, through an API to extract printed text from images. It supports a wide variety of languages. This package includes the command line tool.
Latest reviews
1
n0body_special 2 years ago

tesseract --list-langs returns only English, with absolutely no direction on where to find other languages or how to install them. In general, this is only for programmers. If you are an ordinary person and you think you are going to give a simple command with a source file and a selected language and get results, this ain't gonna happen.
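For readers who hit the same wall, here is a minimal Python sketch of checking which languages are installed. It assumes the `tesseract` binary is on the PATH and that `--list-langs` prints a header line starting with "List of" followed by one language code per line; the function names are hypothetical. On Debian-style repositories, additional languages are usually separate packages named like `tesseract-ocr-spa`.

```python
import shutil
import subprocess
from typing import List


def parse_list_langs(output: str) -> List[str]:
    """Extract language codes from the text printed by `tesseract --list-langs`."""
    langs = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the header ("List of available languages ...").
        if not line or line.startswith("List of"):
            continue
        langs.append(line)
    return langs


def installed_languages() -> List[str]:
    """Return installed language codes, or an empty list if tesseract is missing."""
    if shutil.which("tesseract") is None:
        return []
    result = subprocess.run(
        ["tesseract", "--list-langs"], capture_output=True, text=True, check=False
    )
    # Assumption: the list is normally on stdout; fall back to stderr if stdout is empty.
    return parse_list_langs(result.stdout or result.stderr)


if __name__ == "__main__":
    print(installed_languages())
```

If the result contains only `eng`, installing the matching language package and running the check again should show the new code.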

5
advolex 3 years ago

For the German lawyer postal service (beA) I needed that function, but it didn't seem to work with PDF as input. Then I installed OCRmyPDF, and with both programs together, you're good to go! Any PDF will be searchable! Great!
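The combination described in this review could look roughly like the following Python sketch. It assumes the `ocrmypdf` command is installed alongside Tesseract and that the German language data (`deu`) is available; the file names and function names are hypothetical.

```python
import shutil
import subprocess
from typing import List


def build_ocrmypdf_command(input_pdf: str, output_pdf: str, lang: str = "deu") -> List[str]:
    """Build the argument list for: ocrmypdf -l <lang> <input_pdf> <output_pdf>."""
    return ["ocrmypdf", "-l", lang, input_pdf, output_pdf]


def make_searchable(input_pdf: str, output_pdf: str, lang: str = "deu") -> None:
    """Add a text layer to a scanned PDF so that it becomes searchable."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf is not installed or not on PATH")
    subprocess.run(build_ocrmypdf_command(input_pdf, output_pdf, lang), check=True)


if __name__ == "__main__":
    # Hypothetical file names, shown only to illustrate the command that would run.
    print(" ".join(build_ocrmypdf_command("scan.pdf", "scan-searchable.pdf")))
```

Keeping the command construction separate from the call that runs it makes it possible to inspect the arguments without either program being installed.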

5
Starkiller_007 4 years ago

Amazing, probably the most accurate OCR I have tested.

4
2bfrank 4 years ago

Very helpful if you need to OCR a document quickly without installing a lot of software! A few minor corrections to the resulting text need to be made.

5
Knezev87 6 years ago

Excellent

4
ArnaudDorthe 8 years ago

Could someone kindly update it to version 4, please?

5
publicFriend 10 years ago

Great!

5
reddot 10 years ago

Wow!!! Converted a JPG scan to text, saved me a lot of time!!!

5
observativetiger 10 years ago

Works great with gscan2pdf.

5
Diesel_F 11 years ago

Just what I needed. (topper, thanks for your very informative review)

4
topper 11 years ago

It's a good program. If you install it, remember that you also have to add the corresponding language file (tesseract-ocr-spa --> for Spanish; or whichever ones you need). Keep in mind that you work from the console with .tif files [e.g.: tesseract archivo.tif archivo-resultante -l spa]. If you want something more visual, you also have to install a program called YAGF (you'll find it in the repositories). That way you can work in a graphical environment with area selection for performing OCR, etc. To round it all off, you can add another program called CUNEIFORM.
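The console command quoted in this review can be wrapped in a small Python sketch. It assumes `tesseract` is on the PATH and that the Spanish language data (`spa`) is installed; the file names come from the review's example, while the function names are hypothetical.

```python
import shutil
import subprocess
from typing import List


def build_tesseract_command(image: str, output_base: str, lang: str = "spa") -> List[str]:
    """Build the argument list for: tesseract <image> <output_base> -l <lang>."""
    return ["tesseract", image, output_base, "-l", lang]


def run_ocr(image: str, output_base: str, lang: str = "spa") -> str:
    """Run Tesseract and return the path of the text file it writes."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not installed or not on PATH")
    subprocess.run(build_tesseract_command(image, output_base, lang), check=True)
    # Tesseract appends ".txt" to the output base for plain-text output.
    return output_base + ".txt"


if __name__ == "__main__":
    # Prints the command from the review without running it.
    print(" ".join(build_tesseract_command("archivo.tif", "archivo-resultante")))
```

Calling `run_ocr("archivo.tif", "archivo-resultante")` would then be expected to produce `archivo-resultante.txt` next to the working directory's other files.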

2
juandiego 11 years ago

It doesn't read non-linear text well, and it should be multilingual.

5
pbojan 11 years ago

Great OCR engine

5
ulysses 13 years ago

Excellent!

5
RevDieter 14 years ago

The Tesseract engine provides recognition of German Fraktur script, and to my (limited) knowledge, no other free OCR engine provides that. I think that's awesome!

4
Alexio 15 years ago

One of the most accurate free software OCR engines that handles image files in TIFF format (with filename extension .tif); other file formats need to be converted to TIFF before being submitted to Tesseract.