Linux Mint - Community

Apache Tika - content analysis toolkit

http://tika.apache.org

The Apache Tika toolkit detects and extracts metadata and text content from various documents (PPT, CSV, PDF, MP3, HTML and more) using existing parser libraries. Tika unifies these parsers under a single interface, so you can parse over a thousand different file types through one API. Tika is useful for search engine indexing, content analysis, translation, and much more.
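To illustrate what "unifying parsers under a single interface" means, here is a minimal sketch in plain Java. It is not the Tika API: the class `UnifiedParserSketch`, its methods, and the two toy parsers are hypothetical names invented for this example, and detection is reduced to looking at the file extension. It only shows the general shape of the idea, which is to detect the type and then hand the content to whichever parser is registered for it.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration only: this is NOT the Tika API.
public class UnifiedParserSketch {
    // Maps a detected type (here: a file extension) to a parser
    // that turns raw bytes into plain text.
    private final Map<String, Function<byte[], String>> parsers = new LinkedHashMap<>();

    public UnifiedParserSketch() {
        // Toy CSV parser: replace commas with spaces.
        parsers.put("csv", bytes ->
                new String(bytes, StandardCharsets.UTF_8).replace(',', ' '));
        // Toy HTML parser: strip anything that looks like a tag.
        parsers.put("html", bytes ->
                new String(bytes, StandardCharsets.UTF_8).replaceAll("<[^>]*>", ""));
    }

    // Detection step (assumption: the extension alone identifies the type).
    public static String detect(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    // Single entry point: callers never choose a parser themselves.
    public String parseToString(String fileName, byte[] content) {
        Function<byte[], String> parser = parsers.get(detect(fileName));
        if (parser == null) {
            throw new IllegalArgumentException("No parser registered for " + fileName);
        }
        return parser.apply(content).trim();
    }

    public static void main(String[] args) {
        UnifiedParserSketch sketch = new UnifiedParserSketch();
        byte[] html = "<p>Hello</p>".getBytes(StandardCharsets.UTF_8);
        byte[] csv = "a,b,c".getBytes(StandardCharsets.UTF_8);
        System.out.println(sketch.parseToString("page.html", html)); // prints "Hello"
        System.out.println(sketch.parseToString("data.csv", csv));   // prints "a b c"
    }
}
```

The caller passes a name and bytes to one method and gets text back regardless of format. A real toolkit would inspect the content itself, not just the file name, and would delegate to full parser libraries.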

libtika-java
