
The Apache Tika toolkit detects and extracts metadata and text content from many document formats (PPT, CSV, PDF, MP3, HTML, and more) by building on existing parser libraries. Tika unifies these parsers behind a single interface, so you can parse over a thousand different file types through the same API. Tika is useful for search engine indexing, content analysis, translation, and much more.
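Tika itself is a Java library. Its format parsers implement the `org.apache.tika.parser.Parser` interface, and `AutoDetectParser` (or the `org.apache.tika.Tika` facade) dispatches among them. The Python below is **not** Tika's API. It is a minimal, stdlib-only sketch of the design idea described above: a `detect` step picks a media type, and a registry maps each type to a format-specific parser that shares one signature, so callers only ever call `parse`. Every name here (`detect`, `parse`, `PARSERS`, the metadata keys) is hypothetical. Detection covers only a few magic numbers and file extensions, far fewer than Tika's detector.

```python
import csv
import io
from html.parser import HTMLParser
from typing import Callable, Dict, Tuple

# Hypothetical result shape: (metadata dict, extracted plain text).
Result = Tuple[Dict[str, str], str]


def detect(data: bytes, filename: str = "") -> str:
    """Guess a media type from magic bytes first, then from the file extension."""
    if data.startswith(b"%PDF-"):
        return "application/pdf"
    if data.startswith(b"ID3"):  # ID3v2 tag at the start of many MP3 files
        return "audio/mpeg"
    lowered = filename.lower()
    if lowered.endswith((".html", ".htm")) or data.lstrip().lower().startswith(b"<html"):
        return "text/html"
    if lowered.endswith(".csv"):
        return "text/csv"
    return "text/plain"


class _TextCollector(HTMLParser):
    """Collects the <title> and the visible text chunks of an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif data.strip():
            self.parts.append(data.strip())


def parse_html(data: bytes) -> Result:
    collector = _TextCollector()
    collector.feed(data.decode("utf-8", errors="replace"))
    collector.close()
    return {"title": collector.title.strip()}, " ".join(collector.parts)


def parse_csv(data: bytes) -> Result:
    rows = list(csv.reader(io.StringIO(data.decode("utf-8", errors="replace"))))
    text = "\n".join(" ".join(row) for row in rows)
    return {"rows": str(len(rows))}, text


def parse_text(data: bytes) -> Result:
    return {}, data.decode("utf-8", errors="replace")


# The "single interface": every parser takes raw bytes and returns a Result.
PARSERS: Dict[str, Callable[[bytes], Result]] = {
    "text/html": parse_html,
    "text/csv": parse_csv,
    "text/plain": parse_text,
}


def parse(data: bytes, filename: str = "") -> Result:
    """Detect the type, dispatch to the matching parser, and record the type."""
    mime = detect(data, filename)
    parser = PARSERS.get(mime)
    if parser is None:
        # Type detected but no parser registered: metadata only, no text.
        return {"Content-Type": mime}, ""
    metadata, text = parser(data)
    metadata["Content-Type"] = mime
    return metadata, text
```

For example, `parse(b"<html><head><title>Hi</title></head><body><p>Hello world</p></body></html>", "page.html")` returns `({"title": "Hi", "Content-Type": "text/html"}, "Hello world")`, while a PDF is detected but, lacking a registered parser in this sketch, yields only its media type. Adding a format means registering one more function in `PARSERS`; callers of `parse` do not change, which is the benefit of unifying parsers behind one interface.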