libucto1-dev

Unicode tokenizer - development
  http://ilk.uvt.nl/

Ucto can tokenize UTF-8 encoded text files (i.e. separate words from punctuation, split sentences, generate n-grams), and offers several other basic preprocessing steps (change case, count words/characters, and reverse lines) that make your text suitable for further processing such as indexing, part-of-speech tagging, or machine translation.
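
To illustrate what "separating words from punctuation" means, here is a minimal C++ sketch. It does not use Ucto or its API: the function name simple_tokenize is hypothetical, and the rule it applies (split on whitespace, emit each ASCII punctuation character as its own token) is an assumption made for illustration and is far simpler than Ucto's language-specific rules.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper, NOT part of Ucto: splits text into word tokens and
// single-character ASCII punctuation tokens. Bytes >= 0x80 (parts of UTF-8
// multibyte sequences) are always kept inside words, so non-ASCII letters
// are never broken apart.
std::vector<std::string> simple_tokenize(const std::string& text) {
    std::vector<std::string> tokens;
    std::string current;
    for (unsigned char c : text) {
        if (std::isspace(c)) {
            // Whitespace ends the current word, if any.
            if (!current.empty()) { tokens.push_back(current); current.clear(); }
        } else if (c < 0x80 && std::ispunct(c)) {
            // ASCII punctuation ends the current word and becomes its own token.
            if (!current.empty()) { tokens.push_back(current); current.clear(); }
            tokens.push_back(std::string(1, static_cast<char>(c)));
        } else {
            current.push_back(static_cast<char>(c));
        }
    }
    if (!current.empty()) tokens.push_back(current);
    return tokens;
}
```

Under these assumptions, simple_tokenize("Hello, world!") yields the four tokens "Hello", ",", "world" and "!". A real tokenizer also has to handle abbreviations, numbers, URLs and sentence boundaries, which this sketch ignores.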

Ucto is a product of the ILK Research Group, Tilburg University (The Netherlands).

This package provides the Ucto header files required to compile C++ programs that use Ucto.