libucto1

Unicode tokenizer - runtime
  http://ilk.uvt.nl/



Ucto can tokenize UTF-8 encoded text files (i.e. separate words from punctuation, split sentences, and generate n-grams). It also offers several other basic preprocessing steps (changing case, counting words and characters, and reversing lines) that make text suitable for further processing such as indexing, part-of-speech tagging, or machine translation.
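
To make the three tokenization steps concrete, here is a minimal Python sketch of the general idea. It is not Ucto's implementation and does not use Ucto's API: the function names (tokenize, split_sentences, ngrams) and the simple rules (a regular expression for words and punctuation, sentence breaks only on ".", "!" and "?") are assumptions chosen for illustration. A real tokenizer such as Ucto relies on far more detailed, language-specific rules.

```python
import re


def tokenize(text):
    """Split text into word tokens and single punctuation tokens.

    Illustrative rule only: \\w+ matches a run of word characters
    (Unicode-aware for Python 3 str patterns), and [^\\w\\s] matches one
    character that is neither a word character nor whitespace.
    """
    return re.findall(r"\w+|[^\w\s]", text)


def split_sentences(tokens):
    """Group tokens into sentences, ending a sentence at '.', '!' or '?'.

    This naive rule mishandles abbreviations and decimal numbers; it is
    only meant to show the shape of the step.
    """
    sentences, current = [], []
    for tok in tokens:
        current.append(tok)
        if tok in {".", "!", "?"}:
            sentences.append(current)
            current = []
    if current:  # keep trailing tokens that lack a final terminator
        sentences.append(current)
    return sentences


def ngrams(tokens, n):
    """Return all contiguous n-token tuples from a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
```

For example, tokenize("Hello, world! Bye.") separates the comma, exclamation mark and full stop from the words, split_sentences then yields two sentences, and ngrams(tokens, 2) lists adjacent token pairs.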

Ucto is a product of the ILK Research Group, Tilburg University (The Netherlands).

This package provides the runtime files required to run programs that use Ucto.