tokenizers (0.1.3)

A Consistent Interface to Tokenize Natural Language Text.

Convert natural language text into tokens. The tokenizers share a consistent interface and handle Unicode text because they are built on the 'stringi' package. The package includes tokenizers for shingled n-grams, skip n-grams, words, word stems, sentences, paragraphs, characters, lines, and regular expressions.
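As a rough illustration of the consistent interface described above, the sketch below calls three of the tokenizers on the same input. It assumes the 'tokenizers' package is installed and that its exported functions follow the `tokenize_*` naming used in the package manual; the sample sentence is made up for this example, and default arguments may differ between releases.

```r
# Minimal sketch; assumes install.packages("tokenizers") has been run.
library(tokenizers)

# Hypothetical sample text, chosen only for illustration.
text <- "The quick brown fox jumps. It lands softly."

# Each tokenizer takes a character vector and returns a list
# holding one character vector of tokens per input document.
words     <- tokenize_words(text)         # lowercased word tokens
sentences <- tokenize_sentences(text)     # sentence tokens
bigrams   <- tokenize_ngrams(text, n = 2) # shingled (overlapping) 2-grams

# Inspect the tokens for the first (and only) document.
print(words[[1]])
print(sentences[[1]])
print(bigrams[[1]])
```

Because every function accepts the same kind of input and returns the same kind of list, switching the unit of analysis should only require changing the function name.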

Maintainer: Lincoln Mullen
Author(s): Lincoln Mullen [aut, cre], Dmitriy Selivanov [ctb]

License: MIT + file LICENSE

Uses: Rcpp, SnowballC, stringi, testthat, knitr, rmarkdown, covr
Reverse suggests: cleanNLP, edgarWebR, text2vec

Released about 3 years ago.