tokenizers (0.1.4)


A Consistent Interface to Tokenize Natural Language Text.

Convert natural language text into tokens. The tokenizers have a consistent interface and are compatible with Unicode, thanks to being built on the 'stringi' package. Includes tokenizers for shingled n-grams, skip n-grams, words, word stems, sentences, paragraphs, characters, lines, and regular expressions.
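To illustrate the "consistent interface" the description refers to, here is a minimal R sketch. The function names (`tokenize_words()`, `tokenize_sentences()`, `tokenize_ngrams()`, `tokenize_characters()`) are taken from the tokenizers package's documented API; each takes a character vector of documents and returns a list of character vectors, one element per document.

```r
# A minimal sketch of the tokenizers interface. Function names follow the
# package documentation; the exact output is illustrative, not verified here.
library(tokenizers)

text <- "The quick brown fox jumps over the lazy dog. It barked."

# Every tokenizer accepts a character vector and returns a list of
# character vectors, one list element per input document.
tokenize_words(text)            # word tokens, lowercased by default
tokenize_sentences(text)        # sentence tokens
tokenize_ngrams(text, n = 2)    # shingled bigrams
tokenize_characters(text)       # individual characters
```

Because every tokenizer shares this signature, the functions can be swapped in and out of a text-processing pipeline without changing surrounding code.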

Maintainer: Lincoln Mullen
Author(s): Lincoln Mullen [aut, cre], Dmitriy Selivanov [ctb]

License: MIT + file LICENSE

Uses: Rcpp, SnowballC, stringi, testthat, knitr, rmarkdown, covr
Reverse suggests: cleanNLP

Released over 1 year ago.

4 previous versions




Related packages: corpora, gsubfn, kernlab, languageR, lsa, tm, wordnet, zipfR, RWeka, RKEA, openNLP, skmeans, tau, tm.plugin.mail, lda, textcat, topicmodels, tm.plugin.dc, textir, movMF (20 best matches, based on common tags).

