boilerpipeR (1.3)


Interface to the Boilerpipe Java Library.

https://github.com/mannau/boilerpipeR
http://cran.r-project.org/web/packages/boilerpipeR

Generic extraction of the main text content from HTML files; removes ads, sidebars, and headers using the boilerpipe (http://code.google.com/p/boilerpipe/) Java library. The boilerpipe extraction heuristics perform robustly across a wide range of website templates.

Maintainer: Mario Annau
Author(s): See AUTHORS file.

License: Apache License (== 2.0)

Uses: rJava, RCurl
Reverse depends: tm.plugin.webmining

Released about 2 years ago.


4 previous versions


Related packages: tm.plugin.webmining, mscstexta4r, mscsweblm4r, datarobot, europepmc, tau, RTextTools, RcmdrPlugin.temis, maxent, qdap, KoNLP, textir, stringi, SnowballC, movMF, lda, languageR, koRpus, skmeans, corpora (20 best matches, based on common tags).

