Package: wordpiece 2.1.3

Jonathan Bratt

wordpiece: R Implementation of Wordpiece Tokenization

Apply 'Wordpiece' (<arxiv:1609.08144>) tokenization to input text, given an appropriate vocabulary. The 'BERT' (<arxiv:1810.04805>) tokenization conventions are used by default.
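
To make the idea concrete, here is a minimal base-R sketch of greedy longest-match-first subword splitting, the general strategy behind Wordpiece tokenization. It is an illustration only and not the package's implementation: the helper `tokenize_word()` and the vocabulary `toy_vocab` are hypothetical names invented for this example. The sketch assumes the BERT conventions of marking word-continuation pieces with a `##` prefix and mapping unmatchable words to `[UNK]`.

```r
# Toy vocabulary (hypothetical; not the vocabulary shipped with the package).
toy_vocab <- c("[UNK]", "un", "##aff", "##able", "token", "##ize", "##s")

# Split one word into pieces by repeatedly taking the longest vocabulary
# entry that matches at the current position (greedy longest-match-first).
tokenize_word <- function(word, vocab, unk_token = "[UNK]") {
  n <- nchar(word)
  if (n == 0) return(character(0))
  pieces <- character(0)
  start <- 1
  while (start <= n) {
    end <- n
    found <- NA_character_
    # Try the longest remaining substring first, then shrink from the right.
    while (end >= start) {
      candidate <- substr(word, start, end)
      # Pieces that do not start the word carry the "##" prefix.
      if (start > 1) candidate <- paste0("##", candidate)
      if (candidate %in% vocab) {
        found <- candidate
        break
      }
      end <- end - 1
    }
    # If no piece matches at this position, the whole word becomes unknown.
    if (is.na(found)) return(unk_token)
    pieces <- c(pieces, found)
    start <- end + 1
  }
  pieces
}

tokenize_word("unaffable", toy_vocab)  # "un" "##aff" "##able"
tokenize_word("tokenizes", toy_vocab)  # "token" "##ize" "##s"
tokenize_word("xyz", toy_vocab)        # "[UNK]"
```

A real tokenizer also handles details this sketch omits, such as splitting text into words, separating punctuation, normalizing case, and mapping pieces to integer ids from the vocabulary.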

Authors: Jonathan Bratt [aut, cre], Jon Harmon [aut], Bedford Freeman & Worth Pub Grp LLC DBA Macmillan Learning [cph]

wordpiece_2.1.3.tar.gz
wordpiece_2.1.3.zip (r-4.5), wordpiece_2.1.3.zip (r-4.4), wordpiece_2.1.3.zip (r-4.3)
wordpiece_2.1.3.tgz (r-4.4-any), wordpiece_2.1.3.tgz (r-4.3-any)
wordpiece_2.1.3.tar.gz (r-4.5-noble), wordpiece_2.1.3.tar.gz (r-4.4-noble)
wordpiece_2.1.3.tgz (r-4.4-emscripten), wordpiece_2.1.3.tgz (r-4.3-emscripten)
wordpiece.pdf | wordpiece.html
wordpiece/json (API)
NEWS

# Install 'wordpiece' in R:
install.packages('wordpiece', repos = c('https://macmillancontentscience.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/macmillancontentscience/wordpiece/issues

7 exports, 8 stars, 1.40 score, 18 dependencies, 7 scripts, 218 downloads

Last updated 3 years ago from: 3eb92c7595. Checks: OK: 3, NOTE: 4. Indexed: yes.

Target            Result   Date
Doc / Vignettes   OK       Sep 03 2024
R-4.5-win         NOTE     Sep 03 2024
R-4.5-linux       NOTE     Sep 03 2024
R-4.4-win         NOTE     Sep 03 2024
R-4.4-mac         NOTE     Sep 03 2024
R-4.3-win         OK       Sep 03 2024
R-4.3-mac         OK       Sep 03 2024

Exports: load_or_retrieve_vocab, load_vocab, prepare_vocab, set_wordpiece_cache_dir, wordpiece_cache_dir, wordpiece_tokenize, wordpiece_vocab

Dependencies: cachem, cli, digest, dlr, fastmap, fastmatch, fs, glue, lifecycle, magrittr, memoise, piecemaker, rappdirs, rlang, stringi, stringr, vctrs, wordpiece.data

Using wordpiece

Rendered from basic_usage.Rmd using knitr::rmarkdown on Sep 03 2024.

Last update: 2021-09-27
Started: 2021-01-12