Package: wordpiece.data 2.0.0

Jon Harmon

wordpiece.data: Data for Wordpiece-Style Tokenization

Provides data to be used by the wordpiece algorithm in order to tokenize text into somewhat meaningful chunks. Included vocabularies were retrieved from <https://huggingface.co/bert-base-cased/resolve/main/vocab.txt> and <https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt> and parsed into an R-friendly format.

Authors: Jonathan Bratt [aut], Jon Harmon [aut, cre], Bedford Freeman & Worth Pub Grp LLC DBA Macmillan Learning [cph], Google, Inc [cph]

wordpiece.data_2.0.0.tar.gz
wordpiece.data_2.0.0.zip (r-4.5), wordpiece.data_2.0.0.zip (r-4.4), wordpiece.data_2.0.0.zip (r-4.3)
wordpiece.data_2.0.0.tgz (r-4.4-any), wordpiece.data_2.0.0.tgz (r-4.3-any)
wordpiece.data_2.0.0.tar.gz (r-4.5-noble), wordpiece.data_2.0.0.tar.gz (r-4.4-noble)
wordpiece.data_2.0.0.tgz (r-4.4-emscripten), wordpiece.data_2.0.0.tgz (r-4.3-emscripten)
wordpiece.data.pdf | wordpiece.data.html
wordpiece.data/json (API)
NEWS

# Install 'wordpiece.data' in R:
install.packages('wordpiece.data', repos = c('https://macmillancontentscience.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/macmillancontentscience/wordpiece.data/issues

1 export, 0.93 score, 0 dependencies, 1 dependent, 5 scripts, 224 downloads

Last updated 3 years ago from: f893df5061. Checks: OK: 7. Indexed: yes.

Target           Result  Date
Doc / Vignettes  OK      Sep 03 2024
R-4.5-win        OK      Sep 03 2024
R-4.5-linux      OK      Sep 03 2024
R-4.4-win        OK      Sep 03 2024
R-4.4-mac        OK      Sep 03 2024
R-4.3-win        OK      Sep 03 2024
R-4.3-mac        OK      Sep 03 2024

Exports: wordpiece_vocab
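
As a rough illustration of how the single export might be used, the sketch below calls wordpiece_vocab() with no arguments and inspects whatever it returns. This page does not document the function's arguments or return structure, so the code assumes nothing about either; the variable names vocab_available and vocab are hypothetical.

```r
# Minimal sketch, assuming the package has been installed as shown earlier.
# The guard keeps the snippet runnable even when the package is missing.
vocab_available <- requireNamespace("wordpiece.data", quietly = TRUE)

if (vocab_available) {
  # wordpiece_vocab is the package's only export; called here with defaults.
  vocab <- wordpiece.data::wordpiece_vocab()
  # Inspect the structure rather than assuming a particular format.
  str(vocab)
} else {
  message("wordpiece.data is not installed; nothing to load.")
}
```

Calling the function through the wordpiece.data:: prefix avoids attaching the package, which is usually enough for a data-only package.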

Dependencies: none