Package: piecemaker 1.0.2.9000

Jon Harmon

piecemaker: Tools for Preparing Text for Tokenizers

Tokenizers break text into pieces that are more usable by machine learning models. Many tokenizers share some preparation steps. This package provides those shared steps, along with a simple tokenizer.

Authors: Jon Harmon [aut, cre], Jonathan Bratt [aut], Bedford Freeman & Worth Pub Grp LLC DBA Macmillan Learning [cph]

piecemaker_1.0.2.9000.tar.gz
piecemaker_1.0.2.9000.zip (r-4.5), piecemaker_1.0.2.9000.zip (r-4.4), piecemaker_1.0.2.9000.zip (r-4.3)
piecemaker_1.0.2.9000.tgz (r-4.4-any), piecemaker_1.0.2.9000.tgz (r-4.3-any)
piecemaker_1.0.2.9000.tar.gz (r-4.5-noble), piecemaker_1.0.2.9000.tar.gz (r-4.4-noble)
piecemaker_1.0.2.9000.tgz (r-4.4-emscripten), piecemaker_1.0.2.9000.tgz (r-4.3-emscripten)
piecemaker.pdf | piecemaker.html
piecemaker/json (API)
NEWS

# Install 'piecemaker' in R:
install.packages('piecemaker', repos = c('https://macmillancontentscience.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/macmillancontentscience/piecemaker/issues

10 exports, 1.26 score, 8 dependencies, 2 dependents, 6 scripts, 287 downloads

Last updated 1 year ago from: b02c1a7492. Checks: OK: 7. Indexed: yes.

Target           Result  Date
Doc / Vignettes  OK      Aug 26 2024
R-4.5-win        OK      Aug 26 2024
R-4.5-linux      OK      Aug 26 2024
R-4.4-win        OK      Aug 26 2024
R-4.4-mac        OK      Aug 26 2024
R-4.3-win        OK      Aug 26 2024
R-4.3-mac        OK      Aug 26 2024

Exports: prepare_and_tokenize, prepare_text, remove_control_characters, remove_diacritics, remove_replacement_characters, space_cjk, space_punctuation, squish_whitespace, tokenize_space, validate_utf8

Dependencies: cli, glue, lifecycle, magrittr, rlang, stringi, stringr, vctrs