| Title: | Data for Morpheme Tokenization |
|---|---|
| Description: | Provides data about morphemes, the smallest units of meaning in a language. |
| Authors: | Jonathan Bratt [aut] (ORCID: <https://orcid.org/0000-0003-2859-0076>), Jon Harmon [aut, cre] (ORCID: <https://orcid.org/0000-0003-4781-4346>), Bedford Freeman & Worth Pub Grp LLC DBA Macmillan Learning [cph] |
| Maintainer: | Jon Harmon <[email protected]> |
| License: | Apache License (>= 2) |
| Version: | 1.2.0 |
| Built: | 2026-05-21 08:04:14 UTC |
| Source: | https://github.com/macmillancontentscience/morphemepiece.data |
A morphemepiece lookup is a named character vector. The names of the vector are the words, and the values are the space-separated morpheme breakdowns of those words.
morphemepiece_lookup()
A named character vector.
head(morphemepiece_lookup())
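The lookup structure described above can be illustrated without the package data. The following is a minimal sketch, assuming a toy named character vector: `toy_lookup`, its entries, and the helper `split_morphemes()` are hypothetical and are not part of this package, and the real breakdowns may use different morpheme notation.

```r
# Toy stand-in for a morphemepiece lookup. These entries are invented for
# illustration and are NOT taken from the real data.
toy_lookup <- c(
  unhappiness = "un happy ness",
  cats = "cat s"
)

# Hypothetical helper: return the morphemes of `word` as a character vector,
# or NA if the word is not in the lookup.
split_morphemes <- function(word, lookup) {
  breakdown <- unname(lookup[word])
  if (is.na(breakdown)) {
    return(NA_character_)
  }
  # Values are space-separated, so split on a literal space.
  strsplit(breakdown, " ", fixed = TRUE)[[1]]
}

split_morphemes("cats", toy_lookup)
split_morphemes("dog", toy_lookup)
```

The same helper should work on the vector returned by `morphemepiece_lookup()`, since it relies only on the named-character-vector shape documented here.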
A morphemepiece vocabulary is a named integer vector with class "morphemepiece_vocabulary". The names of the vector are the morphemes (tokens), and the values are the integer identifiers of those tokens. The vocabulary is 0-indexed for compatibility with Python implementations, so identifiers start at 0 rather than 1.
morphemepiece_vocab()
A morphemepiece_vocabulary.
head(morphemepiece_vocab())
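Because the identifiers are 0-indexed while R vectors are 1-indexed, an identifier is not the same as a position in the vector. The sketch below shows one way to map between tokens and identifiers, assuming a toy vocabulary: `toy_vocab` and its entries are hypothetical and are not taken from the real data.

```r
# Toy stand-in for a morphemepiece vocabulary. The tokens and ids are invented
# for illustration; only the shape (named integer vector with this class)
# follows the description above.
toy_vocab <- structure(
  c(un = 0L, happy = 1L, ness = 2L),
  class = "morphemepiece_vocabulary"
)

# Token -> id: index by name. unclass() yields a plain named integer vector.
tokens <- c("happy", "ness")
ids <- unname(unclass(toy_vocab)[tokens])

# Id -> token: match on the stored values rather than using the id as a
# position, since ids start at 0 and R positions start at 1.
back <- names(toy_vocab)[match(ids, unclass(toy_vocab))]

ids
back
```

Matching on values avoids assuming that the vocabulary is sorted by identifier. If it is sorted and contiguous from 0, `names(toy_vocab)[ids + 1L]` gives the same result.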