GitHub / DolbyUUU / byte_pair_encoding_BPE_subword_tokenization_implementation_python
Byte-Pair Encoding (BPE) subword tokenization algorithm implementations, written from scratch in Python
JSON API: http://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DolbyUUU%2Fbyte_pair_encoding_BPE_subword_tokenization_implementation_python
PURL: pkg:github/DolbyUUU/byte_pair_encoding_BPE_subword_tokenization_implementation_python
Stars: 12
Forks: 0
Open issues: 0
License: None
Language: Python
Size: 449 KB
Dependencies parsed at: Pending
Created at: over 2 years ago
Updated at: 7 months ago
Pushed at: over 2 years ago
Last synced at: 6 months ago
Topics: bpe, byte-pair-encoding, data-cleaning, data-preprocessing, natural-language-processing, nlp, python, subword-tokenization, tokenizer