An open API service providing repository metadata for many open source software ecosystems.

GitHub / Systemcluster / kitoken

Fast and versatile tokenizer for language models, compatible with SentencePiece, Tokenizers, Tiktoken and more. Supports BPE, Unigram and WordPiece tokenization in JavaScript, Python and Rust.

JSON API: http://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Systemcluster%2Fkitoken
PURL: pkg:github/Systemcluster/kitoken
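
The JSON API address above percent-encodes the slash in the repository's full name (Systemcluster/kitoken becomes Systemcluster%2Fkitoken). As a minimal sketch, assuming the path layout seen in that one URL holds for other hosts and repositories, the address can be rebuilt as follows. The helper name `repository_api_url` and the `BASE` constant are hypothetical and not part of any official client.

```python
from urllib.parse import quote

# Base path taken from the JSON API URL listed above.
BASE = "http://repos.ecosyste.ms/api/v1"


def repository_api_url(host: str, full_name: str) -> str:
    """Build a repository metadata URL (hypothetical helper).

    safe="" makes quote() encode "/" as %2F, so the owner/name pair
    stays a single path segment.
    """
    return (
        f"{BASE}/hosts/{quote(host, safe='')}"
        f"/repositories/{quote(full_name, safe='')}"
    )


url = repository_api_url("GitHub", "Systemcluster/kitoken")
print(url)
```

Fetching and decoding the response is left out here because the field names of the returned JSON are not shown on this page.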

Stars: 27
Forks: 0
Open issues: 0

License: bsd-2-clause
Language: Rust
Size: 27.3 MB
Dependencies parsed at: Pending

Created at: about 2 years ago
Updated at: 14 days ago
Pushed at: 4 months ago
Last synced at: 12 days ago

Topics: bpe, nlp, nodejs, python, rust, sentencepiece, tokenizer, unigram, web, word-segmentation

Funding links: https://github.com/sponsors/Systemcluster