Ecosyste.ms: Repos
An open API service providing repository metadata for many open source software ecosystems.
GitHub topics: tokenizers
Anush008/tokenizers
Multi-arch bindings for @huggingface/tokenizers.
Language: Rust - Size: 893 KB - Last synced: 6 days ago - Pushed: 8 months ago - Stars: 4 - Forks: 1
jshuadvd/LongRoPE
Implementation of the paper "LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens"
Language: Python - Size: 34.5 MB - Last synced: 25 days ago - Pushed: 25 days ago - Stars: 50 - Forks: 8
s2458588/wsm-tokenizer
Bachelor's thesis repository. wsm-tokenizer (word shape mapping) uses vocabulary comparisons to find probable morphemes in lexemic tokens.
Language: Jupyter Notebook - Size: 2.41 MB - Last synced: 25 days ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0
kojix2/blingfire-crystal
Language: Crystal - Size: 49.8 KB - Last synced: about 1 month ago - Pushed: about 1 year ago - Stars: 2 - Forks: 0
helena-intel/test-prompt-generator
Create prompts with a given token length for testing LLMs and other transformers text models.
Language: Python - Size: 171 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 0 - Forks: 0
Prismadic/magnet
The small distributed language model toolkit: fine-tune state-of-the-art LLMs anywhere, rapidly.
Language: Python - Size: 8.82 MB - Last synced: 21 days ago - Pushed: about 2 months ago - Stars: 18 - Forks: 1
xebia-functional/xef
Building applications with LLMs through composability, in Kotlin, Scala, ...
Language: Kotlin - Size: 12.3 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 161 - Forks: 16
symanto-research/merge-tokenizers
Package to align tokens from different tokenizations.
Language: Python - Size: 347 KB - Last synced: 28 days ago - Pushed: about 2 months ago - Stars: 0 - Forks: 0
Hugging-Face-Supporter/tftokenizers
Use Hugging Face Transformers and Tokenizers as reusable TensorFlow SavedModels
Language: Python - Size: 263 KB - Last synced: 3 months ago - Pushed: about 2 years ago - Stars: 5 - Forks: 2
DanielPFlorian/Transformers-Github-Semantic-Search
NLP Dataset Creation and Semantic Search Demonstration
Language: Jupyter Notebook - Size: 18.6 KB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 0 - Forks: 0
OmkarBorhade98/Text_Summarization
Text Summarization using NLP
Language: Jupyter Notebook - Size: 116 KB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 0 - Forks: 0
wenbingl/tfmtok
A C/C++ tokenizer library for Transformers models
Language: C++ - Size: 8.61 MB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 1 - Forks: 0
megagonlabs/ginza-transformers
Use custom tokenizers in spacy-transformers
Language: Python - Size: 32.2 KB - Last synced: 4 months ago - Pushed: almost 2 years ago - Stars: 16 - Forks: 4
mickymultani/LLM-Architecture
Visualize some important concepts related to LLM architectures.
Language: Jupyter Notebook - Size: 9.74 MB - Last synced: 5 months ago - Pushed: 7 months ago - Stars: 1 - Forks: 0
victoryosiobe/kingchop
Kingchop ⚔️ is a JavaScript library for tokenizing ("chopping") English text. It uses an extensive set of tokenization rules that you can easily adjust.
Language: JavaScript - Size: 19.5 KB - Last synced: about 1 month ago - Pushed: 4 months ago - Stars: 1 - Forks: 0
Beomi/megatronlm_dataset_autotokenizer
Megatron-LM/GPT-NeoX compatible Text Encoder with 🤗Transformers AutoTokenizer.
Language: Python - Size: 498 KB - Last synced: 6 months ago - Pushed: 6 months ago - Stars: 2 - Forks: 0
unfoldingWord/string-punctuation-tokenizer
A small library that provides functions to tokenize a string into an array of words, with or without punctuation.
Language: JavaScript - Size: 2.14 MB - Last synced: 6 days ago - Pushed: 10 months ago - Stars: 8 - Forks: 1
arturom/search-analysis
A graphical user interface for the Elasticsearch Analyze API
Language: JavaScript - Size: 4.67 MB - Last synced: 8 months ago - Pushed: 8 months ago - Stars: 5 - Forks: 0
sayakpaul/count-tokens-hf-datasets
This project shows how to compute the total number of training tokens in a large text dataset from 🤗 Datasets using Apache Beam and Dataflow.
Language: Python - Size: 19.5 KB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 15 - Forks: 1
jungsoh/transformers-question-answering
Fine-tuning pre-trained transformer models in TensorFlow and PyTorch for question answering
Language: Jupyter Notebook - Size: 379 MB - Last synced: about 1 year ago - Pushed: over 2 years ago - Stars: 1 - Forks: 0
Matesxs/CodeTransformer
Language: Python - Size: 54.7 KB - Last synced: about 1 year ago - Pushed: almost 3 years ago - Stars: 0 - Forks: 0