An open API service providing repository metadata for many open source software ecosystems.

Topic: "wordpiece"

NLPOptimize/flash-tokenizer

Efficient and optimized tokenizer engine for LLM inference serving

Language: C++ - Size: 197 MB - Last synced at: 7 days ago - Pushed at: 2 months ago - Stars: 456 - Forks: 5

georg-jung/FastBertTokenizer

Fast and memory-efficient library for WordPiece tokenization as it is used by BERT.

Language: C# - Size: 19.2 MB - Last synced at: 4 days ago - Pushed at: 24 days ago - Stars: 49 - Forks: 11

stephantul/piecelearn

Learning BPE embeddings by first learning a segmentation model and then training word2vec

Language: Python - Size: 22.5 KB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 19 - Forks: 1

NLPOptimize/awesome-tokenizers

A curated list of tokenizer libraries for blazing-fast NLP processing.

Size: 18.6 KB - Last synced at: 25 days ago - Pushed at: 4 months ago - Stars: 5 - Forks: 0

danieldk/wordpieces

Split tokens into word pieces

Language: Rust - Size: 33.2 KB - Last synced at: 8 days ago - Pushed at: almost 3 years ago - Stars: 5 - Forks: 0

SeonbeomKim/Python-Byte_Pair_Encoding

Byte Pair Encoding (BPE)

Language: Python - Size: 51.2 MB - Last synced at: over 2 years ago - Pushed at: over 6 years ago - Stars: 5 - Forks: 4

Lizhecheng02/Kaggle-LLM-Detect_AI_Generated_Text

Detect whether text is AI-generated, either by training a new tokenizer and combining it with tree classification models or by training language models on a large dataset of human-written and AI-generated texts.

Language: Jupyter Notebook - Size: 394 KB - Last synced at: 7 months ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 0

SeanLee97/BertWordPieceTokenizer.jl

WordPiece Tokenizer for BERT models.

Language: Julia - Size: 23.4 KB - Last synced at: 24 days ago - Pushed at: over 3 years ago - Stars: 3 - Forks: 0

Daniel-Heo/NemoTokenizer

Fast WordPiece and SentencePiece tokenizer built on a trie, OpenMP, SIMD, and a memory pool

Language: C++ - Size: 231 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

Hank-Kuo/go-bert-tokenizer

go-bert-tokenizer

Language: Go - Size: 119 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

burcgokden/BERT-Subword-Tokenizer-Wrapper

A framework for generating subword vocabulary from a tensorflow dataset and building custom BERT tokenizer models.

Language: Python - Size: 12.7 KB - Last synced at: 5 months ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 0

vassef/Implementing-BPE-and-WordPiece-Tokenizers

Language: Jupyter Notebook - Size: 1.32 MB - Last synced at: over 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0