Topic: "tokenizer"
theseer/tokenizer
A small library for converting tokenized PHP source code into XML (and potentially other formats)
Language: PHP - Size: 83 KB - Last synced at: 6 months ago - Pushed at: almost 2 years ago - Stars: 5,198 - Forks: 22
Chevrotain/chevrotain
Parser Building Toolkit for JavaScript
Language: TypeScript - Size: 36.9 MB - Last synced at: 12 days ago - Pushed at: 14 days ago - Stars: 2,711 - Forks: 216
roshan-research/hazm
Persian NLP Toolkit
Language: Python - Size: 25.2 MB - Last synced at: 11 days ago - Pushed at: 13 days ago - Stars: 1,357 - Forks: 204
natasha/natasha
Solves basic Russian NLP tasks, API for lower level Natasha projects
Language: Python - Size: 35.7 MB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 1,289 - Forks: 110
dqbd/tiktokenizer
Online playground for OpenAI tokenizers
Language: TypeScript - Size: 713 KB - Last synced at: 7 months ago - Pushed at: 8 months ago - Stars: 1,165 - Forks: 134
lovit/soynlp
A Python library for Korean natural language processing. Provides word extraction, tokenization, part-of-speech tagging, and preprocessing.
Language: Python - Size: 34.1 MB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 983 - Forks: 183
ikawaha/kagome
Self-contained Japanese Morphological Analyzer written in pure Go
Language: Go - Size: 711 MB - Last synced at: 29 days ago - Pushed at: about 1 month ago - Stars: 917 - Forks: 55
no-context/moo
Optimised tokenizer/lexer generator! 🐄 Uses /y for performance. Moo.
Language: JavaScript - Size: 770 KB - Last synced at: 8 days ago - Pushed at: over 2 years ago - Stars: 872 - Forks: 72
BLKSerene/Wordless
An Integrated Corpus Tool With Multilingual Support for the Study of Language, Literature, and Translation
Language: Python - Size: 75.5 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 741 - Forks: 96
niieani/gpt-tokenizer
The fastest JavaScript BPE Tokenizer Encoder Decoder for OpenAI's GPT models (gpt-5, gpt-o*, gpt-4o, etc.). Port of OpenAI's tiktoken with additional features.
Language: TypeScript - Size: 12.9 MB - Last synced at: 21 days ago - Pushed at: about 2 months ago - Stars: 704 - Forks: 51
wangfenjin/simple
A SQLite3 fts5 full-text search extension and tokenizer that supports Chinese and Pinyin
Language: C++ - Size: 969 KB - Last synced at: 7 months ago - Pushed at: 8 months ago - Stars: 691 - Forks: 99
mathewsanders/Mustard
🌭 Mustard is a Swift library for tokenizing strings when splitting by whitespace doesn't cut it.
Language: Swift - Size: 137 KB - Last synced at: 5 months ago - Pushed at: over 7 years ago - Stars: 687 - Forks: 18
risesoft-y9/Data-Labeling
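Data-Labeling is a tool dedicated to processing and annotating text data. With a streamlined annotation workflow and dynamic algorithmic feedback, it lets users annotate keywords quickly while the algorithm steadily reduces the cost and time of manual annotation. The process starts with manual annotation to build a baseline, then automatic annotation feeds back into manual annotation, and finally manual annotation corrects any drift, which greatly improves annotation accuracy and efficiency. Data-Labeling depends on the open-source digital base platform for managing personnel and positions.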
Language: Java - Size: 1.79 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 676 - Forks: 96
cbaziotis/ekphrasis
Ekphrasis is a text processing tool geared towards text from social networks such as Twitter or Facebook. It performs tokenization, word normalization, word segmentation (for splitting hashtags), and spell correction, using word statistics from two large corpora (English Wikipedia and Twitter, 330 million English tweets).
Language: Python - Size: 778 KB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 671 - Forks: 93
open-korean-text/open-korean-text
Open Korean Text Processor - An Open-source Korean Text Processor
Language: Scala - Size: 32.7 MB - Last synced at: 2 months ago - Pushed at: almost 2 years ago - Stars: 646 - Forks: 97
smoothnlp/SmoothNLP 📦
An NLP toolset with a focus on explainable NLP techniques and inference
Language: Java - Size: 6.71 MB - Last synced at: about 1 month ago - Pushed at: almost 5 years ago - Stars: 620 - Forks: 112
jflex-de/jflex
The fast scanner generator for Java™ with full Unicode support
Language: Java - Size: 22.1 MB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 618 - Forks: 119
alasdairforsythe/tokenmonster
Ungreedy subword tokenizer and vocabulary trainer for Python, Go & Javascript
Language: Go - Size: 734 KB - Last synced at: 4 months ago - Pushed at: over 1 year ago - Stars: 600 - Forks: 20
lindera/lindera
A multilingual morphological analysis library.
Language: Rust - Size: 179 MB - Last synced at: 2 days ago - Pushed at: 4 days ago - Stars: 556 - Forks: 51
glayzzle/php-parser
🌿 NodeJS PHP Parser - extract AST or tokens
Language: JavaScript - Size: 29.6 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 547 - Forks: 73
lydell/js-tokens
Tiny JavaScript tokenizer.
Language: JavaScript - Size: 733 KB - Last synced at: 8 days ago - Pushed at: 26 days ago - Stars: 542 - Forks: 39
lionsoul2014/friso
High-performance Chinese tokenizer with both GBK and UTF-8 charset support, based on the MMSEG algorithm and written in ANSI C. Fully modular and easy to embed in other programs such as MySQL, PostgreSQL, and PHP.
Language: C - Size: 3.07 MB - Last synced at: 7 months ago - Pushed at: about 2 years ago - Stars: 504 - Forks: 92
hplt-project/sacremoses
Python port of Moses tokenizer, truecaser and normalizer
Language: Python - Size: 724 KB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 495 - Forks: 60
FoundationVision/UniTok
[NeurIPS 2025 Spotlight] A Unified Tokenizer for Visual Generation and Understanding
Language: Python - Size: 32.1 MB - Last synced at: 1 day ago - Pushed at: about 2 months ago - Stars: 494 - Forks: 10
leodevbro/vscode-blockman
VSCode extension to highlight nested code blocks
Language: TypeScript - Size: 66.5 MB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 488 - Forks: 19
polm/fugashi
A Cython MeCab wrapper for fast, pythonic Japanese tokenization and morphological analysis.
Language: C++ - Size: 489 KB - Last synced at: about 1 month ago - Pushed at: 2 months ago - Stars: 487 - Forks: 39
CogComp/cogcomp-nlp
CogComp's Natural Language Processing Libraries and Demos: Modules include lemmatizer, ner, pos, prep-srl, quantifier, question type, relation-extraction, similarity, temporal normalizer, tokenizer, transliteration, verb-sense, and more.
Language: Java - Size: 85.5 MB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 479 - Forks: 144
neurosnap/sentences
A multilingual command line sentence tokenizer in Golang
Language: Go - Size: 15.3 MB - Last synced at: 14 days ago - Pushed at: almost 2 years ago - Stars: 461 - Forks: 41
NLPOptimize/flash-tokenizer
EFFICIENT AND OPTIMIZED TOKENIZER ENGINE FOR LLM INFERENCE SERVING
Language: C++ - Size: 197 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 459 - Forks: 7
timtadh/lexmachine
Lex machinery for Go.
Language: Go - Size: 296 KB - Last synced at: 5 months ago - Pushed at: over 3 years ago - Stars: 412 - Forks: 28
taishi-i/nagisa
A Japanese tokenizer based on recurrent neural networks
Language: Python - Size: 39.5 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 408 - Forks: 23
ku-nlp/jumanpp
Juman++ (a Morphological Analyzer Toolkit)
Language: C++ - Size: 3.78 MB - Last synced at: 4 months ago - Pushed at: over 2 years ago - Stars: 397 - Forks: 45
daac-tools/vibrato
🎤 vibrato: Viterbi-based accelerated tokenizer
Language: Rust - Size: 1.1 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 384 - Forks: 21
belladoreai/llama-tokenizer-js
JS tokenizer for LLaMA 1 and 2
Language: JavaScript - Size: 3.07 MB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 361 - Forks: 24
zurawiki/tiktoken-rs
Ready-made tokenizer library for working with GPT and tiktoken
Language: Rust - Size: 3.71 MB - Last synced at: 23 days ago - Pushed at: about 2 months ago - Stars: 353 - Forks: 65
OpenNMT/Tokenizer
Fast and customizable text tokenization library with BPE and SentencePiece support
Language: C++ - Size: 1.74 MB - Last synced at: about 11 hours ago - Pushed at: 3 days ago - Stars: 327 - Forks: 78
guillaume-be/rust-tokenizers
Rust-tokenizer offers high-performance tokenizers for modern language models, including WordPiece, Byte-Pair Encoding (BPE) and Unigram (SentencePiece) models
Language: Rust - Size: 1.12 MB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 326 - Forks: 31
artitw/text2text
Text2Text Language Modeling Toolkit
Language: Python - Size: 870 KB - Last synced at: 1 day ago - Pushed at: 12 months ago - Stars: 305 - Forks: 40
sugarme/tokenizer
NLP tokenizers written in Go language
Language: Go - Size: 1.49 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 291 - Forks: 56
bitextor/bitextor
Bitextor generates translation memories from multilingual websites
Language: Python - Size: 177 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 290 - Forks: 43
tlaceby/guide-to-interpreters-series
Source code for viewers following along with my Beginner's Guide To Building Interpreters series on my YouTube channel.
Language: TypeScript - Size: 65.4 KB - Last synced at: 8 months ago - Pushed at: over 1 year ago - Stars: 257 - Forks: 34
mediacloud/sentence-splitter 📦
Text to sentence splitter using heuristic algorithm by Philipp Koehn and Josh Schroeder.
Language: Python - Size: 45.9 KB - Last synced at: 2 months ago - Pushed at: about 3 years ago - Stars: 256 - Forks: 33
dmitry-brazhenko/SharpToken
SharpToken is a C# library for tokenizing natural language text. It's based on the tiktoken Python library and designed to be fast and accurate.
Language: C# - Size: 3.62 MB - Last synced at: 8 days ago - Pushed at: 4 months ago - Stars: 250 - Forks: 17
daac-tools/vaporetto
🛥 Vaporetto: Very accelerated pointwise prediction based tokenizer
Language: Rust - Size: 4 MB - Last synced at: 13 days ago - Pushed at: 18 days ago - Stars: 249 - Forks: 10
bnosac/udpipe
R package for Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing Based on the UDPipe Natural Language Processing Toolkit
Language: C++ - Size: 5.11 MB - Last synced at: 26 days ago - Pushed at: about 1 month ago - Stars: 218 - Forks: 34
zhenye234/xcodec
AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
Language: Python - Size: 1.77 MB - Last synced at: 7 months ago - Pushed at: 9 months ago - Stars: 209 - Forks: 13
fnl/syntok
Text tokenization and sentence segmentation (segtok v2)
Language: Python - Size: 203 KB - Last synced at: about 1 month ago - Pushed at: almost 4 years ago - Stars: 207 - Forks: 35
microsoft/Tokenizer
Typescript and .NET implementation of BPE tokenizer for OpenAI LLMs.
Language: C# - Size: 1.98 MB - Last synced at: 13 days ago - Pushed at: 8 months ago - Stars: 206 - Forks: 35
netgen/query-translator
Query Translator is a search query translator with AST representation
Language: PHP - Size: 506 KB - Last synced at: 3 months ago - Pushed at: almost 2 years ago - Stars: 206 - Forks: 11
Dadmatech/DadmaTools
DadmaTools is a set of Persian NLP tools developed by Dadmatech Co.
Language: Python - Size: 92.6 MB - Last synced at: 28 days ago - Pushed at: 5 months ago - Stars: 203 - Forks: 45
mck89/peast
JavaScript parser written in PHP that generates AST from your code according to ECMAScript specification
Language: PHP - Size: 1.75 MB - Last synced at: 16 days ago - Pushed at: 3 months ago - Stars: 187 - Forks: 23
ropensci/tokenizers
Fast, Consistent Tokenization of Natural Language Text
Language: R - Size: 1.24 MB - Last synced at: 25 days ago - Pushed at: almost 2 years ago - Stars: 186 - Forks: 24
garvys-org/rustfst
Rust re-implementation of OpenFST - library for constructing, combining, optimizing, and searching weighted finite-state transducers (FSTs). A Python binding is also available.
Language: Rust - Size: 7.59 MB - Last synced at: 20 days ago - Pushed at: 6 months ago - Stars: 177 - Forks: 19
adbar/simplemma
Simple multilingual lemmatizer for Python, especially useful for speed and efficiency
Language: Python - Size: 729 MB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 175 - Forks: 15
botisan-ai/gpt3-tokenizer
Isomorphic JavaScript/TypeScript Tokenizer for GPT-3 and Codex Models by OpenAI.
Language: TypeScript - Size: 2.06 MB - Last synced at: 2 months ago - Pushed at: almost 3 years ago - Stars: 171 - Forks: 16
xinjli/transphone
phoneme tokenizer and grapheme-to-phoneme model for 8k languages
Language: Python - Size: 342 KB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 170 - Forks: 17
untitaker/html5gum
A WHATWG-compliant HTML5 tokenizer and tag soup parser
Language: Rust - Size: 547 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 165 - Forks: 11
gautierdag/bpeasy
Fast bare-bones BPE for modern tokenizer training
Language: Python - Size: 1.41 MB - Last synced at: 3 months ago - Pushed at: 6 months ago - Stars: 164 - Forks: 5
howl-anderson/MicroTokenizer
MicroTokenizer: a lightweight yet full-featured Chinese tokenizer designed for educational and research purposes, helping students understand how tokenizers work. Provides a practical, hands-on approach to understanding NLP concepts, featuring multiple tokenization algorithms and customizable models. Ideal for students, researchers, and NLP enthusiasts.
Language: Python - Size: 174 MB - Last synced at: 7 months ago - Pushed at: about 1 year ago - Stars: 153 - Forks: 22
tsproisl/SoMaJo
A tokenizer and sentence splitter for German and English web and social media texts.
Language: Python - Size: 1.35 MB - Last synced at: 2 months ago - Pushed at: about 1 year ago - Stars: 148 - Forks: 22
nette/tokenizer 📦
[DISCONTINUED] Source code tokenizer
Language: PHP - Size: 104 KB - Last synced at: 3 months ago - Pushed at: almost 2 years ago - Stars: 141 - Forks: 23
LorettaDevs/Loretta
A C# Lua, GLua and Luau parser, code analysis, transformation and generation library.
Language: C# - Size: 10.7 MB - Last synced at: 26 days ago - Pushed at: 29 days ago - Stars: 138 - Forks: 11
foonathan/lex 📦
Replaced by foonathan/lexy
Language: C++ - Size: 308 KB - Last synced at: 8 months ago - Pushed at: about 5 years ago - Stars: 138 - Forks: 8
Kensuke-Mitsuzawa/JapaneseTokenizers
Aims to make JapaneseTokenizer as easy to use as possible
Language: Python - Size: 271 KB - Last synced at: about 2 months ago - Pushed at: almost 7 years ago - Stars: 138 - Forks: 21
MagedSaeed/farasapy
A Python implementation of Farasa toolkit
Language: Python - Size: 265 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 136 - Forks: 23
wjf5203/TokBench
Image and video Tokenizer/VAE selection guide, text and face reconstruction evaluation.
Language: Python - Size: 46.5 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 133 - Forks: 0
mykolaharmash/works-for-me
Collection of developer toolkits
Language: JavaScript - Size: 14.8 MB - Last synced at: 5 months ago - Pushed at: over 7 years ago - Stars: 129 - Forks: 7
GerHobbelt/jison (fork of zaach/jison)
bison / YACC / LEX in JavaScript (LALR(1), SLR(1), etc. lexer/parser generator)
Language: JavaScript - Size: 32.2 MB - Last synced at: 3 months ago - Pushed at: almost 5 years ago - Stars: 125 - Forks: 21
bzick/tokenizer
Tokenizer (lexer) for golang
Language: Go - Size: 119 KB - Last synced at: 8 months ago - Pushed at: 11 months ago - Stars: 124 - Forks: 8
Cledev-Limited/Cledev.OpenAI
.NET 7 SDK for OpenAI with a Blazor Server playground
Language: C# - Size: 511 KB - Last synced at: 1 day ago - Pushed at: over 2 years ago - Stars: 124 - Forks: 21
kyegomez/MambaByte
Implementation of MambaByte from "MambaByte: Token-free Selective State Space Model" in PyTorch and Zeta
Language: Python - Size: 2.18 MB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 123 - Forks: 9
bytexenon/Tiny-Lua-Compiler
⛄Possibly the smallest Lua compiler ever
Language: Lua - Size: 479 KB - Last synced at: 7 days ago - Pushed at: 8 days ago - Stars: 120 - Forks: 7
kakaobrain/kortok
The code and models for "An Empirical Study of Tokenization Strategies for Various Korean NLP Tasks" (AACL-IJCNLP 2020)
Language: Python - Size: 5.6 MB - Last synced at: 8 months ago - Pushed at: about 5 years ago - Stars: 118 - Forks: 10
belladoreai/llama3-tokenizer-js
JS tokenizer for LLaMA 3 and LLaMA 3.1
Language: JavaScript - Size: 7.22 MB - Last synced at: 7 days ago - Pushed at: 5 months ago - Stars: 117 - Forks: 6
ropensci/hunspell
High-Performance Stemmer, Tokenizer, and Spell Checker for R
Language: C++ - Size: 4.45 MB - Last synced at: 2 months ago - Pushed at: 6 months ago - Stars: 113 - Forks: 46
clipperhouse/jargon
Tokenizers and lemmatizers for Go
Language: Go - Size: 1.1 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 110 - Forks: 3
bevacqua/megamark
😻 Markdown with easy tokenization, a fast highlighter, and a lean HTML sanitizer
Language: JavaScript - Size: 2.28 MB - Last synced at: 3 months ago - Pushed at: almost 5 years ago - Stars: 107 - Forks: 7
togatoga/kanpyo
Japanese Morphological Analyzer written in Rust
Language: Rust - Size: 10.4 MB - Last synced at: 20 days ago - Pushed at: 21 days ago - Stars: 106 - Forks: 1
AmrDeveloper/FileQL
A tool that allows you to run SQL-like queries on local files instead of database files using the GitQL SDK.
Language: Rust - Size: 822 KB - Last synced at: 25 days ago - Pushed at: 4 months ago - Stars: 105 - Forks: 3
JuliaLang/Tokenize.jl
Tokenization for Julia source code
Language: Julia - Size: 472 KB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 104 - Forks: 28
chriskonnertz/string-calc
PHP calculator library for mathematical terms (expressions) passed as strings
Language: PHP - Size: 307 KB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 102 - Forks: 19
explosion/spacy-experimental
🧪 Cutting-edge experimental spaCy components and features
Language: Python - Size: 1.33 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 101 - Forks: 20
dluc/openai-tools
A collection of tools for working with OpenAI
Language: C# - Size: 559 KB - Last synced at: about 2 months ago - Pushed at: over 2 years ago - Stars: 100 - Forks: 15
johannschopplich/tokenx
📐 Fast token estimation at 94% accuracy of a full tokenizer in a 2kB bundle
Language: TypeScript - Size: 658 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 99 - Forks: 5
yishn/chinese-tokenizer
Tokenizes Chinese texts into words.
Language: JavaScript - Size: 11.2 MB - Last synced at: 5 months ago - Pushed at: about 3 years ago - Stars: 99 - Forks: 25
nooscraft/tokuin
CLI tool – estimates LLM tokens/costs and runs provider-aware load tests for OpenAI, Anthropic, OpenRouter, or custom endpoints.
Language: Rust - Size: 438 KB - Last synced at: 28 days ago - Pushed at: about 1 month ago - Stars: 97 - Forks: 3
sefineh-ai/Amharic-Tokenizer
Syllable-aware BPE tokenizer for the Amharic language (አማርኛ) – fast, accurate, trainable.
Language: Python - Size: 10.7 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 95 - Forks: 12
Voine/Bert-VITS2-MNN
TTS System Bert-VITS2 Android Ver, powered by alibaba-MNN engine.
Language: Kotlin - Size: 38.9 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 94 - Forks: 9
clipperhouse/uax29
A tokenizer based on Unicode text segmentation (UAX #29), for Go. Split graphemes, words, sentences.
Language: Go - Size: 920 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 92 - Forks: 4
alfianlosari/GPTEncoder
Swift BPE Encoder/Decoder for OpenAI GPT Models. A programmatic interface for tokenizing text for OpenAI ChatGPT API.
Language: Swift - Size: 554 KB - Last synced at: 10 days ago - Pushed at: almost 3 years ago - Stars: 87 - Forks: 20
colindembovsky/cols-agent-tasks
Colin's ALM Corner Custom Build Tasks
Language: PowerShell - Size: 2.54 MB - Last synced at: 4 months ago - Pushed at: 5 months ago - Stars: 84 - Forks: 69
openshieldai/openshield
OpenShield is a new generation security layer for AI models
Language: Go - Size: 2.19 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 83 - Forks: 10
DCjanus/cang-jie
Chinese tokenizer for tantivy, based on jieba-rs
Language: Rust - Size: 27.3 KB - Last synced at: 20 days ago - Pushed at: 3 months ago - Stars: 82 - Forks: 23
samber/go-gpt-3-encoder
Go BPE tokenizer (Encoder+Decoder) for GPT2 and GPT3
Language: Go - Size: 558 KB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 81 - Forks: 21
HippoPHP/Hippo
PHP standards checker.
Language: PHP - Size: 458 KB - Last synced at: over 1 year ago - Pushed at: over 8 years ago - Stars: 80 - Forks: 0
ikskuh/parser-toolkit
A toolkit that makes it easier to write recursive-descent parsers in Zig.
Language: Zig - Size: 1.1 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 79 - Forks: 8
venturachrisdev/djurl
Simple yet helpful library for writing Django URLs in an easy, short and intuitive way.
Language: Python - Size: 48.8 KB - Last synced at: about 1 month ago - Pushed at: over 7 years ago - Stars: 79 - Forks: 3
AayushSameerShah/Neural-Net-Zero-to-Hero-with-Andrej
This repository contains a collection of exploratory notebooks in pure Python, written in language that we humans can read. It tries to compile all the lectures from Andrej Karpathy's 💎 playlist on Neural Networks, which ends with building GPT.
Language: Jupyter Notebook - Size: 191 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 77 - Forks: 10
TangXiaoLv/Android-Sqlite-Fts5-Tokenizer
SQLite3 source code with an integrated FTS5 Chinese tokenizer
Language: C++ - Size: 11.7 MB - Last synced at: 8 months ago - Pushed at: about 8 years ago - Stars: 75 - Forks: 16
textgain/grasp
Essential NLP & ML, short & fast pure Python code
Language: Python - Size: 58.8 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 74 - Forks: 19