tokenizer | Topic | Ecosyste.ms: Repos

Topic: "tokenizer"

theseer/tokenizer

A small library for converting tokenized PHP source code into XML (and potentially other formats)

Language: PHP - Size: 83 KB - Last synced at: 6 months ago - Pushed at: almost 2 years ago - Stars: 5,198 - Forks: 22

Chevrotain/chevrotain

Parser Building Toolkit for JavaScript

Language: TypeScript - Size: 36.9 MB - Last synced at: 17 days ago - Pushed at: 19 days ago - Stars: 2,711 - Forks: 216

roshan-research/hazm

Persian NLP Toolkit

Language: Python - Size: 25.2 MB - Last synced at: 17 days ago - Pushed at: 18 days ago - Stars: 1,357 - Forks: 204

natasha/natasha

Solves basic Russian NLP tasks, API for lower level Natasha projects

Language: Python - Size: 35.7 MB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 1,289 - Forks: 110

dqbd/tiktokenizer

Online playground for OpenAI tokenizers

Language: TypeScript - Size: 713 KB - Last synced at: 7 months ago - Pushed at: 9 months ago - Stars: 1,165 - Forks: 134

lovit/soynlp

A Python library for Korean natural language processing. Provides word extraction, tokenization, part-of-speech tagging, and preprocessing.

Language: Python - Size: 34.1 MB - Last synced at: about 2 months ago - Pushed at: 8 months ago - Stars: 983 - Forks: 183

ikawaha/kagome

Self-contained Japanese Morphological Analyzer written in pure Go

Language: Go - Size: 711 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 917 - Forks: 55

no-context/moo

Optimised tokenizer/lexer generator! 🐄 Uses /y for performance. Moo.

Language: JavaScript - Size: 770 KB - Last synced at: 14 days ago - Pushed at: over 2 years ago - Stars: 872 - Forks: 72

BLKSerene/Wordless

An Integrated Corpus Tool With Multilingual Support for the Study of Language, Literature, and Translation

Language: Python - Size: 75.5 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 741 - Forks: 96

niieani/gpt-tokenizer

The fastest JavaScript BPE Tokenizer Encoder Decoder for OpenAI's GPT models (gpt-5, gpt-o*, gpt-4o, etc.). Port of OpenAI's tiktoken with additional features.

Language: TypeScript - Size: 13 MB - Last synced at: about 21 hours ago - Pushed at: 7 days ago - Stars: 714 - Forks: 52

wangfenjin/simple

A SQLite3 FTS5 full-text search extension (tokenizer) that supports Chinese and Pinyin

Language: C++ - Size: 969 KB - Last synced at: 7 months ago - Pushed at: 8 months ago - Stars: 691 - Forks: 99

mathewsanders/Mustard

🌭 Mustard is a Swift library for tokenizing strings when splitting by whitespace doesn't cut it.

Language: Swift - Size: 137 KB - Last synced at: 5 months ago - Pushed at: over 7 years ago - Stars: 687 - Forks: 18

risesoft-y9/Data-Labeling

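Data Labeling is a tool dedicated to processing and annotating text data. Through a streamlined, fast annotation workflow and dynamic algorithmic feedback, it lets users label keywords quickly and uses algorithms to keep reducing the cost and time of manual annotation. The process starts with manual annotation to build a baseline, then automatic annotation feeds back into manual annotation, and finally manual annotation corrects any drift, which greatly improves annotation accuracy and efficiency. Data Labeling depends on the open-source digital foundation platform for personnel and position management.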

Language: Java - Size: 1.79 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 676 - Forks: 96

Ekphrasis is a text processing tool geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags), and spell correction, using word statistics from two large corpora (English Wikipedia and Twitter, the latter with 330 million English tweets).

Language: Python - Size: 778 KB - Last synced at: 4 months ago - Pushed at: 7 months ago - Stars: 671 - Forks: 93

open-korean-text/open-korean-text

Open Korean Text Processor - An Open-source Korean Text Processor

Language: Scala - Size: 32.7 MB - Last synced at: 2 months ago - Pushed at: almost 2 years ago - Stars: 646 - Forks: 97

smoothnlp/SmoothNLP 📦

An NLP toolset with a focus on explainable NLP techniques and explainable inference

Language: Java - Size: 6.71 MB - Last synced at: about 1 month ago - Pushed at: almost 5 years ago - Stars: 620 - Forks: 112

jflex-de/jflex

The fast scanner generator for Java™ with full Unicode support

Language: Java - Size: 22.1 MB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 618 - Forks: 119

alasdairforsythe/tokenmonster

Ungreedy subword tokenizer and vocabulary trainer for Python, Go & JavaScript

Language: Go - Size: 734 KB - Last synced at: 4 months ago - Pushed at: over 1 year ago - Stars: 600 - Forks: 20

lindera/lindera

A multilingual morphological analysis library.

Language: Rust - Size: 179 MB - Last synced at: 7 days ago - Pushed at: 9 days ago - Stars: 556 - Forks: 51

glayzzle/php-parser

🌿 NodeJS PHP Parser - extract AST or tokens

Language: JavaScript - Size: 29.6 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 547 - Forks: 73

lydell/js-tokens

Tiny JavaScript tokenizer.

Language: JavaScript - Size: 733 KB - Last synced at: 4 days ago - Pushed at: about 1 month ago - Stars: 544 - Forks: 39

lionsoul2014/friso

High-performance Chinese tokenizer with both GBK and UTF-8 charset support, based on the MMSEG algorithm and written in ANSI C. Fully modular in design, so it can be easily embedded in other programs such as MySQL, PostgreSQL, and PHP.

Language: C - Size: 3.07 MB - Last synced at: 8 months ago - Pushed at: about 2 years ago - Stars: 504 - Forks: 92

hplt-project/sacremoses

Python port of Moses tokenizer, truecaser and normalizer

Language: Python - Size: 724 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 495 - Forks: 60

FoundationVision/UniTok

[NeurIPS 2025 Spotlight] A Unified Tokenizer for Visual Generation and Understanding

Language: Python - Size: 32.1 MB - Last synced at: 7 days ago - Pushed at: about 2 months ago - Stars: 494 - Forks: 10

leodevbro/vscode-blockman

VSCode extension to highlight nested code blocks

Language: TypeScript - Size: 66.5 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 488 - Forks: 19

polm/fugashi

A Cython MeCab wrapper for fast, pythonic Japanese tokenization and morphological analysis.

Language: C++ - Size: 489 KB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 487 - Forks: 39

CogComp/cogcomp-nlp

CogComp's Natural Language Processing Libraries and Demos: Modules include lemmatizer, ner, pos, prep-srl, quantifier, question type, relation-extraction, similarity, temporal normalizer, tokenizer, transliteration, verb-sense, and more.

Language: Java - Size: 85.5 MB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 479 - Forks: 144

neurosnap/sentences

A multilingual command line sentence tokenizer in Golang

Language: Go - Size: 15.3 MB - Last synced at: 20 days ago - Pushed at: almost 2 years ago - Stars: 461 - Forks: 41

NLPOptimize/flash-tokenizer

EFFICIENT AND OPTIMIZED TOKENIZER ENGINE FOR LLM INFERENCE SERVING

Language: C++ - Size: 197 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 459 - Forks: 7

timtadh/lexmachine

Lex machinery for Go.

Language: Go - Size: 296 KB - Last synced at: 5 months ago - Pushed at: over 3 years ago - Stars: 412 - Forks: 28

taishi-i/nagisa

A Japanese tokenizer based on recurrent neural networks

Language: Python - Size: 39.5 MB - Last synced at: 4 days ago - Pushed at: 8 days ago - Stars: 411 - Forks: 23

ku-nlp/jumanpp

Juman++ (a Morphological Analyzer Toolkit)

Language: C++ - Size: 3.78 MB - Last synced at: 4 months ago - Pushed at: over 2 years ago - Stars: 397 - Forks: 45

daac-tools/vibrato

🎤 vibrato: Viterbi-based accelerated tokenizer

Language: Rust - Size: 1.1 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 384 - Forks: 21

belladoreai/llama-tokenizer-js

JS tokenizer for LLaMA 1 and 2

Language: JavaScript - Size: 3.07 MB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 361 - Forks: 24

zurawiki/tiktoken-rs

Ready-made tokenizer library for working with GPT and tiktoken

Language: Rust - Size: 3.71 MB - Last synced at: 29 days ago - Pushed at: about 2 months ago - Stars: 353 - Forks: 65

OpenNMT/Tokenizer

Fast and customizable text tokenization library with BPE and SentencePiece support

Language: C++ - Size: 1.74 MB - Last synced at: 6 days ago - Pushed at: 8 days ago - Stars: 327 - Forks: 78

guillaume-be/rust-tokenizers

Rust-tokenizer offers high-performance tokenizers for modern language models, including WordPiece, Byte-Pair Encoding (BPE) and Unigram (SentencePiece) models

Language: Rust - Size: 1.12 MB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 326 - Forks: 31

artitw/text2text

Text2Text Language Modeling Toolkit

Language: Python - Size: 870 KB - Last synced at: 6 days ago - Pushed at: 12 months ago - Stars: 305 - Forks: 40

sugarme/tokenizer

NLP tokenizers written in Go language

Language: Go - Size: 1.49 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 291 - Forks: 56

bitextor/bitextor

Bitextor generates translation memories from multilingual websites

Language: Python - Size: 177 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 290 - Forks: 43

tlaceby/guide-to-interpreters-series

Contains source code for viewers following along with my Beginner's Guide to Building Interpreters series on my YouTube channel.

Language: TypeScript - Size: 65.4 KB - Last synced at: 8 months ago - Pushed at: over 1 year ago - Stars: 257 - Forks: 34

mediacloud/sentence-splitter 📦

Text-to-sentence splitter using a heuristic algorithm by Philipp Koehn and Josh Schroeder.

Language: Python - Size: 45.9 KB - Last synced at: 2 months ago - Pushed at: about 3 years ago - Stars: 256 - Forks: 33

dmitry-brazhenko/SharpToken

SharpToken is a C# library for tokenizing natural language text. It's based on the tiktoken Python library and designed to be fast and accurate.

Language: C# - Size: 3.62 MB - Last synced at: 3 days ago - Pushed at: 5 months ago - Stars: 250 - Forks: 17

daac-tools/vaporetto

🛥 Vaporetto: Very accelerated pointwise prediction based tokenizer

Language: Rust - Size: 4 MB - Last synced at: 18 days ago - Pushed at: 23 days ago - Stars: 249 - Forks: 10

bnosac/udpipe

R package for Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing Based on the UDPipe Natural Language Processing Toolkit

Language: C++ - Size: 5.11 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 218 - Forks: 34

zhenye234/xcodec

AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

Language: Python - Size: 1.77 MB - Last synced at: 8 months ago - Pushed at: 9 months ago - Stars: 209 - Forks: 13

Dadmatech/DadmaTools

DadmaTools is a Persian NLP toolkit developed by Dadmatech Co.

Language: Python - Size: 92.6 MB - Last synced at: about 23 hours ago - Pushed at: 5 months ago - Stars: 207 - Forks: 45

fnl/syntok

Text tokenization and sentence segmentation (segtok v2)

Language: Python - Size: 203 KB - Last synced at: about 1 month ago - Pushed at: almost 4 years ago - Stars: 207 - Forks: 35

microsoft/Tokenizer

TypeScript and .NET implementation of BPE tokenizer for OpenAI LLMs.

Language: C# - Size: 1.98 MB - Last synced at: 3 days ago - Pushed at: 9 months ago - Stars: 206 - Forks: 35

netgen/query-translator

Query Translator is a search query translator with AST representation

Language: PHP - Size: 506 KB - Last synced at: 3 months ago - Pushed at: almost 2 years ago - Stars: 206 - Forks: 11

mck89/peast

JavaScript parser written in PHP that generates AST from your code according to ECMAScript specification

Language: PHP - Size: 1.75 MB - Last synced at: 3 days ago - Pushed at: 3 months ago - Stars: 188 - Forks: 23

ropensci/tokenizers

Fast, Consistent Tokenization of Natural Language Text

Language: R - Size: 1.24 MB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 186 - Forks: 24

garvys-org/rustfst

Rust re-implementation of OpenFST - library for constructing, combining, optimizing, and searching weighted finite-state transducers (FSTs). A Python binding is also available.

Language: Rust - Size: 7.59 MB - Last synced at: 26 days ago - Pushed at: 6 months ago - Stars: 177 - Forks: 19

adbar/simplemma

Simple multilingual lemmatizer for Python, especially useful for speed and efficiency

Language: Python - Size: 729 MB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 175 - Forks: 15

gautierdag/bpeasy

Fast bare-bones BPE for modern tokenizer training

Language: Python - Size: 1.41 MB - Last synced at: 5 days ago - Pushed at: 7 months ago - Stars: 174 - Forks: 6

botisan-ai/gpt3-tokenizer

Isomorphic JavaScript/TypeScript Tokenizer for GPT-3 and Codex Models by OpenAI.

Language: TypeScript - Size: 2.06 MB - Last synced at: 3 months ago - Pushed at: almost 3 years ago - Stars: 171 - Forks: 16

xinjli/transphone

phoneme tokenizer and grapheme-to-phoneme model for 8k languages

Language: Python - Size: 342 KB - Last synced at: 4 months ago - Pushed at: over 2 years ago - Stars: 170 - Forks: 17

untitaker/html5gum

A WHATWG-compliant HTML5 tokenizer and tag soup parser

Language: Rust - Size: 547 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 165 - Forks: 11

howl-anderson/MicroTokenizer

MicroTokenizer: A lightweight, full-featured Chinese tokenizer designed for educational and research purposes, helping students understand how tokenizers work. Provides a practical, hands-on approach to understanding NLP concepts, featuring multiple tokenization algorithms and customizable models. Ideal for students, researchers, and NLP enthusiasts.

Language: Python - Size: 174 MB - Last synced at: 7 months ago - Pushed at: about 1 year ago - Stars: 153 - Forks: 22

tsproisl/SoMaJo

A tokenizer and sentence splitter for German and English web and social media texts.

Language: Python - Size: 1.35 MB - Last synced at: 2 months ago - Pushed at: about 1 year ago - Stars: 148 - Forks: 22

nette/tokenizer 📦

[DISCONTINUED] Source code tokenizer

Language: PHP - Size: 104 KB - Last synced at: 3 months ago - Pushed at: almost 2 years ago - Stars: 141 - Forks: 23

LorettaDevs/Loretta

A C# Lua, GLua and Luau parser, code analysis, transformation and generation library.

Language: C# - Size: 10.7 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 138 - Forks: 11

foonathan/lex 📦

Replaced by foonathan/lexy

Language: C++ - Size: 308 KB - Last synced at: 8 months ago - Pushed at: about 5 years ago - Stars: 138 - Forks: 8

Kensuke-Mitsuzawa/JapaneseTokenizers

Aims to make Japanese tokenizers as easy to use as possible

Language: Python - Size: 271 KB - Last synced at: 2 months ago - Pushed at: almost 7 years ago - Stars: 138 - Forks: 21

MagedSaeed/farasapy

A Python implementation of Farasa toolkit

Language: Python - Size: 265 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 136 - Forks: 23

wjf5203/TokBench

Image and video Tokenizer/VAE selection guide, text and face reconstruction evaluation.

Language: Python - Size: 46.5 MB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 133 - Forks: 0

mykolaharmash/works-for-me

Collection of developer toolkits

Language: JavaScript - Size: 14.8 MB - Last synced at: 5 months ago - Pushed at: over 7 years ago - Stars: 129 - Forks: 7

GerHobbelt/jison (fork of zaach/jison)

bison / YACC / LEX in JavaScript (LALR(1), SLR(1), etc. lexer/parser generator)

Language: JavaScript - Size: 32.2 MB - Last synced at: 3 months ago - Pushed at: almost 5 years ago - Stars: 125 - Forks: 21

bzick/tokenizer

Tokenizer (lexer) for golang

Language: Go - Size: 119 KB - Last synced at: 9 months ago - Pushed at: 11 months ago - Stars: 124 - Forks: 8

Cledev-Limited/Cledev.OpenAI

.NET 7 SDK for OpenAI with a Blazor Server playground

Language: C# - Size: 511 KB - Last synced at: 7 days ago - Pushed at: over 2 years ago - Stars: 124 - Forks: 21

kyegomez/MambaByte

Implementation of MambaByte from "MambaByte: Token-free Selective State Space Model" in PyTorch and Zeta

Language: Python - Size: 2.18 MB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 123 - Forks: 9

bytexenon/Tiny-Lua-Compiler

⛄ Possibly the smallest Lua compiler ever

Language: Lua - Size: 479 KB - Last synced at: 12 days ago - Pushed at: 14 days ago - Stars: 120 - Forks: 7

kakaobrain/kortok

The code and models for "An Empirical Study of Tokenization Strategies for Various Korean NLP Tasks" (AACL-IJCNLP 2020)

Language: Python - Size: 5.6 MB - Last synced at: 9 months ago - Pushed at: over 5 years ago - Stars: 118 - Forks: 10

belladoreai/llama3-tokenizer-js

JS tokenizer for LLaMA 3 and LLaMA 3.1

Language: JavaScript - Size: 7.22 MB - Last synced at: 12 days ago - Pushed at: 5 months ago - Stars: 117 - Forks: 6

Voine/Bert-VITS2-MNN

TTS System Bert-VITS2 Android Ver, powered by alibaba-MNN engine.

Language: Kotlin - Size: 24.8 MB - Last synced at: 4 days ago - Pushed at: 9 days ago - Stars: 116 - Forks: 13

ropensci/hunspell

High-Performance Stemmer, Tokenizer, and Spell Checker for R

Language: C++ - Size: 4.45 MB - Last synced at: 3 months ago - Pushed at: 6 months ago - Stars: 113 - Forks: 46

clipperhouse/jargon

Tokenizers and lemmatizers for Go

Language: Go - Size: 1.1 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 110 - Forks: 3

bevacqua/megamark

😻 Markdown with easy tokenization, a fast highlighter, and a lean HTML sanitizer

Language: JavaScript - Size: 2.28 MB - Last synced at: 3 months ago - Pushed at: almost 5 years ago - Stars: 107 - Forks: 7

togatoga/kanpyo

Japanese Morphological Analyzer written in Rust

Language: Rust - Size: 10.4 MB - Last synced at: 25 days ago - Pushed at: 27 days ago - Stars: 106 - Forks: 1

AmrDeveloper/FileQL

A tool that allows you to run SQL-like queries on local files instead of database files, using the GitQL SDK.

Language: Rust - Size: 822 KB - Last synced at: 30 days ago - Pushed at: 4 months ago - Stars: 105 - Forks: 3

JuliaLang/Tokenize.jl

Tokenization for Julia source code

Language: Julia - Size: 472 KB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 104 - Forks: 28

chriskonnertz/string-calc

PHP calculator library for mathematical terms (expressions) passed as strings

Language: PHP - Size: 307 KB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 102 - Forks: 19

explosion/spacy-experimental

🧪 Cutting-edge experimental spaCy components and features

Language: Python - Size: 1.33 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 101 - Forks: 20

dluc/openai-tools

A collection of tools for working with OpenAI

Language: C# - Size: 559 KB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 100 - Forks: 15

johannschopplich/tokenx

📐 Fast token estimation at 94% accuracy of a full tokenizer in a 2kB bundle

Language: TypeScript - Size: 658 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 99 - Forks: 5

yishn/chinese-tokenizer

Tokenizes Chinese texts into words.

Language: JavaScript - Size: 11.2 MB - Last synced at: 5 months ago - Pushed at: about 3 years ago - Stars: 99 - Forks: 25

nooscraft/tokuin

CLI tool – estimates LLM tokens/costs and runs provider-aware load tests for OpenAI, Anthropic, OpenRouter, or custom endpoints.

Language: Rust - Size: 438 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 97 - Forks: 3

sefineh-ai/Amharic-Tokenizer

Syllable-aware BPE tokenizer for the Amharic language (አማርኛ) – fast, accurate, trainable.

Language: Python - Size: 10.7 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 95 - Forks: 12

clipperhouse/uax29

A tokenizer based on Unicode text segmentation (UAX #29), for Go. Split graphemes, words, sentences.

Language: Go - Size: 920 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 92 - Forks: 4

alfianlosari/GPTEncoder

Swift BPE Encoder/Decoder for OpenAI GPT Models. A programmatic interface for tokenizing text for OpenAI ChatGPT API.

Language: Swift - Size: 554 KB - Last synced at: 15 days ago - Pushed at: almost 3 years ago - Stars: 87 - Forks: 20

colindembovsky/cols-agent-tasks

Colin's ALM Corner Custom Build Tasks

Language: PowerShell - Size: 2.54 MB - Last synced at: 4 months ago - Pushed at: 5 months ago - Stars: 84 - Forks: 69

openshieldai/openshield

OpenShield is a new generation security layer for AI models

Language: Go - Size: 2.19 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 83 - Forks: 10

DCjanus/cang-jie

Chinese tokenizer for tantivy, based on jieba-rs

Language: Rust - Size: 27.3 KB - Last synced at: 26 days ago - Pushed at: 4 months ago - Stars: 82 - Forks: 23

samber/go-gpt-3-encoder

Go BPE tokenizer (Encoder+Decoder) for GPT2 and GPT3

Language: Go - Size: 558 KB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 81 - Forks: 21

HippoPHP/Hippo

PHP standards checker.

Language: PHP - Size: 458 KB - Last synced at: over 1 year ago - Pushed at: over 8 years ago - Stars: 80 - Forks: 0

ikskuh/parser-toolkit

A toolkit that makes it easier to write recursive-descent parsers in Zig.

Language: Zig - Size: 1.1 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 79 - Forks: 8

venturachrisdev/djurl

Simple yet helpful library for writing Django URLs in an easy, short, and intuitive way.

Language: Python - Size: 48.8 KB - Last synced at: about 1 month ago - Pushed at: over 7 years ago - Stars: 79 - Forks: 3

AayushSameerShah/Neural-Net-Zero-to-Hero-with-Andrej

This repository contains a collection of exploratory notebooks written in pure Python and in language that we humans can read. It attempts to compile all lectures from Andrej Karpathy's 💎 playlist on neural networks, which ends with building GPT.

Language: Jupyter Notebook - Size: 191 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 77 - Forks: 10

TangXiaoLv/Android-Sqlite-Fts5-Tokenizer

SQLite3 source code with an integrated FTS5 Chinese tokenizer

Language: C++ - Size: 11.7 MB - Last synced at: 9 months ago - Pushed at: about 8 years ago - Stars: 75 - Forks: 16

textgain/grasp

Essential NLP & ML, short & fast pure Python code

Language: Python - Size: 58.8 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 74 - Forks: 19