GitHub topics: tokenization
shivendrra/shredword-trainer
Fast & Efficient BPE & Unigram tokenizer trainer library
Language: C++ - Size: 18.2 MB - Last synced at: about 8 hours ago - Pushed at: about 10 hours ago - Stars: 0 - Forks: 0

XDuch/aztec-network
A step-by-step guide on how to install the Aztec Network Sequencer on Testnet
Size: 16.6 KB - Last synced at: about 9 hours ago - Pushed at: about 11 hours ago - Stars: 2 - Forks: 1

RAHEEM12344/content-recommendation-engine
A modern, responsive web application that delivers personalized content recommendations based on user preferences and behavior. This interactive recommendation system allows users to discover content tailored to their interests through category selection, tag filtering, and customizable content parameters.
Language: HTML - Size: 187 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

sebastian2005-RP/GPU-Accelerated-Next-Word-Prediction-Using-LSTM-and-PyTorch
This repository implements a GPU-accelerated next-word prediction model using PyTorch and LSTM. It includes data preprocessing with NLTK, vocabulary creation, training on tokenized text, and generating text predictions, starting from a given input phrase.
Language: Jupyter Notebook - Size: 329 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

AkshaySyal/Byte-Pair-Encoding-for-Text-Tokenization
This project implements a Byte Pair Encoding (BPE) algorithm for text tokenization, training it on NLTK's Gutenberg Corpus and evaluating its accuracy, coverage, and F1-score against NLTK's standard punkt tokenizer.
Language: Jupyter Notebook - Size: 162 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

mikiyadd/my-c-array
Dynamic array implementation in C with a modular, folder-based structure.
Language: C - Size: 12.7 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

NVIDIA/Cosmos-Tokenizer 📦
A suite of image and video neural tokenizers
Language: Jupyter Notebook - Size: 16.5 MB - Last synced at: 3 days ago - Pushed at: 5 months ago - Stars: 1,643 - Forks: 76

Solcraftl2/solcraft-nexus
🚀 Professional tokenization platform on Ripple XRP Ledger - Complete solution with React frontend, Flask backend, 2FA security, and enterprise features
Language: JavaScript - Size: 2.95 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

X-Financial-Technologies/Library
XFT's repo of blockchain-related resources
Language: Solidity - Size: 614 MB - Last synced at: 1 day ago - Pushed at: 8 months ago - Stars: 2 - Forks: 2

daac-tools/vibrato
🎤 vibrato: Viterbi-based accelerated tokenizer
Language: Rust - Size: 1.08 MB - Last synced at: 2 days ago - Pushed at: 2 months ago - Stars: 368 - Forks: 15

futurepanther786/video_captioning_using_lstm
This video captioning system uses a Convolutional Neural Network (CNN) encoder and a Long Short-Term Memory (LSTM) decoder, trained on the Microsoft Research Video Description Corpus (MSVD) dataset. The system extracts features from video frames using ResNet-152 and generates descriptive captions using an LSTM-based decoder.
Language: Jupyter Notebook - Size: 30.8 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

fattmerchantorg/Fattmerchant-iOS-SDK
Fattmerchant iOS SDK
Language: Swift - Size: 155 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 3 - Forks: 2

izsolnay/Ancient_NLP
Goal: Discover whether modern NLP tools and predictive algorithms can provide insights into ancient text corpora
Language: Jupyter Notebook - Size: 2.74 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

dl-tokenf/contracts
On-chain RWA Tokenization Framework
Language: Solidity - Size: 875 KB - Last synced at: 3 days ago - Pushed at: 3 months ago - Stars: 47 - Forks: 12

ITSLab-UAegean/vesseltrack-tools
This is a repo related to vessel AIS data, including filtering, tokenization, and trip extraction.
Language: Python - Size: 8.72 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

12345far/metrics-calculation-precision-recall
Laboratory 7 - Information Retrieval
Size: 1.95 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

sanderland/script_bpe
Code for the paper "BPE stays on SCRIPT"
Language: Jupyter Notebook - Size: 146 MB - Last synced at: about 8 hours ago - Pushed at: about 10 hours ago - Stars: 10 - Forks: 3

tassa-yoniso-manasi-karoto/go-ichiran
Go library bindings for docker-composed Ichiran, a morphological analyzer / romanizer for Japanese
Language: Go - Size: 208 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 2 - Forks: 1

Basis-Theory/developers.basistheory.com
Basis Theory Developer Documentation
Language: JavaScript - Size: 25.1 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 6 - Forks: 4

AndresEspin1993/b2t-tokenizer
B2T - Tokenizer for the AI Systems.
Language: PowerShell - Size: 240 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

thearhamsharif/BSCS-UBIT-2k21
Includes coursework and lab materials for students enrolled in the Bachelor of Science in Computer Science degree at UBIT.
Language: Jupyter Notebook - Size: 19.6 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

mysto/python-fpe
FPE - Format Preserving Encryption with FF3 in Python
Language: Python - Size: 144 KB - Last synced at: 5 days ago - Pushed at: about 1 month ago - Stars: 101 - Forks: 20

CompLin/nheengatu
Tools and resources for the computational processing of Nheengatu (Modern Tupi)
Language: Python - Size: 31.3 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 8 - Forks: 4

muki119/InformationSystem_SearchEngine
Search Engine for Information Retrieval Coursework
Language: HTML - Size: 1.01 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 0 - Forks: 0

chuckyLeeVIII/Bitcoin-BhE-NaS Fork of bitcoin/bips
Bitcoin Improvement Proposals
Language: Wikitext - Size: 15.7 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 1 - Forks: 1

m-doughty/Raku-Tokenizers
Wrapper module for Huggingface Tokenizers
Language: Raku - Size: 2.35 MB - Last synced at: 2 days ago - Pushed at: 15 days ago - Stars: 2 - Forks: 0

NightKing-V/SubtitleLLM_EngtoSin
Experimental Eng->Sin Subtitle Translation Model
Language: Jupyter Notebook - Size: 851 KB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 0 - Forks: 0

adbar/simplemma
Simple multilingual lemmatizer for Python, especially useful for speed and efficiency
Language: Python - Size: 729 MB - Last synced at: 7 days ago - Pushed at: about 1 month ago - Stars: 166 - Forks: 14

CMTA/RuleEngine
Rule engine used by the CMTAT token framework to implement transfer restriction.
Language: Solidity - Size: 12.6 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 4 - Forks: 5

LendefiMarkets/lendefi-markets-avalanche
Lendefi Markets for Avalanche
Language: Solidity - Size: 1.38 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

jshuadvd/LongRoPE
Implementation of the LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens Paper
Language: Python - Size: 562 KB - Last synced at: 7 days ago - Pushed at: 12 months ago - Stars: 146 - Forks: 14

ThuraAung1601/myTokenize
Comprehensive tokenization library for Myanmar language
Language: Python - Size: 1.87 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 5 - Forks: 1

fireblocks/fireblocks-xrp-sdk
A stateless SDK and REST API server for Fireblocks customers, simplifying advanced operations on the Ripple Ledger (XRPL).
Language: TypeScript - Size: 223 KB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 1 - Forks: 0

mantzaris/KeemenaPreprocessing.jl
Preprocessing for text data: cleaning, normalization, vectorization, tokenization and more
Language: Julia - Size: 313 KB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 0 - Forks: 0

GlitchedPolygons/l8w8jwt
Minimal, OpenSSL-less and super lightweight JWT library written in C.
Language: C - Size: 6.27 MB - Last synced at: 9 days ago - Pushed at: 10 days ago - Stars: 159 - Forks: 46

VKCOM/YouTokenToMe 📦
Unsupervised text tokenizer focused on computational efficiency
Language: C++ - Size: 192 KB - Last synced at: 5 days ago - Pushed at: over 1 year ago - Stars: 972 - Forks: 105

securitybunker/databunker
Secure Vault for Customer PII/PHI/PCI/KYC Records
Language: Go - Size: 11.1 MB - Last synced at: 5 days ago - Pushed at: 27 days ago - Stars: 1,310 - Forks: 84

Rishabh899/Movie-recommendations-system
This is a machine-learning-based movie recommendation system. It is a content-based system.
Language: Jupyter Notebook - Size: 1.84 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 0 - Forks: 0

bminixhofer/tokenkit
A toolkit implementing advanced methods to transfer models and model knowledge across tokenizers.
Language: Python - Size: 366 KB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 30 - Forks: 4

delveopers/Shredword
Fast & efficient BPE tokenizer written in C & Python for LLM training
Language: C++ - Size: 835 KB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 0 - Forks: 0

verygoodsecurity/vgs-collect-android
VGS Collect Android SDK
Language: Kotlin - Size: 14.3 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 8 - Forks: 9

rosette-api/java
Babel Street Analytics Client Library for Java
Language: Java - Size: 64.8 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 12 - Forks: 35

eliben/go-sentencepiece
Go implementation of the SentencePiece tokenizer
Language: Go - Size: 200 KB - Last synced at: 6 days ago - Pushed at: 10 months ago - Stars: 31 - Forks: 5

sytelus/nanuGPT
Simple, reliable and well tested training code for quick experiments with transformer based models
Language: Jupyter Notebook - Size: 3.73 MB - Last synced at: 2 days ago - Pushed at: 14 days ago - Stars: 4 - Forks: 0

OpenNMT/Tokenizer
Fast and customizable text tokenization library with BPE and SentencePiece support
Language: C++ - Size: 1.69 MB - Last synced at: 6 days ago - Pushed at: 3 months ago - Stars: 310 - Forks: 74

iam-salma/NLP-Bootcamp-with-python
A hands-on NLP Bootcamp using Python 🐍 covering text preprocessing, tokenization, stemming, lemmatization, POS tagging, NER, BoW, TF-IDF, Word2Vec, and sentiment analysis. Includes real-world projects, capstone notebooks, and ML-ready code for text classification and natural language tasks — ideal for data science, machine learning & AI learners
Language: Jupyter Notebook - Size: 9.79 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 1 - Forks: 0

AgentOps-AI/tokencost
Easy token price estimates for 400+ LLMs. TokenOps.
Language: Python - Size: 1.99 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 1,721 - Forks: 85

av/klmbr
klmbr - a prompt pre-processing technique to break through the barrier of entropy while generating text with LLMs
Language: TeX - Size: 2.24 MB - Last synced at: 11 days ago - Pushed at: 10 months ago - Stars: 77 - Forks: 3

Mukeshthenraj/nltk-text-analysis
Python project using NLTK to analyze text and build spelling recommenders
Language: Python - Size: 823 KB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 0 - Forks: 0

andreihar/taibun
Taiwanese Hokkien Transliterator and Tokeniser
Language: Python - Size: 4.57 MB - Last synced at: 7 days ago - Pushed at: 11 months ago - Stars: 36 - Forks: 2

gehad-Ahmed30/Natural-Language-Processing
Language: Jupyter Notebook - Size: 44.9 KB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 0 - Forks: 0

Hasnat-Aarif-Aslam/NLP-Foundation-Tokens-Ngrams-BoW-TF-IDF-TFIDF
Comprehensive guide to text preprocessing and vectorization techniques for NLP, covering tokenization, n-grams, Bag-of-Words, TF-IDF, and related feature-engineering methods.
Size: 2.93 KB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

Fraviv/CropChain_MVP
Tokenizing smallholder farmers’ harvests and connecting them with investors via blockchain and web technologies for capital.
Language: JavaScript - Size: 0 Bytes - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

Fraviv/CropChain
Tokenizing smallholder farmers’ harvests and connecting them with investors via blockchain and web technologies for capital.
Language: Python - Size: 15.6 KB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

alasdairforsythe/tokenmonster
Ungreedy subword tokenizer and vocabulary trainer for Python, Go & Javascript
Language: Go - Size: 734 KB - Last synced at: 10 days ago - Pushed at: about 1 year ago - Stars: 587 - Forks: 20

icelaterdc/Turk-NLP
Comprehensive open-source NLP (Natural Language Processing) library for Turkish.
Language: Python - Size: 20.5 KB - Last synced at: 10 days ago - Pushed at: about 1 month ago - Stars: 4 - Forks: 0

explosion/spacy-streamlit
👑 spaCy building blocks and visualizers for Streamlit apps
Language: Python - Size: 61.5 KB - Last synced at: 14 days ago - Pushed at: 12 months ago - Stars: 840 - Forks: 118

yuniko-software/tokenizer-to-onnx-model
Convert Hugging Face tokenizers to ONNX models for cross-language compatibility (.NET, Java, Python) with embedding models
Language: Jupyter Notebook - Size: 43 KB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 24 - Forks: 2

explosion/spaCy
💫 Industrial-strength Natural Language Processing (NLP) in Python
Language: Python - Size: 194 MB - Last synced at: 18 days ago - Pushed at: about 2 months ago - Stars: 31,826 - Forks: 4,520

ImadSaddik/Train_Your_Language_Model_Course
Train a language model to chat like you using your personal conversations from WhatsApp, Telegram, Signal, or other platforms.
Language: Jupyter Notebook - Size: 59.1 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 123 - Forks: 76

SayedSheikh/SpareABite-server
Language: JavaScript - Size: 34.2 KB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 0 - Forks: 0

YaroslavShved25/-Resume-Parser-Service-NLP-
A GPT-3-based Resume Parser REST API that converts resume PDFs into clean, structured JSON files. This service accurately extracts key fields such as contact information, education, job experience, and project history.
Language: Python - Size: 0 Bytes - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 0 - Forks: 0

shaheennabi/Natural-Language-Processing-Practices-and-Mini-Projects
🎇 NLP Experiments 🎆 A hands-on collection of NLP experiments 💬, featuring models like RNN, LSTM, and Attention Mechanism. 🚀 Explore applications like text classification, sentiment analysis, and language generation 🌍. Continuously updated with new algorithms and research implementations! 🔥
Language: Jupyter Notebook - Size: 47.9 KB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 1 - Forks: 0

Jyonn/UnifiedTokenizer
A machine learning toolkit for tokenization and indexing
Language: Python - Size: 537 KB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 5 - Forks: 1

WorksApplications/sudachi.rs
Sudachi in Rust 🦀 and new generation of SudachiPy
Language: Rust - Size: 15.8 MB - Last synced at: 13 days ago - Pushed at: 24 days ago - Stars: 363 - Forks: 40

spindle-health/carduus
PySpark implementation of the Open Privacy Preserving Record Linkage (OPPRL) specification.
Language: Python - Size: 1.33 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 15 - Forks: 1

PyThaiNLP/attacut
A Fast and Accurate Neural Thai Word Segmenter
Language: Python - Size: 4.15 MB - Last synced at: 19 days ago - Pushed at: 6 months ago - Stars: 86 - Forks: 18

jasminfeifei/FinTech-MVP---BetaChain
MVP for BetaChain project
Language: JavaScript - Size: 26.4 KB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 0 - Forks: 0

newtmex/smart-housing
SmartHousing is an innovative real estate tokenization platform designed to address Nigeria's significant housing deficit by leveraging blockchain technology. Our solution enables fractional ownership of real estate properties through the use of Real World Asset Tokenization, making it easier for low-income earners to invest in and own real estate.
Language: TypeScript - Size: 16.1 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 1 - Forks: 0

delBull/saaspandoras Fork of nextify-limited/saasfly
Acquire your right to participate in exclusive projects with Pandoras.
Language: TypeScript - Size: 22.5 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 0 - Forks: 0

Worklytics/psoxy
Serverless ☁️ 🚀, pseudonymizing proxy between Worklytics and your workplace 💼 SaaS data sources' APIs. Data Loss Prevention (DLP) 🛡️🔒 and compliance layer deployable to AWS Lambda or GCP Cloud Functions.
Language: Java - Size: 35 MB - Last synced at: about 17 hours ago - Pushed at: about 19 hours ago - Stars: 14 - Forks: 6

brave/tokenizer 📦
A modular resource tokenization service.
Language: Go - Size: 4.61 MB - Last synced at: 3 days ago - Pushed at: 21 days ago - Stars: 21 - Forks: 3

gautierdag/bpeasy
Fast bare-bones BPE for modern tokenizer training
Language: Python - Size: 1.41 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 158 - Forks: 5

johannschopplich/tokenx
📐 Fast and lightweight token estimation for any LLM without requiring a full tokenizer
Language: TypeScript - Size: 396 KB - Last synced at: 5 days ago - Pushed at: 20 days ago - Stars: 27 - Forks: 3

LeoMSgit/Personal-Lib---AI-ML-NLP
Collection of Notes, Guides, and Examples for Artificial Intelligence, Machine Learning, and Natural Language Processing
Size: 81.1 KB - Last synced at: 13 days ago - Pushed at: 21 days ago - Stars: 1 - Forks: 0

wolflow-ai/wolfstitch
Turn files into clean, fine-tuning-ready datasets (TXT/CSV). EPUB, PDF, and token-aware. Local, GUI-based, no cloud required.
Language: Python - Size: 368 KB - Last synced at: 2 days ago - Pushed at: 27 days ago - Stars: 0 - Forks: 0

afrail-inc/afrx-security-token
Smart contract for AFRX Security Token (ERC-3643) — a regulated digital asset representing tokenized equity in Afrail Inc., a U.S.-registered smart infrastructure company. Built on Ethereum (ERC-1400 base) and compliant with SEC Regulation D Rule 506(c) and Regulation S. Backed by 57.7M Class A and 20M Preferred shares. Fully KYC/AML-compliant.
Language: Solidity - Size: 1.61 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 13 - Forks: 0

izikeros/count_tokens
Count tokens in a text file.
Language: Python - Size: 137 KB - Last synced at: 11 days ago - Pushed at: 2 months ago - Stars: 8 - Forks: 0

kelchner63zd/aztec-network
A step-by-step guide on how to install the Aztec Network Sequencer on Testnet
Size: 0 Bytes - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 0 - Forks: 0

bithead21/parcel
Parser for C++ programs! Parcel is a simple language for parsing text information and retrieving any data.
Language: C++ - Size: 1.2 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 2 - Forks: 0

taurushq-io/private-CMTAT-aztec
Private version of CMTAT security token in Noir (Aztec network DSL)
Language: Noir - Size: 129 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 15 - Forks: 3

nlp-uoregon/trankit
Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing
Language: Python - Size: 1.06 MB - Last synced at: 12 days ago - Pushed at: 9 months ago - Stars: 756 - Forks: 103

cedricrupb/code_tokenize
Fast tokenization and structural analysis of any programming language
Language: Python - Size: 152 KB - Last synced at: 18 days ago - Pushed at: 6 months ago - Stars: 57 - Forks: 9

AndyFerns/Automated-Reasoning-Project
A project aiming to implement Automated Reasoning in First Order Logic using NLP
Language: Python - Size: 119 KB - Last synced at: 22 days ago - Pushed at: 22 days ago - Stars: 1 - Forks: 0

3Dpass/3DP
The Implementation of The Ledger of Things Node. Layer 1 decentralized blockchain platform for the tokenization of objects. Proof of Scan protocol. Useful smart-contracts and dApps.
Language: Rust - Size: 65.5 MB - Last synced at: 13 minutes ago - Pushed at: about 3 hours ago - Stars: 25 - Forks: 19

matiasrodlo/afiste
Blockchain based VC marketplace. Jump Chile semifinalist. (2019)
Language: PHP - Size: 470 MB - Last synced at: 22 days ago - Pushed at: 23 days ago - Stars: 1 - Forks: 0

NN0X/Cicero-Tokenizer
Custom tokenizer loosely based on Byte-Pair Encoding
Language: C++ - Size: 776 KB - Last synced at: 5 days ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

CMTA/CMTAT
Reference Solidity implementation of the CMTAT security token framework developed by CMTA to tokenize financial instruments.
Language: JavaScript - Size: 69.1 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 56 - Forks: 25

rth/vtext
Simple NLP in Rust with Python bindings
Language: Rust - Size: 273 KB - Last synced at: 18 days ago - Pushed at: about 2 years ago - Stars: 151 - Forks: 9

saulmoralespa/subscription-wompi-woo
Subscription integration with Wompi for WooCommerce
Language: PHP - Size: 389 KB - Last synced at: 17 days ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

IthavinduU/jwt-auth-service
JWT authentication microservice built with Ruby and Sinatra.
Language: Ruby - Size: 3.91 KB - Last synced at: 24 days ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

JuGecko/Tokenization-Visualizer
A web application illustrating tokenization methods when selecting certain LLMs.
Language: C# - Size: 7.78 MB - Last synced at: 28 days ago - Pushed at: 28 days ago - Stars: 0 - Forks: 0

gnatykdm/b2t-tokenizer
B2T Tokenizer — Brain-Inspired Multimodal Data Processor
Language: PowerShell - Size: 240 KB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 0 - Forks: 0

chuckyLeeVIII/ai-hedge-fund Fork of virattt/ai-hedge-fund
An AI Hedge managed by knox wallet
Language: Python - Size: 1.59 MB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 1 - Forks: 0

anjalirj27/Llama4
Llama4 – Code from Scratch. This project is inspired by vukrosic’s courses repository (https://github.com/vukrosic/courses). Here, I’ve implemented the tokenizer logic from scratch using Python and Google Colab to better understand how LLMs handle text at the token level.
Language: Jupyter Notebook - Size: 11.7 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

jkrukowski/swift-sentencepiece
Use SentencePiece in Swift for tokenization and detokenization.
Language: Swift - Size: 2.43 MB - Last synced at: 18 days ago - Pushed at: 5 months ago - Stars: 11 - Forks: 3

daac-tools/vaporetto
🛥 Vaporetto: Very accelerated pointwise prediction based tokenizer
Language: Rust - Size: 3.99 MB - Last synced at: 19 days ago - Pushed at: about 1 month ago - Stars: 238 - Forks: 10

mohansree14/Token-Classification
A Streamlit app for biomedical named entity recognition (NER) using BioBERT. Enter biomedical text and get instant, colorful token-level predictions for labels `O`, `B-AC`, `B-LF`, and `I-LF`. Includes graphical visualization and an interaction log.
Language: Jupyter Notebook - Size: 731 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

FerdiKurt/carbon-credits
These smart contracts provide a system for carbon credit tokenization, issuance, trading, and retirement.
Language: Solidity - Size: 111 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 1

lunasec-io/lunasec
LunaSec - Dependency Security Scanner that automatically notifies you about vulnerabilities like Log4Shell or node-ipc in your Pull Requests and Builds. Protect yourself in 30 seconds with the LunaTrace GitHub App: https://github.com/marketplace/lunatrace-by-lunasec/
Language: TypeScript - Size: 293 MB - Last synced at: 19 days ago - Pushed at: about 1 year ago - Stars: 1,454 - Forks: 170
