An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: tokenization

shivendrra/shredword-trainer

Fast & Efficient BPE & Unigram tokenizer trainer library

Language: C++ - Size: 18.2 MB - Last synced at: about 8 hours ago - Pushed at: about 10 hours ago - Stars: 0 - Forks: 0

XDuch/aztec-network

A step-by-step guide on how to install Aztec Network Sequencer on Testnet

Size: 16.6 KB - Last synced at: about 9 hours ago - Pushed at: about 11 hours ago - Stars: 2 - Forks: 1

RAHEEM12344/content-recommendation-engine

A modern, responsive web application that delivers personalized content recommendations based on user preferences and behavior. This interactive recommendation system allows users to discover content tailored to their interests through category selection, tag filtering, and customizable content parameters.

Language: HTML - Size: 187 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

sebastian2005-RP/GPU-Accelerated-Next-Word-Prediction-Using-LSTM-and-PyTorch

This repository implements a GPU-accelerated next-word prediction model using PyTorch and LSTM. It includes data preprocessing with NLTK, vocabulary creation, training on tokenized text, and generating text predictions, starting from a given input phrase.

Language: Jupyter Notebook - Size: 329 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

AkshaySyal/Byte-Pair-Encoding-for-Text-Tokenization

This project implements a Byte Pair Encoding (BPE) algorithm for text tokenization, training it on NLTK's Gutenberg Corpus and evaluating its accuracy, coverage, and F1-score against NLTK's standard punkt tokenizer.

Language: Jupyter Notebook - Size: 162 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

mikiyadd/my-c-array

Dynamic array implementation in C with a modular, folder-based structure.

Language: C - Size: 12.7 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

NVIDIA/Cosmos-Tokenizer 📦

A suite of image and video neural tokenizers

Language: Jupyter Notebook - Size: 16.5 MB - Last synced at: 3 days ago - Pushed at: 5 months ago - Stars: 1,643 - Forks: 76

Solcraftl2/solcraft-nexus

🚀 Professional tokenization platform on Ripple XRP Ledger - Complete solution with React frontend, Flask backend, 2FA security, and enterprise features

Language: JavaScript - Size: 2.95 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

X-Financial-Technologies/Library

XFT's repo of blockchain-related resources

Language: Solidity - Size: 614 MB - Last synced at: 1 day ago - Pushed at: 8 months ago - Stars: 2 - Forks: 2

daac-tools/vibrato

🎤 vibrato: Viterbi-based accelerated tokenizer

Language: Rust - Size: 1.08 MB - Last synced at: 2 days ago - Pushed at: 2 months ago - Stars: 368 - Forks: 15

futurepanther786/video_captioning_using_lstm

This video captioning system uses a Convolutional Neural Network (CNN) encoder and a Long Short-Term Memory (LSTM) decoder, trained on the Microsoft Research Video Description Corpus (MSVD) dataset. The system extracts features from video frames using ResNet-152 and generates descriptive captions using an LSTM-based decoder.

Language: Jupyter Notebook - Size: 30.8 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

fattmerchantorg/Fattmerchant-iOS-SDK

Fattmerchant iOS SDK

Language: Swift - Size: 155 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 3 - Forks: 2

izsolnay/Ancient_NLP

Goal: Discover whether modern NLP tools and predictive algorithms can provide insights into ancient text corpora

Language: Jupyter Notebook - Size: 2.74 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

dl-tokenf/contracts

On-chain RWA Tokenization Framework

Language: Solidity - Size: 875 KB - Last synced at: 3 days ago - Pushed at: 3 months ago - Stars: 47 - Forks: 12

ITSLab-UAegean/vesseltrack-tools

This is a repo related to vessel AIS data, including filtering, tokenization, and trip extraction.

Language: Python - Size: 8.72 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

12345far/metrics-calculation-precision-recall

Laboratory 7 - Retrieval Information

Size: 1.95 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

sanderland/script_bpe

Code for the paper "BPE stays on SCRIPT"

Language: Jupyter Notebook - Size: 146 MB - Last synced at: about 8 hours ago - Pushed at: about 10 hours ago - Stars: 10 - Forks: 3

tassa-yoniso-manasi-karoto/go-ichiran

Go library bindings for docker-composed Ichiran, a morphological analyzer / romanizer for Japanese

Language: Go - Size: 208 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 2 - Forks: 1

Basis-Theory/developers.basistheory.com

Basis Theory Developer Documentation

Language: JavaScript - Size: 25.1 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 6 - Forks: 4

AndresEspin1993/b2t-tokenizer

B2T - Tokenizer for the AI Systems.

Language: PowerShell - Size: 240 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

thearhamsharif/BSCS-UBIT-2k21

Includes coursework and lab materials for students enrolled in the Bachelor of Science in Computer Science degree at UBIT.

Language: Jupyter Notebook - Size: 19.6 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

mysto/python-fpe

FPE - Format Preserving Encryption with FF3 in Python

Language: Python - Size: 144 KB - Last synced at: 5 days ago - Pushed at: about 1 month ago - Stars: 101 - Forks: 20

CompLin/nheengatu

Tools and resources for the computational processing of Nheengatu (Modern Tupi)

Language: Python - Size: 31.3 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 8 - Forks: 4

muki119/InformationSystem_SearchEngine

Search Engine for Information Retrieval Coursework

Language: HTML - Size: 1.01 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 0 - Forks: 0

chuckyLeeVIII/Bitcoin-BhE-NaS Fork of bitcoin/bips

Bitcoin Improvement Proposals

Language: Wikitext - Size: 15.7 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 1 - Forks: 1

m-doughty/Raku-Tokenizers

Wrapper module for Huggingface Tokenizers

Language: Raku - Size: 2.35 MB - Last synced at: 2 days ago - Pushed at: 15 days ago - Stars: 2 - Forks: 0

NightKing-V/SubtitleLLM_EngtoSin

Experimental Eng->Sin Subtitle Translation Model

Language: Jupyter Notebook - Size: 851 KB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 0 - Forks: 0

adbar/simplemma

Simple multilingual lemmatizer for Python, especially useful for speed and efficiency

Language: Python - Size: 729 MB - Last synced at: 7 days ago - Pushed at: about 1 month ago - Stars: 166 - Forks: 14

CMTA/RuleEngine

Rule engine used by the CMTAT token framework to implement transfer restriction.

Language: Solidity - Size: 12.6 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 4 - Forks: 5

LendefiMarkets/lendefi-markets-avalanche

Lendefi Markets for Avalanche

Language: Solidity - Size: 1.38 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

jshuadvd/LongRoPE

Implementation of the LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens Paper

Language: Python - Size: 562 KB - Last synced at: 7 days ago - Pushed at: 12 months ago - Stars: 146 - Forks: 14

ThuraAung1601/myTokenize

Comprehensive tokenization library for Myanmar language

Language: Python - Size: 1.87 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 5 - Forks: 1

fireblocks/fireblocks-xrp-sdk

A stateless SDK and REST API server for Fireblocks customers, simplifying advanced operations on the Ripple Ledger (XRPL).

Language: TypeScript - Size: 223 KB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 1 - Forks: 0

mantzaris/KeemenaPreprocessing.jl

Preprocessing for text data: cleaning, normalization, vectorization, tokenization and more

Language: Julia - Size: 313 KB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 0 - Forks: 0

GlitchedPolygons/l8w8jwt

Minimal, OpenSSL-less and super lightweight JWT library written in C.

Language: C - Size: 6.27 MB - Last synced at: 9 days ago - Pushed at: 10 days ago - Stars: 159 - Forks: 46

VKCOM/YouTokenToMe 📦

Unsupervised text tokenizer focused on computational efficiency

Language: C++ - Size: 192 KB - Last synced at: 5 days ago - Pushed at: over 1 year ago - Stars: 972 - Forks: 105

securitybunker/databunker

Secure Vault for Customer PII/PHI/PCI/KYC Records

Language: Go - Size: 11.1 MB - Last synced at: 5 days ago - Pushed at: 27 days ago - Stars: 1,310 - Forks: 84

Rishabh899/Movie-recommendations-system

This is a machine-learning-based movie recommendation system. It is a content-based system.

Language: Jupyter Notebook - Size: 1.84 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 0 - Forks: 0

bminixhofer/tokenkit

A toolkit implementing advanced methods to transfer models and model knowledge across tokenizers.

Language: Python - Size: 366 KB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 30 - Forks: 4

delveopers/Shredword

Fast & efficient BPE tokenizer written in C & Python for LLM training

Language: C++ - Size: 835 KB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 0 - Forks: 0

verygoodsecurity/vgs-collect-android

VGS Collect Android SDK

Language: Kotlin - Size: 14.3 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 8 - Forks: 9

rosette-api/java

Babel Street Analytics Client Library for Java

Language: Java - Size: 64.8 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 12 - Forks: 35

eliben/go-sentencepiece

Go implementation of the SentencePiece tokenizer

Language: Go - Size: 200 KB - Last synced at: 6 days ago - Pushed at: 10 months ago - Stars: 31 - Forks: 5

sytelus/nanuGPT

Simple, reliable and well-tested training code for quick experiments with transformer-based models

Language: Jupyter Notebook - Size: 3.73 MB - Last synced at: 2 days ago - Pushed at: 14 days ago - Stars: 4 - Forks: 0

OpenNMT/Tokenizer

Fast and customizable text tokenization library with BPE and SentencePiece support

Language: C++ - Size: 1.69 MB - Last synced at: 6 days ago - Pushed at: 3 months ago - Stars: 310 - Forks: 74

iam-salma/NLP-Bootcamp-with-python

A hands-on NLP Bootcamp using Python 🐍 covering text preprocessing, tokenization, stemming, lemmatization, POS tagging, NER, BoW, TF-IDF, Word2Vec, and sentiment analysis. Includes real-world projects, capstone notebooks, and ML-ready code for text classification and natural language tasks — ideal for data science, machine learning & AI learners

Language: Jupyter Notebook - Size: 9.79 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 1 - Forks: 0

AgentOps-AI/tokencost

Easy token price estimates for 400+ LLMs. TokenOps.

Language: Python - Size: 1.99 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 1,721 - Forks: 85

av/klmbr

klmbr - a prompt pre-processing technique to break through the barrier of entropy while generating text with LLMs

Language: TeX - Size: 2.24 MB - Last synced at: 11 days ago - Pushed at: 10 months ago - Stars: 77 - Forks: 3

Mukeshthenraj/nltk-text-analysis

Python project using NLTK to analyze text and build spelling recommenders

Language: Python - Size: 823 KB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 0 - Forks: 0

andreihar/taibun

Taiwanese Hokkien Transliterator and Tokeniser

Language: Python - Size: 4.57 MB - Last synced at: 7 days ago - Pushed at: 11 months ago - Stars: 36 - Forks: 2

gehad-Ahmed30/Natural-Language-Processing

Language: Jupyter Notebook - Size: 44.9 KB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 0 - Forks: 0

Hasnat-Aarif-Aslam/NLP-Foundation-Tokens-Ngrams-BoW-TF-IDF-TFIDF

Comprehensive guide to text preprocessing and vectorization techniques for NLP, covering tokenization, n-grams, Bag-of-Words, TF-IDF, and related feature-engineering methods.

Size: 2.93 KB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

Fraviv/CropChain_MVP

Tokenizing smallholder farmers’ harvests and connecting them with investors via blockchain and web technologies for capital.

Language: JavaScript - Size: 0 Bytes - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

Fraviv/CropChain

Tokenizing smallholder farmers’ harvests and connecting them with investors via blockchain and web technologies for capital.

Language: Python - Size: 15.6 KB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

alasdairforsythe/tokenmonster

Ungreedy subword tokenizer and vocabulary trainer for Python, Go & Javascript

Language: Go - Size: 734 KB - Last synced at: 10 days ago - Pushed at: about 1 year ago - Stars: 587 - Forks: 20

icelaterdc/Turk-NLP

Comprehensive open-source NLP (Natural Language Processing) library for Turkish.

Language: Python - Size: 20.5 KB - Last synced at: 10 days ago - Pushed at: about 1 month ago - Stars: 4 - Forks: 0

explosion/spacy-streamlit

👑 spaCy building blocks and visualizers for Streamlit apps

Language: Python - Size: 61.5 KB - Last synced at: 14 days ago - Pushed at: 12 months ago - Stars: 840 - Forks: 118

yuniko-software/tokenizer-to-onnx-model

Convert Hugging Face tokenizers to ONNX models for cross-language compatibility (.NET, Java, Python) with embedding models

Language: Jupyter Notebook - Size: 43 KB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 24 - Forks: 2

explosion/spaCy

💫 Industrial-strength Natural Language Processing (NLP) in Python

Language: Python - Size: 194 MB - Last synced at: 18 days ago - Pushed at: about 2 months ago - Stars: 31,826 - Forks: 4,520

ImadSaddik/Train_Your_Language_Model_Course

Train a language model to chat like you using your personal conversations from WhatsApp, Telegram, Signal, or other platforms.

Language: Jupyter Notebook - Size: 59.1 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 123 - Forks: 76

SayedSheikh/SpareABite-server

Language: JavaScript - Size: 34.2 KB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 0 - Forks: 0

YaroslavShved25/-Resume-Parser-Service-NLP-

A GPT-3 based Resume Parser REST API that converts resume PDFs into clean, structured JSON files. This service accurately extracts key fields such as contact information, education, job experience, and project history.

Language: Python - Size: 0 Bytes - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 0 - Forks: 0

shaheennabi/Natural-Language-Processing-Practices-and-Mini-Projects

🎇 NLP Experiments 🎆 A hands-on collection of NLP experiments 💬, featuring models like RNN, LSTM, and Attention Mechanism. 🚀 Explore applications like text classification, sentiment analysis, and language generation 🌍. Continuously updated with new algorithms and research implementations! 🔥

Language: Jupyter Notebook - Size: 47.9 KB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 1 - Forks: 0

Jyonn/UnifiedTokenizer

A machine learning toolkit for tokenization and indexing

Language: Python - Size: 537 KB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 5 - Forks: 1

WorksApplications/sudachi.rs

Sudachi in Rust 🦀 and new generation of SudachiPy

Language: Rust - Size: 15.8 MB - Last synced at: 13 days ago - Pushed at: 24 days ago - Stars: 363 - Forks: 40

spindle-health/carduus

PySpark implementation of the Open Privacy Preserving Record Linkage (OPPRL) specification.

Language: Python - Size: 1.33 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 15 - Forks: 1

PyThaiNLP/attacut

A Fast and Accurate Neural Thai Word Segmenter

Language: Python - Size: 4.15 MB - Last synced at: 19 days ago - Pushed at: 6 months ago - Stars: 86 - Forks: 18

jasminfeifei/FinTech-MVP---BetaChain

MVP for BetaChain project

Language: JavaScript - Size: 26.4 KB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 0 - Forks: 0

newtmex/smart-housing

SmartHousing is an innovative real estate tokenization platform designed to address Nigeria's significant housing deficit by leveraging blockchain technology. Our solution enables fractional ownership of real estate properties through the use of Real World Asset Tokenization, making it easier for low-income earners to invest in and own real estate.

Language: TypeScript - Size: 16.1 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 1 - Forks: 0

delBull/saaspandoras Fork of nextify-limited/saasfly

Acquire your right to participate in exclusive projects with Pandoras.

Language: TypeScript - Size: 22.5 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 0 - Forks: 0

Worklytics/psoxy

Serverless ☁️ 🚀, pseudonymizing proxy between Worklytics and your workplace 💼 SaaS data sources' APIs. Data Loss Prevention (DLP) 🛡️🔒 and compliance layer deployable to AWS Lambda or GCP Cloud Functions.

Language: Java - Size: 35 MB - Last synced at: about 17 hours ago - Pushed at: about 19 hours ago - Stars: 14 - Forks: 6

brave/tokenizer 📦

A modular resource tokenization service.

Language: Go - Size: 4.61 MB - Last synced at: 3 days ago - Pushed at: 21 days ago - Stars: 21 - Forks: 3

gautierdag/bpeasy

Fast bare-bones BPE for modern tokenizer training

Language: Python - Size: 1.41 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 158 - Forks: 5

johannschopplich/tokenx

📐 Fast and lightweight token estimation for any LLM without requiring a full tokenizer

Language: TypeScript - Size: 396 KB - Last synced at: 5 days ago - Pushed at: 20 days ago - Stars: 27 - Forks: 3

LeoMSgit/Personal-Lib---AI-ML-NLP

Collection of Notes, Guides, and Examples for Artificial Intelligence, Machine Learning, and Natural Language Processing

Size: 81.1 KB - Last synced at: 13 days ago - Pushed at: 21 days ago - Stars: 1 - Forks: 0

wolflow-ai/wolfstitch

Turn files into clean, fine-tuning-ready datasets (TXT/CSV). EPUB, PDF, and token-aware. Local, GUI-based, no cloud required.

Language: Python - Size: 368 KB - Last synced at: 2 days ago - Pushed at: 27 days ago - Stars: 0 - Forks: 0

afrail-inc/afrx-security-token

Smart contract for AFRX Security Token (ERC-3643) — a regulated digital asset representing tokenized equity in Afrail Inc., a U.S.-registered smart infrastructure company. Built on Ethereum (ERC-1400 base) and compliant with SEC Regulation D Rule 506(c) and Regulation S. Backed by 57.7M Class A and 20M Preferred shares. Fully KYC/AML-compliant.

Language: Solidity - Size: 1.61 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 13 - Forks: 0

izikeros/count_tokens

Count tokens in a text file.

Language: Python - Size: 137 KB - Last synced at: 11 days ago - Pushed at: 2 months ago - Stars: 8 - Forks: 0

kelchner63zd/aztec-network

A step-by-step guide on how to install Aztec Network Sequencer on Testnet

Size: 0 Bytes - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 0 - Forks: 0

bithead21/parcel

Parser for C++ programs! Parcel is a simple language for parsing text information and retrieving data.

Language: C++ - Size: 1.2 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 2 - Forks: 0

taurushq-io/private-CMTAT-aztec

Private version of CMTAT security token in Noir (Aztec network DSL)

Language: Noir - Size: 129 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 15 - Forks: 3

nlp-uoregon/trankit

Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing

Language: Python - Size: 1.06 MB - Last synced at: 12 days ago - Pushed at: 9 months ago - Stars: 756 - Forks: 103

cedricrupb/code_tokenize

Fast tokenization and structural analysis of any programming language

Language: Python - Size: 152 KB - Last synced at: 18 days ago - Pushed at: 6 months ago - Stars: 57 - Forks: 9

AndyFerns/Automated-Reasoning-Project

A project aiming to implement Automated Reasoning in First Order Logic using NLP

Language: Python - Size: 119 KB - Last synced at: 22 days ago - Pushed at: 22 days ago - Stars: 1 - Forks: 0

3Dpass/3DP

The Implementation of The Ledger of Things Node. Layer 1 decentralized blockchain platform for the tokenization of objects. Proof of Scan protocol. Useful smart-contracts and dApps.

Language: Rust - Size: 65.5 MB - Last synced at: 13 minutes ago - Pushed at: about 3 hours ago - Stars: 25 - Forks: 19

matiasrodlo/afiste

Blockchain based VC marketplace. Jump Chile semifinalist. (2019)

Language: PHP - Size: 470 MB - Last synced at: 22 days ago - Pushed at: 23 days ago - Stars: 1 - Forks: 0

NN0X/Cicero-Tokenizer

Custom tokenizer loosely based on Byte-Pair Encoding

Language: C++ - Size: 776 KB - Last synced at: 5 days ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

CMTA/CMTAT

Reference Solidity implementation of the CMTAT security token framework developed by CMTA to tokenize financial instruments.

Language: JavaScript - Size: 69.1 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 56 - Forks: 25

rth/vtext

Simple NLP in Rust with Python bindings

Language: Rust - Size: 273 KB - Last synced at: 18 days ago - Pushed at: about 2 years ago - Stars: 151 - Forks: 9

saulmoralespa/subscription-wompi-woo

Subscription integration with Wompi for WooCommerce

Language: PHP - Size: 389 KB - Last synced at: 17 days ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

IthavinduU/jwt-auth-service

JWT authentication microservice built with Ruby and Sinatra.

Language: Ruby - Size: 3.91 KB - Last synced at: 24 days ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

JuGecko/Tokenization-Visualizer

A web application illustrating tokenization methods when selecting certain LLMs.

Language: C# - Size: 7.78 MB - Last synced at: 28 days ago - Pushed at: 28 days ago - Stars: 0 - Forks: 0

gnatykdm/b2t-tokenizer

B2T Tokenizer — Brain-Inspired Multimodal Data Processor

Language: PowerShell - Size: 240 KB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 0 - Forks: 0

chuckyLeeVIII/ai-hedge-fund Fork of virattt/ai-hedge-fund

An AI Hedge managed by knox wallet

Language: Python - Size: 1.59 MB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 1 - Forks: 0

anjalirj27/Llama4

Llama4: Code from Scratch. This project is inspired by vukrosic’s courses repository (https://github.com/vukrosic/courses). Here, I’ve implemented the tokenizer logic from scratch using Python and Google Colab to better understand how LLMs handle text at the token level.

Language: Jupyter Notebook - Size: 11.7 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

jkrukowski/swift-sentencepiece

Use SentencePiece in Swift for tokenization and detokenization.

Language: Swift - Size: 2.43 MB - Last synced at: 18 days ago - Pushed at: 5 months ago - Stars: 11 - Forks: 3

daac-tools/vaporetto

🛥 Vaporetto: Very accelerated pointwise prediction based tokenizer

Language: Rust - Size: 3.99 MB - Last synced at: 19 days ago - Pushed at: about 1 month ago - Stars: 238 - Forks: 10

mohansree14/Token-Classification

A Streamlit app for biomedical named entity recognition (NER) using BioBERT. Enter biomedical text and get instant, colorful token-level predictions for labels `O`, `B-AC`, `B-LF`, and `I-LF`. Includes graphical visualization and an interaction log.

Language: Jupyter Notebook - Size: 731 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

FerdiKurt/carbon-credits

These smart contracts provide a system for carbon credit tokenization, issuance, trading, and retirement.

Language: Solidity - Size: 111 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 1

lunasec-io/lunasec

LunaSec - Dependency Security Scanner that automatically notifies you about vulnerabilities like Log4Shell or node-ipc in your Pull Requests and Builds. Protect yourself in 30 seconds with the LunaTrace GitHub App: https://github.com/marketplace/lunatrace-by-lunasec/

Language: TypeScript - Size: 293 MB - Last synced at: 19 days ago - Pushed at: about 1 year ago - Stars: 1,454 - Forks: 170