GitHub topics: chunking-algorithm
tainmou/SmartChunk
🧩 Enhance RAG processes with SmartChunk, a Python package that creates quality text chunks while preserving structure and meaning for better retrieval.
Size: 20.2 MB - Last synced at: about 5 hours ago - Pushed at: about 7 hours ago - Stars: 0 - Forks: 0
chonkie-inc/chonkie
🦛 CHONK docs with Chonkie ✨ - The no-nonsense RAG library
Language: Python - Size: 12.7 MB - Last synced at: 9 days ago - Pushed at: 10 days ago - Stars: 3,184 - Forks: 201
chonkie-inc/chonkiejs
🦛 CHONK your texts with Chonkie ✨ Type-friendly, light-weight, fast and super-simple chunking library
Language: TypeScript - Size: 572 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 280 - Forks: 9
mahnoorsheikh16/NLP-Framework-for-Literature-Summarization-in-Law-and-Policy
Implementation of an interactive chatbot for summarizing legal and policy documents. Includes data preprocessing (cleaning, tokenization, chunking), extractive summarization baselines, and fine-tuned abstractive models (PEGASUS and LED). Integrates a retrieval layer for document relevance and uses ROUGE, BLEU, and cosine similarity for evaluation.
Language: Jupyter Notebook - Size: 20.5 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0
nlfiedler/fastcdc-rs
FastCDC implementation in Rust
Language: Rust - Size: 284 KB - Last synced at: 2 months ago - Pushed at: 3 months ago - Stars: 161 - Forks: 29
ayush585/SmartChunk
SmartChunk is a lightweight, structure-aware semantic chunking toolkit designed to supercharge RAG (Retrieval-Augmented Generation) and LLM pipelines. Unlike naive splitters that break text arbitrarily, SmartChunk respects document structure (headings, lists, tables, code blocks) and semantic flow, ensuring cleaner, more coherent chunks.
Language: Python - Size: 91.8 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 2 - Forks: 0
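A minimal sketch of the structure-aware idea described above, assuming Markdown input: headings always open a new chunk, fenced code blocks are never split (even across blank lines), and remaining blocks are packed up to a character budget. The names `split_blocks` and `structure_chunks` and the `max_chars` parameter are hypothetical and are not SmartChunk's API.

```python
import re
from typing import List

HEADING = re.compile(r"^#{1,6}\s")  # ATX-style Markdown heading
FENCE = "`" * 3                     # Markdown code-fence marker


def split_blocks(text: str) -> List[str]:
    """Split Markdown into headings, paragraphs, and whole fenced code blocks."""
    blocks: List[str] = []
    current: List[str] = []
    in_code = False

    def flush() -> None:
        if current:
            blocks.append("\n".join(current))
            current.clear()

    for line in text.splitlines():
        if line.strip().startswith(FENCE):
            if not in_code:
                flush()               # close any paragraph before the fence opens
            current.append(line)
            in_code = not in_code
            if not in_code:
                flush()               # emit the finished code block as one unit
        elif in_code:
            current.append(line)      # blank lines inside code do not end the block
        elif HEADING.match(line):
            flush()
            blocks.append(line)       # a heading is its own block
        elif not line.strip():
            flush()                   # a blank line ends a paragraph
        else:
            current.append(line)
    flush()
    return blocks


def structure_chunks(text: str, max_chars: int = 500) -> List[str]:
    """Pack blocks into chunks; a heading starts a new chunk and no block is split."""
    chunks: List[str] = []
    current: List[str] = []
    for block in split_blocks(text):
        size = sum(len(b) + 2 for b in current)  # +2 for the "\n\n" separator
        if current and (HEADING.match(block) or size + len(block) > max_chars):
            chunks.append("\n\n".join(current))
            current = []
        current.append(block)
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

An oversized code block becomes a chunk of its own rather than being cut mid-block; a real tool would also need to handle tables and lists, which this sketch treats as ordinary paragraphs.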
iscc/fastcdc-py
FastCDC implementation in Python https://pypi.org/project/fastcdc/
Language: Python - Size: 339 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 60 - Forks: 17
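FastCDC cuts a byte stream where a rolling Gear hash matches a bit mask, so chunk boundaries depend on content rather than on fixed offsets. The sketch below is a simplified, illustrative Python version and not the `fastcdc` package's API. It assumes a seeded pseudo-random Gear table, masks taken from the hash's top bits (instead of the spread-out masks used in the FastCDC paper), a power-of-two `avg_size`, and one level of normalized chunking: a stricter mask before `avg_size` and a looser one after it. `cut_point` and `chunk_stream` are hypothetical names.

```python
import random
from typing import Iterator, Tuple

MASK64 = (1 << 64) - 1
# Assumption: a seeded pseudo-random Gear table of 256 random 64-bit values.
_rng = random.Random(2016)
GEAR = [_rng.getrandbits(64) for _ in range(256)]


def _top_mask(bits: int) -> int:
    """Mask selecting the `bits` highest bits of a 64-bit hash."""
    return ((1 << bits) - 1) << (64 - bits)


def cut_point(data: bytes, start: int, min_size: int, avg_size: int, max_size: int) -> int:
    """Return the length of the chunk that begins at `start`."""
    remaining = len(data) - start
    if remaining <= min_size:
        return remaining
    end = min(remaining, max_size)
    normal = min(end, avg_size)
    bits = avg_size.bit_length() - 1   # log2(avg_size), assuming a power of two
    mask_s = _top_mask(bits + 1)       # harder to match: used before avg_size
    mask_l = _top_mask(bits - 1)       # easier to match: used after avg_size
    h = 0
    for i in range(min_size, end):     # cut-point skipping: never cut inside min_size
        h = ((h << 1) + GEAR[data[start + i]]) & MASK64  # Gear rolling hash
        mask = mask_s if i < normal else mask_l
        if h & mask == 0:
            return i + 1
    return end


def chunk_stream(data: bytes, min_size: int = 2048, avg_size: int = 8192,
                 max_size: int = 65536) -> Iterator[Tuple[int, int]]:
    """Yield (offset, length) pairs that exactly cover `data`."""
    offset = 0
    while offset < len(data):
        length = cut_point(data, offset, min_size, avg_size, max_size)
        yield offset, length
        offset += length
```

Because the 64-bit Gear hash only reflects the last 64 bytes hashed, inserting a byte near the start of the input usually shifts the following boundaries by one position instead of moving all of them, which is what makes content-defined chunking useful for deduplication.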
Haruno19/MST-Semantic-Chunker
A new, experimental text chunking method based on Minimum Spanning Tree clustering with a hybrid semantic-positional distance measure.
Language: Python - Size: 33.2 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0
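One possible reading of MST-based chunking, sketched under stated assumptions: sentences are graph nodes, edge weights mix a semantic distance with a normalized positional distance, and Kruskal's algorithm joins nodes until the next edge exceeds a threshold (equivalent to building the MST and cutting every edge heavier than the threshold). Word-set Jaccard distance stands in for embedding similarity here, and `alpha`, `threshold`, `hybrid_distance`, and `mst_chunks` are hypothetical; the repository's actual distance measure may differ.

```python
import itertools
import re
from typing import Dict, List, Set


def _words(sentence: str) -> Set[str]:
    return set(re.findall(r"[a-z]+", sentence.lower()))


def hybrid_distance(i: int, j: int, words: List[Set[str]], alpha: float) -> float:
    """alpha * semantic distance + (1 - alpha) * normalized positional distance."""
    union = words[i] | words[j]
    # Assumption: Jaccard distance on word sets stands in for embedding distance.
    semantic = 1.0 - (len(words[i] & words[j]) / len(union) if union else 0.0)
    positional = abs(i - j) / max(len(words) - 1, 1)
    return alpha * semantic + (1.0 - alpha) * positional


def mst_chunks(sentences: List[str], threshold: float = 0.6, alpha: float = 0.7) -> List[str]:
    words = [_words(s) for s in sentences]
    edges = sorted(
        (hybrid_distance(i, j, words, alpha), i, j)
        for i, j in itertools.combinations(range(len(sentences)), 2)
    )
    parent = list(range(len(sentences)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Kruskal's algorithm stopped at `threshold`: the same result as building
    # the MST and then deleting every MST edge heavier than `threshold`.
    for weight, i, j in edges:
        if weight > threshold:
            break
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j

    groups: Dict[int, List[int]] = {}
    for i in range(len(sentences)):
        groups.setdefault(find(i), []).append(i)
    ordered = sorted(groups.values(), key=lambda g: g[0])
    return [" ".join(sentences[i] for i in g) for g in ordered]
```

The resulting clusters are single-linkage groups, so a chunk can in principle gather non-adjacent sentences; the positional term, weighted by `1 - alpha`, is what discourages that.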
FastPix/android-uploads-sdk
Android Resumable Uploads SDK from FastPix
Language: Kotlin - Size: 74.2 KB - Last synced at: 6 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0
D-X-W-Clerker/clerker-ai
[2024-2] "Clerker", a meeting-support platform service that uses the Mermaid model
Language: Python - Size: 28.8 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 1
isaka-james/chunks-to-file
A Node.js chunking system
Language: JavaScript - Size: 55.7 KB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 0
mg98/ae-chunker-go
Go implementation of the AE chunking algorithm.
Language: Go - Size: 83 KB - Last synced at: 7 months ago - Pushed at: almost 3 years ago - Stars: 4 - Forks: 0
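AE (Asymmetric Extremum) chunking avoids rolling hashes: it tracks the running maximum from the chunk start and cuts once a window of `window` bytes after that maximum passes without a larger value. The sketch below is an illustrative Python version of this maximum-based variant, assuming single-byte values and an optional `max_size` cap; `ae_cut` and `ae_chunks` are hypothetical names, not the Go package's API.

```python
from typing import Iterator, Optional, Tuple


def ae_cut(data: bytes, start: int, window: int, max_size: Optional[int] = None) -> int:
    """Return the length of the chunk beginning at `start` (max-based AE, byte values)."""
    remaining = len(data) - start
    if max_size is not None:
        remaining = min(remaining, max_size)
    if remaining <= 0:
        return 0
    max_value, max_pos = data[start], 0
    for i in range(1, remaining):
        value = data[start + i]
        if value > max_value:
            # New extreme point: the window restarts after it.
            max_value, max_pos = value, i
        elif i == max_pos + window:
            # `window` bytes followed the maximum without exceeding it: cut here.
            return i + 1
    return remaining


def ae_chunks(data: bytes, window: int = 256,
              max_size: Optional[int] = None) -> Iterator[Tuple[int, int]]:
    """Yield (offset, length) pairs that exactly cover `data`."""
    if window < 1 or (max_size is not None and max_size < 1):
        raise ValueError("window and max_size must be positive")
    offset = 0
    while offset < len(data):
        length = ae_cut(data, offset, window, max_size)
        yield offset, length
        offset += length
```

Each step is a single comparison per byte, which is the main appeal of AE over hash-based content-defined chunking; the window is "asymmetric" because only bytes after the extreme point are checked.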
davidwrossiter/langchunk
Source code for chunking code written in multiple programming languages
Language: JavaScript - Size: 6.28 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0
i5heu/ChunkingChampions
Explore and benchmark the world of data chunking algorithms in 'ChunkingChampions' - a competitive arena to determine the most efficient and effective chunking strategies for varied data sizes.
Size: 564 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0
mudssrali/chunkify
A simple utility to split a given array into chunks of a specified size, with an option to reverse the array
Language: TypeScript - Size: 105 KB - Last synced at: 1 day ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0
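Splitting an array into fixed-size chunks needs only slicing. The Python sketch below is for illustration and assumes the reverse option reverses the input before it is chunked; `chunk_list` is a hypothetical name, not chunkify's TypeScript API.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def chunk_list(items: Sequence[T], size: int, reverse: bool = False) -> List[List[T]]:
    """Split `items` into consecutive chunks of `size` elements (the last may be shorter)."""
    if size <= 0:
        raise ValueError("size must be a positive integer")
    # Assumption: the reverse option reverses the input before chunking.
    seq = list(reversed(items)) if reverse else list(items)
    return [seq[i:i + size] for i in range(0, len(seq), size)]
```

For example, `chunk_list([1, 2, 3, 4, 5], 2)` returns `[[1, 2], [3, 4], [5]]`, and with `reverse=True` it returns `[[5, 4], [3, 2], [1]]`.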