An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: chunking-algorithm

tainmou/SmartChunk

🧩 Enhance RAG processes with SmartChunk, a Python package that creates quality text chunks while preserving structure and meaning for better retrieval.

Size: 20.2 MB - Last synced at: about 5 hours ago - Pushed at: about 7 hours ago - Stars: 0 - Forks: 0

chonkie-inc/chonkie

🦛 CHONK docs with Chonkie ✨ - The no-nonsense RAG library

Language: Python - Size: 12.7 MB - Last synced at: 9 days ago - Pushed at: 10 days ago - Stars: 3,184 - Forks: 201

chonkie-inc/chonkiejs

🦛 CHONK your texts with Chonkie ✨ Type-friendly, light-weight, fast and super-simple chunking library

Language: TypeScript - Size: 572 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 280 - Forks: 9

mahnoorsheikh16/NLP-Framework-for-Literature-Summarization-in-Law-and-Policy

Implementation of an interactive chatbot for summarizing legal and policy documents. Includes data preprocessing (cleaning, tokenization, chunking), extractive summarization baselines, and fine-tuned abstractive models (PEGASUS and LED). Integrates a retrieval layer for document relevance and uses ROUGE, BLEU, and cosine similarity for evaluation.

Language: Jupyter Notebook - Size: 20.5 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

nlfiedler/fastcdc-rs

FastCDC implementation in Rust

Language: Rust - Size: 284 KB - Last synced at: 2 months ago - Pushed at: 3 months ago - Stars: 161 - Forks: 29

ayush585/SmartChunk

SmartChunk is a lightweight, structure-aware semantic chunking toolkit designed to supercharge RAG (Retrieval-Augmented Generation) and LLM pipelines. Unlike naive splitters that break text arbitrarily, SmartChunk respects document structure (headings, lists, tables, code blocks) and semantic flow, ensuring cleaner, more coherent chunks.

Language: Python - Size: 91.8 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 2 - Forks: 0

iscc/fastcdc-py

FastCDC implementation in Python https://pypi.org/project/fastcdc/

Language: Python - Size: 339 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 60 - Forks: 17

Haruno19/MST-Semantic-Chunker

A new, experimental text chunking method based on Minimum Spanning Tree clustering with a hybrid semantic-positional distance measure.

Language: Python - Size: 33.2 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

FastPix/android-uploads-sdk

Android Resumable Uploads SDK from FastPix

Language: Kotlin - Size: 74.2 KB - Last synced at: 6 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

D-X-W-Clerker/clerker-ai

[2024-2] "Clerker", a meeting support platform service built on the Mermaid model

Language: Python - Size: 28.8 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 1

isaka-james/chunks-to-file

A Node.js chunking system

Language: JavaScript - Size: 55.7 KB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 0

mg98/ae-chunker-go

Go implementation of the AE chunking algorithm.

Language: Go - Size: 83 KB - Last synced at: 7 months ago - Pushed at: almost 3 years ago - Stars: 4 - Forks: 0

davidwrossiter/langchunk

Source code for chunking code written in multiple programming languages

Language: JavaScript - Size: 6.28 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

i5heu/ChunkingChampions

Explore and benchmark the world of data chunking algorithms in 'ChunkingChampions' - a competitive arena to determine the most efficient and effective chunking strategies for varied data sizes.

Size: 564 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

mudssrali/chunkify

A simple utility that splits a given array into chunks of a specified size, with an option to reverse the array

Language: TypeScript - Size: 105 KB - Last synced at: 1 day ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0