GitHub topics: chunking
drittich/SemanticSlicer
🧠✂️ SemanticSlicer — A smart text chunker for LLM-ready documents.
Language: C# - Size: 73.2 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 23 - Forks: 1

duriantaco/pykomodo
A Python-based parallel file chunking system designed for processing large codebases into LLM-friendly chunks.
Language: Python - Size: 10.4 MB - Last synced at: 2 days ago - Pushed at: 26 days ago - Stars: 41 - Forks: 1

gpizzorno/tree-sitter-chunk-grammar
Tree-sitter parser for NLTK chunking grammars.
Language: C - Size: 4.25 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

Christopher-K-Long/thread-chunks
A Python package for performing memory-intensive computations in parallel using chunks and checkpointing.
Language: Python - Size: 51.8 KB - Last synced at: 3 days ago - Pushed at: 2 months ago - Stars: 3 - Forks: 0

microsoft/rag-experiment-accelerator
The RAG Experiment Accelerator is a versatile tool designed to expedite and facilitate experiments and evaluations using Azure Cognitive Search and the RAG pattern.
Language: Python - Size: 4.36 MB - Last synced at: 3 days ago - Pushed at: 3 months ago - Stars: 257 - Forks: 90

gazelle93/Various-Chunking-Methods
Exploring and benchmarking chunking methods for Retrieval-Augmented Generation (RAG), including fixed-size, recursive, sliding, semantic, and hybrid chunking strategies.
Language: Python - Size: 21.5 KB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 0 - Forks: 0

Sammyjo20/laravel-chunkable-jobs
📑 Split Laravel jobs into multiple separate job chunks
Language: PHP - Size: 54.7 KB - Last synced at: 2 days ago - Pushed at: about 1 year ago - Stars: 84 - Forks: 4

swarmauri/swarmauri-sdk
A modular multimodal framework for AI applications
Language: Python - Size: 29.3 MB - Last synced at: about 15 hours ago - Pushed at: about 15 hours ago - Stars: 91 - Forks: 45

isaacus-dev/semchunk
A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks.
Language: Python - Size: 128 KB - Last synced at: 13 days ago - Pushed at: 14 days ago - Stars: 318 - Forks: 19

DanEngelbrecht/longtail
Incremental asset delivery library
Language: C - Size: 5.52 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 56 - Forks: 8

Piletskii-Oleg/rust-chunking
Content Based Chunking algorithms implemented in Rust.
Language: Rust - Size: 145 KB - Last synced at: 8 days ago - Pushed at: about 1 month ago - Stars: 3 - Forks: 0

HazemBZ/pdf-fuzz
PoC: bulk search your PDF files using text lookup
Size: 8.79 KB - Last synced at: 15 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

iscc/fastcdc-py
FastCDC implementation in Python https://pypi.org/project/fastcdc/
Language: Python - Size: 339 KB - Last synced at: 1 day ago - Pushed at: 12 months ago - Stars: 58 - Forks: 17

romanyn36/RAG-Ai-Agent
AI-powered agent leveraging RAG (Retrieval-Augmented Generation) with tool integration capabilities. Built with LangChain, OpenAI, FastAPI, and a React frontend, it combines document-based knowledge with real-time data access and calculation tools to provide context-aware responses.
Language: JavaScript - Size: 1.32 MB - Last synced at: about 4 hours ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 0

DennisSmuda/godot-chunking-system
Demo of how to make a 2D grid-based map with FastNoise and infinite movement in every direction. Uses multithreading to load/unload chunks of the map! 🌎
Language: GDScript - Size: 25.1 MB - Last synced at: 3 days ago - Pushed at: over 2 years ago - Stars: 20 - Forks: 2

carlosplanchon/betterhtmlchunking
BetterHTMLChunking is a Python library for intelligent HTML segmentation. It builds a DOM tree from raw HTML and extracts content-rich regions of interest, making content analysis effortless. Great for LLM-based processing.
Language: Python - Size: 44.9 KB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 34 - Forks: 2

datakaveri/k-anonymisation-SKALD
Scalable, chunk-wise K-anonymization tool based on the Optimal Lattice Anonymization (OLA) algorithm. It is designed to handle large datasets by processing them in manageable chunks, ensuring data privacy while maintaining utility.
Language: Python - Size: 46.7 MB - Last synced at: 19 days ago - Pushed at: 20 days ago - Stars: 1 - Forks: 0

gpizzorno/rules-based-entity-extraction
This codebase provides a pipeline for extracting unnamed entities from Medieval Latin texts by combining rule-based resources and a machine learning chunker trained on custom features. It supports evaluation, visualization, and model persistence for further use or deployment.
Language: HTML - Size: 2.18 MB - Last synced at: 19 days ago - Pushed at: 20 days ago - Stars: 0 - Forks: 0

ronomon/deduplication
Fast multi-threaded content-dependent chunking deduplication for Buffers in C++ with a reference implementation in JavaScript. Ships with extensive tests, a fuzz test and a benchmark.
Language: JavaScript - Size: 34.2 KB - Last synced at: 21 days ago - Pushed at: over 5 years ago - Stars: 76 - Forks: 9

lazyFrogLOL/llmdocparser
A package for parsing PDFs and analyzing their content using LLMs.
Language: Python - Size: 1.21 MB - Last synced at: 21 days ago - Pushed at: 11 months ago - Stars: 271 - Forks: 9

jet-logic/blob_descriptor
Toolkit for managing large binary files through chunking and metadata descriptors
Language: Python - Size: 72.3 KB - Last synced at: 14 days ago - Pushed at: 21 days ago - Stars: 0 - Forks: 0

ejazalam831/rag-customer-support-chatbot
RAG-powered customer support chatbot using LangChain, LangGraph, and Mistral AI. An intelligent assistant that eliminates hallucinations by grounding responses in knowledge bases with conversation memory.
Language: Jupyter Notebook - Size: 3.24 MB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 0 - Forks: 0

neondatabase-labs/pgrag
Postgres extensions to support end-to-end Retrieval-Augmented Generation (RAG) pipelines
Language: Rust - Size: 136 MB - Last synced at: 5 days ago - Pushed at: about 2 months ago - Stars: 81 - Forks: 3

MurungaOwen/chunking-uploads
Handling upload of large files by chunking then merging afterwards on the server
Language: Python - Size: 2.93 KB - Last synced at: 28 days ago - Pushed at: 28 days ago - Stars: 0 - Forks: 0

zoner72/Datavizion-RAG
Retrieval-augmented generation (RAG) for remote & local LLM use
Language: Python - Size: 2.08 MB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 44 - Forks: 6

esastack/esa-restclient
An asynchronous event-driven HTTP client based on netty.
Language: Java - Size: 5.61 MB - Last synced at: 22 days ago - Pushed at: almost 3 years ago - Stars: 83 - Forks: 23

smooks/smooks
An extensible Java framework for building event-driven applications that break up XML and non-XML data into chunks for data integration
Language: Java - Size: 29.2 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 404 - Forks: 360

DocumentAtom/DocumentAtom
DocumentAtom provides a light, fast library for breaking input documents into constituent parts (atoms), useful for text processing, analysis, and artificial intelligence.
Language: C# - Size: 11.1 MB - Last synced at: 1 day ago - Pushed at: 3 months ago - Stars: 38 - Forks: 5

Piletskii-Oleg/chunkfs
A file system that can be used to compare different deduplication algorithms.
Language: Rust - Size: 294 KB - Last synced at: 22 days ago - Pushed at: about 2 months ago - Stars: 8 - Forks: 3

jparkerweb/semantic-chunking
🍱 semantic-chunking ⇢ semantically create chunks from large document for passing to LLM workflows
Language: JavaScript - Size: 8.81 MB - Last synced at: 29 days ago - Pushed at: 4 months ago - Stars: 94 - Forks: 6

systemd/casync
Content-Addressable Data Synchronization Tool
Language: C - Size: 2.48 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 1,521 - Forks: 119

jiesutd/NCRFpp
NCRF++, a Neural Sequence Labeling Toolkit. Easy to use for any sequence labeling task (e.g. NER, POS, Segmentation). It includes character LSTM/CNN, word LSTM/CNN and softmax/CRF components.
Language: Python - Size: 6.79 MB - Last synced at: about 1 month ago - Pushed at: almost 3 years ago - Stars: 1,896 - Forks: 447

nathadriele/acmr-rag-rename-mbausp
Final course project (Trabalho de Conclusão de Curso) for the MBA in Data Science and Analytics at USP/ESALQ, class of 2023. Develops an information retrieval system based on LLMs and RAG, applied to the RENAME list of essential medicines. The prototype uses embeddings, vector databases, and LangChain, with evaluation carried out using the RAGAS framework.
Size: 1 MB - Last synced at: 21 days ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

JonahWhaler/llm-agent-toolkit
LLM Agent Toolkit provides minimal, modular interfaces for core components in LLM-based applications.
Language: Python - Size: 837 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 3 - Forks: 0

ahmedheltaher/stream-blockify
A powerful and flexible Node.js library for processing streams in fixed-size blocks. This library extends Node's Transform stream to provide block-based data processing with customizable options for handling partial blocks, applying padding, and transforming block content.
Language: TypeScript - Size: 852 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

peebam/baba-craft
Game dev training inspired by Minecraft
Language: GDScript - Size: 1.02 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

gene-hightower/ghsmtp
Gene's SMTP server — receive Internet mail with less fuss
Language: C++ - Size: 2.35 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 9 - Forks: 3

CoCreate-app/CoCreate-webpack
A Webpack integration tool for CoCreate applications, enabling file watching, automated chunking, lazy loading, and file uploading. It leverages CoCreate.config for streamlined project builds and development workflows.
Language: JavaScript - Size: 44.9 KB - Last synced at: 20 days ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

skanda-vijaykumar/Simple_RAG
Simple RAG; query PDFs
Language: Python - Size: 5.86 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

drmingler/smart-llm-loader
smart-llm-loader is a lightweight yet powerful Python package that transforms any document into LLM-ready chunks. Spend less time on preprocessing headaches and more time building what matters. From RAG systems to chatbots to document Q&A, SmartLLMLoader handles the heavy lifting so you can focus on creating exceptional AI applications.
Language: Python - Size: 1.09 MB - Last synced at: 3 days ago - Pushed at: 4 months ago - Stars: 65 - Forks: 2

mirth/chonky
Fully neural approach for text chunking
Language: Python - Size: 34.2 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 331 - Forks: 10

mirpo/chopdoc
A tool to split documents into chunks for RAG and LLM applications
Language: Go - Size: 96.7 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 2 - Forks: 0

shelfio/array-chunk-by-size
Chunk array of objects by their size in JSON
Language: TypeScript - Size: 63.5 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 4 - Forks: 3

dcarpintero/llamaindexchat
LLM chatbot with Retrieval-Augmented Generation using LlamaIndex. It demonstrates how to implement chunking, indexing, and source citation.
Language: Python - Size: 12.6 MB - Last synced at: 13 days ago - Pushed at: over 1 year ago - Stars: 44 - Forks: 6

lennox55555/Agentic-Chatbot
An agentic chatbot powered by Retrieval-Augmented Generation (RAG), web scraping, and API integration. The chatbot is designed to assist users with questions specifically related to Duke University, focusing primarily on information about available classes and academic offerings.
Language: Python - Size: 346 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 2 - Forks: 3

Zabuzard/FastCDC4J
Fast and efficient content-defined chunking for data deduplication. Java implementation of FastCDC as library.
Language: Java - Size: 542 KB - Last synced at: about 2 months ago - Pushed at: almost 2 years ago - Stars: 23 - Forks: 4

bnosac/crfsuite
Labelling Sequential Data in Natural Language Processing with R - using CRFsuite
Language: C - Size: 890 KB - Last synced at: about 2 months ago - Pushed at: almost 2 years ago - Stars: 64 - Forks: 11

huanglixian/PreData-Lab
PreDataLab is a pre-data processing toolkit designed specifically for Retrieval Augmented Generation (RAG) systems, aiming to provide a development and testing environment for core functionalities such as document processing, OCR recognition, and vector embedding.
Language: Python - Size: 576 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

folbricht/desync
Alternative casync implementation
Language: Go - Size: 4.18 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 349 - Forks: 45

dcarpintero/ai-engineering
AI Engineering: Annotated notebooks to dive into Self-Attention, In-Context Learning, RAG, Knowledge-Graphs, Fine-Tuning, Model Optimization, and many more.
Language: Jupyter Notebook - Size: 11.6 MB - Last synced at: 18 days ago - Pushed at: 3 months ago - Stars: 6 - Forks: 0

danielathome19/Chunk-List
A Chunk List is a new, concurrent, chunk-based data structure that is easily modifiable and allows for fast run-time operations.
Language: C# - Size: 8.4 MB - Last synced at: about 11 hours ago - Pushed at: 11 months ago - Stars: 9 - Forks: 2

antoinelrnld/discord-rag
Easily create a RAG based on your Discord messages
Language: JavaScript - Size: 344 KB - Last synced at: 2 months ago - Pushed at: 3 months ago - Stars: 4 - Forks: 1

KernelPanic92/ngx-fastboot
ngx-fastboot is an Angular library designed to dynamically load configuration settings at runtime, optimizing application startup performance by offloading configurations to a separate compilation chunk.
Language: TypeScript - Size: 1020 KB - Last synced at: about 5 hours ago - Pushed at: about 6 hours ago - Stars: 8 - Forks: 0

vinerya/faiss_vector_aggregator
This Python library provides a suite of advanced methods for aggregating multiple embeddings associated with a single document or entity into a single representative embedding.
Language: Python - Size: 9.77 KB - Last synced at: 8 days ago - Pushed at: 9 months ago - Stars: 2 - Forks: 0

bhattbhavesh91/chonkie-example
chonkie-example
Language: Python - Size: 37.1 KB - Last synced at: 5 days ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

KKenny0/RAGVizExpander (fork of gabrielchua/RAGxplorer)
Open-source tool to visualise your RAG 🔮 Supports custom content extraction, LLMs, embeddings, and chunking to visualise vector retrieval results.
Language: Jupyter Notebook - Size: 1.42 MB - Last synced at: about 2 months ago - Pushed at: 7 months ago - Stars: 5 - Forks: 0

DanEngelbrecht/golongtail
Command line front end for longtail synchronization tool
Language: Go - Size: 230 MB - Last synced at: about 2 months ago - Pushed at: 4 months ago - Stars: 33 - Forks: 9

saltyrtc/chunked-dc-js
Binary chunking that can be reassembled out-of-order.
Language: TypeScript - Size: 733 KB - Last synced at: 7 days ago - Pushed at: over 3 years ago - Stars: 17 - Forks: 3

LelsersLasers/Minecraft
Minecraft clone with an infinite world generated from 3d perlin noise (no game engine)
Language: C++ - Size: 6.29 MB - Last synced at: 3 months ago - Pushed at: about 2 years ago - Stars: 10 - Forks: 1

lh0x00/docsifer
Docsifer is a powerful tool for converting various data formats into Markdown for applications such as indexing, text analysis, and more. It supports PDF, PowerPoint, Word, Excel, Images, Audio, HTML, and other text-based formats, and leverages LLMs to enhance performance.
Language: Python - Size: 150 KB - Last synced at: 2 months ago - Pushed at: 4 months ago - Stars: 5 - Forks: 0

sushant1827/RAG_with_LangChain
Leveraging LangChain for a RAG (Retrieval-Augmented Generation) project, this implementation enables efficient querying across multiple books, enhancing data retrieval and natural language generation for context-rich answers.
Language: Python - Size: 2.71 MB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

26hzhang/neural_sequence_labeling
A TensorFlow implementation of a neural sequence labeling model that can tackle sequence labeling tasks such as POS tagging, chunking, NER, punctuation restoration, and more.
Language: Python - Size: 136 MB - Last synced at: 20 days ago - Pushed at: over 6 years ago - Stars: 234 - Forks: 46

subhamsarangi/RAGSystemDemo
Use your own data with the power of an LLM
Language: Jupyter Notebook - Size: 169 KB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

SupermatAI/supermat
Novel data representation leading to granular citations and higher accuracy
Language: Python - Size: 5.57 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 4 - Forks: 1

dafmontenegro/gabo-rag
'Gabo' is a RAG (Retrieval-Augmented Generation) system designed to enhance the capabilities of LLMs (Large Language Models). This project honors Colombian author Gabriel García Márquez by marking the tenth anniversary of his death.
Language: Jupyter Notebook - Size: 231 KB - Last synced at: 4 days ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

liubivi/LongDocProcessingWithLLMs
Takes a long text document uploaded to Google Drive, processes (e.g. translates) it in chunks using the Gemini and ChatGPT LLMs, and saves the results in a Google spreadsheet
Language: Python - Size: 1.58 MB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

yuma-shintani/chunksize-checker
Calculate the number of total tokens, optimal chunk size and chunk overlap from any given document.
Language: JavaScript - Size: 1.14 MB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

xyb/chunksum
Print FastCDC rolling hash chunks and checksums.
Language: Python - Size: 50.8 KB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

achimoraites/image-splitter 📦
Splits an image
Language: JavaScript - Size: 7.6 MB - Last synced at: 5 days ago - Pushed at: over 4 years ago - Stars: 5 - Forks: 4

HafiizhTH/Chatbot-with-Langchain
Language: Python - Size: 3.91 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

pngo1997/Retrieval-Augmented-Retrieval-RAG-for-Cleantech-Media
Implements a Retrieval-Augmented Generation (RAG) system.
Language: Jupyter Notebook - Size: 21.7 MB - Last synced at: 5 days ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

sushant1827/CrewAI-Agents-MinutesOfMeeting-Gmail
MinutesOfMeeting and Gmail is a collaborative crew of AI agents that autonomously processes audio, transcribes and summarizes it, then writes and drafts an email in a Gmail account.
Language: Python - Size: 28.4 MB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 1

zeroentropy-ai/zchunk
A new chunking strategy developed by ZeroEntropy for general semantic chunking using Llama-70B.
Language: Python - Size: 57.6 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 148 - Forks: 8

UWASL/dedup-bench
DedupBench is a benchmarking tool for data chunking techniques used in data deduplication. DedupBench is designed for extensibility, allowing new chunking techniques to be implemented with minimal additional code.
Language: C++ - Size: 555 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 5 - Forks: 1

ThanhHung2112/Semantic_chunking
Semantic Chunking is a Python library for segmenting text into meaningful chunks using embeddings from Sentence Transformers.
Language: Python - Size: 8.79 KB - Last synced at: 29 days ago - Pushed at: 6 months ago - Stars: 7 - Forks: 0

simon-zerisenay/42_Push_Swap
Pushswap is a 42 project emphasizing efficient sorting by minimizing operations. Participants use a limited set of commands to manipulate stacks and achieve the desired sorted order, showcasing algorithm design and optimization skills while developing problem-solving abilities.
Language: C - Size: 81.1 KB - Last synced at: 4 months ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

DavidMoserAI/AzureDocumentIntelligenceChunker
A lightweight Python library for metadata-rich document chunking in Retrieval-Augmented Generation (RAG) workflows. It leverages Azure AI Document Intelligence to enhance chunking by retaining hierarchical structure, page numbers, and bounding boxes for seamless integration with PDF viewers.
Language: Python - Size: 24.4 KB - Last synced at: 20 days ago - Pushed at: 5 months ago - Stars: 0 - Forks: 1

Mithoon278/OpenMind-AI-GenAI-Project
A compassionate mental health chatbot built using Retrieval-Augmented Generation (RAG). This project leverages advanced natural language processing techniques, including SentenceTransformers, Pinecone for vector storage, and fine-tuned LLaMA 3.3, to provide thoughtful, context-aware, and empathetic responses.
Language: Jupyter Notebook - Size: 2.69 MB - Last synced at: 3 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

mddunlap924/NLP-Essentials-with-Hugging-Face
NLP workflows and practical examples using Hugging Face
Language: Jupyter Notebook - Size: 137 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

ParthaPRay/docling_RAG_langchain_colab
This repo contains code for RAG using Docling in a Colab notebook with LangChain, Milvus, a Hugging Face embedding model, and an LLM
Language: Jupyter Notebook - Size: 20.5 KB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

dominictarro/semchunk-rs
A fast and lightweight Rust library for splitting text into semantically meaningful chunks.
Language: Rust - Size: 16.6 KB - Last synced at: 5 days ago - Pushed at: 5 months ago - Stars: 3 - Forks: 0

GURSV/URL-summ
A URL summarizer, which summarizes the content of a URL with proper formatting. It uses 'sshleifer/distilbart-cnn-12-6', which is a distilled version of the BART model, specifically optimized for text summarization tasks, including CNN summarization.
Language: Python - Size: 112 KB - Last synced at: 2 months ago - Pushed at: 6 months ago - Stars: 4 - Forks: 0

jordicenzano/go-ts-segmenter
Live TS segmenter and HLS manifest creation in Go
Language: Go - Size: 1.63 MB - Last synced at: 2 months ago - Pushed at: over 3 years ago - Stars: 94 - Forks: 13

ParthaPRay/Docling_Colab
This repo contains a Google Colab notebook for handling Docling for data extraction such as text, images, tables, etc.
Language: Jupyter Notebook - Size: 697 KB - Last synced at: 4 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

sokratis-xyz/polymath
High-performance Rust web search service (like Perplexity)
Language: Rust - Size: 146 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

remram44/cdchunking-rs
Content-Defined Chunking for Rust
Language: Rust - Size: 43.9 KB - Last synced at: 4 days ago - Pushed at: 6 months ago - Stars: 18 - Forks: 5

skitsanos/streamlit-split-text
Text splitting example using Tiktoken
Language: Python - Size: 4.88 KB - Last synced at: 12 days ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

patelvivekdev/contextual-chunks
Generate contextual chunks for Retrieval-Augmented Generation (RAG) using LLM
Language: TypeScript - Size: 226 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

skerkour/go-benchmarks
Comprehensive and reproducible benchmarks for Go developers and architects.
Language: Go - Size: 40.9 MB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 10 - Forks: 2

i-partalas/industrial-rag-qna-benchmark
Benchmarking the performance of proprietary vs open-source LLMs in industrial QnA tasks using various RAG-based implementations and evaluation metrics.
Language: Python - Size: 1.27 MB - Last synced at: 2 months ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

cckalen/intellichunk
Go-based lightweight RAG / LLM tool with CLI + API
Language: Go - Size: 29.3 KB - Last synced at: 6 months ago - Pushed at: over 1 year ago - Stars: 11 - Forks: 1

jmaczan/bpe-tokenizer
Byte-Pair Encoding tokenizer for training large language models on huge datasets
Language: Python - Size: 108 KB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 6 - Forks: 1

CoCreate-app/CoCreate-rollup
A Rollup integration tool for CoCreate applications, enabling file watching, automated chunking, lazy loading, and file uploading. It leverages CoCreate.config for streamlined project builds and development workflows.
Language: JavaScript - Size: 33.2 KB - Last synced at: 27 days ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

howardyclo/grammar-pattern
Extract and align grammar patterns from English sentences.
Language: Python - Size: 128 KB - Last synced at: about 2 months ago - Pushed at: over 2 years ago - Stars: 54 - Forks: 10

andrew-gordon/Gord0.ChunkyMonkey.CodeGenerator
Gord0.ChunkyMonkey.CodeGenerator is a C# Roslyn code generator that generates code, at build time, to split an object containing collection properties into chunks. It also provides the ability to merge the chunks back into a single object instance.
Language: C# - Size: 935 KB - Last synced at: 23 days ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

acj/file-chunker
Divide a file into evenly-sized chunks
Language: Rust - Size: 8.79 KB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

Leo310/rag-chunking-evaluation
Assess the effectiveness of chunking strategies in RAG systems via a custom evaluation framework.
Language: Jupyter Notebook - Size: 4.44 MB - Last synced at: 2 months ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

Ven0maus/FlowVitae
Efficient library for managing 2D static and procedural grids in games.
Language: C# - Size: 597 KB - Last synced at: 26 days ago - Pushed at: 8 months ago - Stars: 8 - Forks: 1

kathleenwest/FileManagerDemo
(File Manager – A Demo of a WCF Self-Hosted Service & Client "Tester" Windows Form Application Exchanging Files) This project presents a simple File Manager Service and Client Application demonstration. The File Manager is a self-hosted (service host) WCF application launched and managed with a simple console interface. The client “tester” has a simplified GUI user interface to quickly demo and test the service (Windows Form Application).
Language: C# - Size: 14.3 MB - Last synced at: 3 months ago - Pushed at: over 5 years ago - Stars: 5 - Forks: 4

isaka-james/chunks-to-file
A Node.js chunking system
Language: JavaScript - Size: 55.7 KB - Last synced at: 3 months ago - Pushed at: 9 months ago - Stars: 2 - Forks: 0
