An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: chunking

drittich/SemanticSlicer

🧠✂️ SemanticSlicer — A smart text chunker for LLM-ready documents.

Language: C# - Size: 73.2 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 23 - Forks: 1

duriantaco/pykomodo

A Python-based parallel file chunking system designed for processing large codebases into LLM-friendly chunks.

Language: Python - Size: 10.4 MB - Last synced at: 2 days ago - Pushed at: 26 days ago - Stars: 41 - Forks: 1

gpizzorno/tree-sitter-chunk-grammar

Tree-sitter parser for NLTK chunking grammars.

Language: C - Size: 4.25 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

Christopher-K-Long/thread-chunks

A Python package for performing memory-intensive computations in parallel using chunks and checkpointing.

Language: Python - Size: 51.8 KB - Last synced at: 3 days ago - Pushed at: 2 months ago - Stars: 3 - Forks: 0

microsoft/rag-experiment-accelerator

The RAG Experiment Accelerator is a versatile tool designed to expedite and facilitate experiments and evaluations using Azure Cognitive Search and the RAG pattern.

Language: Python - Size: 4.36 MB - Last synced at: 3 days ago - Pushed at: 3 months ago - Stars: 257 - Forks: 90

gazelle93/Various-Chunking-Methods

Exploring and benchmarking chunking methods for Retrieval-Augmented Generation (RAG), including fixed-size, recursive, sliding, semantic, and hybrid chunking strategies.

Language: Python - Size: 21.5 KB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 0 - Forks: 0

Sammyjo20/laravel-chunkable-jobs

📑 Split Laravel jobs into multiple separate job chunks

Language: PHP - Size: 54.7 KB - Last synced at: 2 days ago - Pushed at: about 1 year ago - Stars: 84 - Forks: 4

swarmauri/swarmauri-sdk

A modular multimodal framework for AI applications

Language: Python - Size: 29.3 MB - Last synced at: about 15 hours ago - Pushed at: about 15 hours ago - Stars: 91 - Forks: 45

isaacus-dev/semchunk

A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks.

Language: Python - Size: 128 KB - Last synced at: 13 days ago - Pushed at: 14 days ago - Stars: 318 - Forks: 19

DanEngelbrecht/longtail

Incremental asset delivery library

Language: C - Size: 5.52 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 56 - Forks: 8

Piletskii-Oleg/rust-chunking

Content-based chunking algorithms implemented in Rust.

Language: Rust - Size: 145 KB - Last synced at: 8 days ago - Pushed at: about 1 month ago - Stars: 3 - Forks: 0

HazemBZ/pdf-fuzz

PoC: bulk-search your PDF files using text lookup

Size: 8.79 KB - Last synced at: 15 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

iscc/fastcdc-py

FastCDC implementation in Python https://pypi.org/project/fastcdc/

Language: Python - Size: 339 KB - Last synced at: 1 day ago - Pushed at: 12 months ago - Stars: 58 - Forks: 17

romanyn36/RAG-Ai-Agent

AI-powered agent leveraging RAG (Retrieval-Augmented Generation) with tool integration capabilities. Built with LangChain, OpenAI, FastAPI, and a React frontend, it combines document-based knowledge with real-time data access and calculation tools to provide context-aware responses.

Language: JavaScript - Size: 1.32 MB - Last synced at: about 4 hours ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 0

DennisSmuda/godot-chunking-system

Demo on how to make a 2D grid-based map with FastNoise and infinite movement in every direction. Uses multithreading to load/unload chunks of the map! 🌎

Language: GDScript - Size: 25.1 MB - Last synced at: 3 days ago - Pushed at: over 2 years ago - Stars: 20 - Forks: 2

carlosplanchon/betterhtmlchunking

BetterHTMLChunking is a Python library for intelligent HTML segmentation. It builds a DOM tree from raw HTML and extracts content-rich regions of interest, making content analysis effortless. Great for LLM-based processing.

Language: Python - Size: 44.9 KB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 34 - Forks: 2

datakaveri/k-anonymisation-SKALD

Scalable, chunk-wise K-anonymization tool based on the Optimal Lattice Anonymization (OLA) algorithm. It is designed to handle large datasets by processing them in manageable chunks, ensuring data privacy while maintaining utility.

Language: Python - Size: 46.7 MB - Last synced at: 19 days ago - Pushed at: 20 days ago - Stars: 1 - Forks: 0

gpizzorno/rules-based-entity-extraction

This codebase provides a pipeline for extracting unnamed entities from Medieval Latin texts by combining rule-based resources and a machine learning chunker trained on custom features. It supports evaluation, visualization, and model persistence for further use or deployment.

Language: HTML - Size: 2.18 MB - Last synced at: 19 days ago - Pushed at: 20 days ago - Stars: 0 - Forks: 0

ronomon/deduplication

Fast multi-threaded content-dependent chunking deduplication for Buffers in C++ with a reference implementation in JavaScript. Ships with extensive tests, a fuzz test and a benchmark.

Language: JavaScript - Size: 34.2 KB - Last synced at: 21 days ago - Pushed at: over 5 years ago - Stars: 76 - Forks: 9

lazyFrogLOL/llmdocparser

A package for parsing PDFs and analyzing their content using LLMs.

Language: Python - Size: 1.21 MB - Last synced at: 21 days ago - Pushed at: 11 months ago - Stars: 271 - Forks: 9

jet-logic/blob_descriptor

Toolkit for managing large binary files through chunking and metadata descriptors

Language: Python - Size: 72.3 KB - Last synced at: 14 days ago - Pushed at: 21 days ago - Stars: 0 - Forks: 0

ejazalam831/rag-customer-support-chatbot

RAG-powered customer support chatbot using LangChain, LangGraph, and Mistral AI. An intelligent assistant that eliminates hallucinations by grounding responses in knowledge bases with conversation memory.

Language: Jupyter Notebook - Size: 3.24 MB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 0 - Forks: 0

neondatabase-labs/pgrag

Postgres extensions to support end-to-end Retrieval-Augmented Generation (RAG) pipelines

Language: Rust - Size: 136 MB - Last synced at: 5 days ago - Pushed at: about 2 months ago - Stars: 81 - Forks: 3

MurungaOwen/chunking-uploads

Handling upload of large files by chunking then merging afterwards on the server

Language: Python - Size: 2.93 KB - Last synced at: 28 days ago - Pushed at: 28 days ago - Stars: 0 - Forks: 0

zoner72/Datavizion-RAG

Retrieval-augmented generation (RAG) for remote & local LLM use

Language: Python - Size: 2.08 MB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 44 - Forks: 6

esastack/esa-restclient

An asynchronous event-driven HTTP client based on netty.

Language: Java - Size: 5.61 MB - Last synced at: 22 days ago - Pushed at: almost 3 years ago - Stars: 83 - Forks: 23

smooks/smooks

An extensible Java framework for building event-driven applications that break up XML and non-XML data into chunks for data integration

Language: Java - Size: 29.2 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 404 - Forks: 360

DocumentAtom/DocumentAtom

DocumentAtom provides a light, fast library for breaking input documents into constituent parts (atoms), useful for text processing, analysis, and artificial intelligence.

Language: C# - Size: 11.1 MB - Last synced at: 1 day ago - Pushed at: 3 months ago - Stars: 38 - Forks: 5

Piletskii-Oleg/chunkfs

A file system that can be used to compare different deduplication algorithms.

Language: Rust - Size: 294 KB - Last synced at: 22 days ago - Pushed at: about 2 months ago - Stars: 8 - Forks: 3

jparkerweb/semantic-chunking

🍱 semantic-chunking ⇢ semantically create chunks from large documents for passing to LLM workflows

Language: JavaScript - Size: 8.81 MB - Last synced at: 29 days ago - Pushed at: 4 months ago - Stars: 94 - Forks: 6

systemd/casync

Content-Addressable Data Synchronization Tool

Language: C - Size: 2.48 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 1,521 - Forks: 119

jiesutd/NCRFpp

NCRF++, a Neural Sequence Labeling Toolkit. Easy to apply to any sequence labeling task (e.g. NER, POS, Segmentation). It includes character LSTM/CNN, word LSTM/CNN and softmax/CRF components.

Language: Python - Size: 6.79 MB - Last synced at: about 1 month ago - Pushed at: almost 3 years ago - Stars: 1,896 - Forks: 447

nathadriele/acmr-rag-rename-mbausp

Final course project for the MBA in Data Science and Analytics at USP/ESALQ, class of 2023. Develops an information retrieval system based on LLMs and RAG, applied to the RENAME list of essential medicines. The prototype uses embeddings, vector databases, and LangChain, with evaluation performed using the RAGAS framework.

Size: 1 MB - Last synced at: 21 days ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

JonahWhaler/llm-agent-toolkit

LLM Agent Toolkit provides minimal, modular interfaces for core components in LLM-based applications.

Language: Python - Size: 837 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 3 - Forks: 0

ahmedheltaher/stream-blockify

A powerful and flexible Node.js library for processing streams in fixed-size blocks. This library extends Node's Transform stream to provide block-based data processing with customizable options for handling partial blocks, applying padding, and transforming block content.

Language: TypeScript - Size: 852 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

peebam/baba-craft

Game dev training inspired by Minecraft

Language: GDScript - Size: 1.02 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

gene-hightower/ghsmtp

Gene's SMTP server — receive Internet mail with less fuss

Language: C++ - Size: 2.35 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 9 - Forks: 3

CoCreate-app/CoCreate-webpack

A Webpack integration tool for CoCreate applications, enabling file watching, automated chunking, lazy loading, and file uploading. It leverages CoCreate.config for streamlined project builds and development workflows.

Language: JavaScript - Size: 44.9 KB - Last synced at: 20 days ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

skanda-vijaykumar/Simple_RAG

Simple RAG; query PDFs

Language: Python - Size: 5.86 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

drmingler/smart-llm-loader

smart-llm-loader is a lightweight yet powerful Python package that transforms any document into LLM-ready chunks. Spend less time on preprocessing headaches and more time building what matters. From RAG systems to chatbots to document Q&A, SmartLLMLoader handles the heavy lifting so you can focus on creating exceptional AI applications.

Language: Python - Size: 1.09 MB - Last synced at: 3 days ago - Pushed at: 4 months ago - Stars: 65 - Forks: 2

mirth/chonky

Fully neural approach for text chunking

Language: Python - Size: 34.2 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 331 - Forks: 10

mirpo/chopdoc

A tool to split documents into chunks for RAG and LLM applications

Language: Go - Size: 96.7 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 2 - Forks: 0

shelfio/array-chunk-by-size

Chunk array of objects by their size in JSON

Language: TypeScript - Size: 63.5 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 4 - Forks: 3

dcarpintero/llamaindexchat

LLM chatbot with Retrieval-Augmented Generation using LlamaIndex. It demonstrates how to implement chunking, indexing, and source citation.

Language: Python - Size: 12.6 MB - Last synced at: 13 days ago - Pushed at: over 1 year ago - Stars: 44 - Forks: 6

lennox55555/Agentic-Chatbot

An agentic chatbot powered by Retrieval-Augmented Generation (RAG), web scraping, and API integration. The chatbot is designed to assist users with questions specifically related to Duke University, focusing primarily on information about available classes and academic offerings.

Language: Python - Size: 346 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 2 - Forks: 3

Zabuzard/FastCDC4J

Fast and efficient content-defined chunking for data deduplication. Java implementation of FastCDC as a library.

Language: Java - Size: 542 KB - Last synced at: about 2 months ago - Pushed at: almost 2 years ago - Stars: 23 - Forks: 4

bnosac/crfsuite

Labelling Sequential Data in Natural Language Processing with R - using CRFsuite

Language: C - Size: 890 KB - Last synced at: about 2 months ago - Pushed at: almost 2 years ago - Stars: 64 - Forks: 11

huanglixian/PreData-Lab

PreDataLab is a pre-data processing toolkit designed specifically for Retrieval Augmented Generation (RAG) systems, aiming to provide a development and testing environment for core functionalities such as document processing, OCR recognition, and vector embedding.

Language: Python - Size: 576 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

folbricht/desync

Alternative casync implementation

Language: Go - Size: 4.18 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 349 - Forks: 45

dcarpintero/ai-engineering

AI Engineering: Annotated NBs to dive into Self-Attention, In-Context Learning, RAG, Knowledge-Graphs, Fine-Tuning, Model Optimization, and many more.

Language: Jupyter Notebook - Size: 11.6 MB - Last synced at: 18 days ago - Pushed at: 3 months ago - Stars: 6 - Forks: 0

danielathome19/Chunk-List

A Chunk List is a new, concurrent, chunk-based data structure that is easily modifiable and allows for fast run-time operations.

Language: C# - Size: 8.4 MB - Last synced at: about 11 hours ago - Pushed at: 11 months ago - Stars: 9 - Forks: 2

antoinelrnld/discord-rag

Easily create a RAG based on your Discord messages

Language: JavaScript - Size: 344 KB - Last synced at: 2 months ago - Pushed at: 3 months ago - Stars: 4 - Forks: 1

KernelPanic92/ngx-fastboot

ngx-fastboot is an Angular library designed to dynamically load configuration settings at runtime, optimizing application startup performance by offloading configurations to a separate compilation chunk.

Language: TypeScript - Size: 1020 KB - Last synced at: about 5 hours ago - Pushed at: about 6 hours ago - Stars: 8 - Forks: 0

vinerya/faiss_vector_aggregator

This Python library provides a suite of advanced methods for aggregating multiple embeddings associated with a single document or entity into a single representative embedding.

Language: Python - Size: 9.77 KB - Last synced at: 8 days ago - Pushed at: 9 months ago - Stars: 2 - Forks: 0

bhattbhavesh91/chonkie-example

chonkie-example

Language: Python - Size: 37.1 KB - Last synced at: 5 days ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

KKenny0/RAGVizExpander Fork of gabrielchua/RAGxplorer

Open-source tool to visualise your RAG 🔮 Supports custom content extraction, LLMs, embeddings, and chunking to visualise vector retrieval results.

Language: Jupyter Notebook - Size: 1.42 MB - Last synced at: about 2 months ago - Pushed at: 7 months ago - Stars: 5 - Forks: 0

DanEngelbrecht/golongtail

Command line front end for longtail synchronization tool

Language: Go - Size: 230 MB - Last synced at: about 2 months ago - Pushed at: 4 months ago - Stars: 33 - Forks: 9

saltyrtc/chunked-dc-js

Binary chunking that can be reassembled out-of-order.

Language: TypeScript - Size: 733 KB - Last synced at: 7 days ago - Pushed at: over 3 years ago - Stars: 17 - Forks: 3

LelsersLasers/Minecraft

Minecraft clone with an infinite world generated from 3D Perlin noise (no game engine)

Language: C++ - Size: 6.29 MB - Last synced at: 3 months ago - Pushed at: about 2 years ago - Stars: 10 - Forks: 1

lh0x00/docsifer

Docsifer is a powerful tool for converting various data formats into Markdown for applications such as indexing, text analysis, and more. It supports PDF, PowerPoint, Word, Excel, Images, Audio, HTML, and other text-based formats, and leverages LLMs to enhance performance.

Language: Python - Size: 150 KB - Last synced at: 2 months ago - Pushed at: 4 months ago - Stars: 5 - Forks: 0

sushant1827/RAG_with_LangChain

Leveraging LangChain for a RAG (Retrieval-Augmented Generation) project, this implementation enables efficient querying across multiple books, enhancing data retrieval and natural language generation for context-rich answers.

Language: Python - Size: 2.71 MB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

26hzhang/neural_sequence_labeling

A TensorFlow implementation of a Neural Sequence Labeling model, which can tackle sequence labeling tasks such as POS Tagging, Chunking, NER, Punctuation Restoration, etc.

Language: Python - Size: 136 MB - Last synced at: 20 days ago - Pushed at: over 6 years ago - Stars: 234 - Forks: 46

subhamsarangi/RAGSystemDemo

Use your own data with the power of an LLM

Language: Jupyter Notebook - Size: 169 KB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

SupermatAI/supermat

Novel data representation leading to granular citations and higher accuracy

Language: Python - Size: 5.57 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 4 - Forks: 1

dafmontenegro/gabo-rag

'Gabo' is a RAG (Retrieval-Augmented Generation) system designed to enhance the capabilities of LLMs (Large Language Models). This project honors Colombian author Gabriel García Márquez by marking the tenth anniversary of his death.

Language: Jupyter Notebook - Size: 231 KB - Last synced at: 4 days ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

liubivi/LongDocProcessingWithLLMs

Takes a long text document uploaded to Google Drive, processes (e.g. translates) it in chunks using Gemini and ChatGPT LLMs, and saves the results in a Google spreadsheet

Language: Python - Size: 1.58 MB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

yuma-shintani/chunksize-checker

Calculate the number of total tokens, optimal chunk size and chunk overlap from any given document.

Language: JavaScript - Size: 1.14 MB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

xyb/chunksum

Print FastCDC rolling hash chunks and checksums.

Language: Python - Size: 50.8 KB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

achimoraites/image-splitter 📦

Splits an image

Language: JavaScript - Size: 7.6 MB - Last synced at: 5 days ago - Pushed at: over 4 years ago - Stars: 5 - Forks: 4

HafiizhTH/Chatbot-with-Langchain

Language: Python - Size: 3.91 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

pngo1997/Retrieval-Augmented-Retrieval-RAG-for-Cleantech-Media

Implements a Retrieval-Augmented Generation (RAG) system.

Language: Jupyter Notebook - Size: 21.7 MB - Last synced at: 5 days ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

sushant1827/CrewAI-Agents-MinutesOfMeeting-Gmail

MinutesOfMeeting and Gmail is a collaborative crew of AI agents that autonomously understands audio, transcribes and summarizes it, and writes a draft email in a Gmail account.

Language: Python - Size: 28.4 MB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 1

zeroentropy-ai/zchunk

A new chunking strategy developed by ZeroEntropy for general semantic chunking using Llama-70B.

Language: Python - Size: 57.6 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 148 - Forks: 8

UWASL/dedup-bench

DedupBench is a benchmarking tool for data chunking techniques used in data deduplication. DedupBench is designed for extensibility, allowing new chunking techniques to be implemented with minimal additional code.

Language: C++ - Size: 555 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 5 - Forks: 1

ThanhHung2112/Semantic_chunking

Semantic Chunking is a Python library for segmenting text into meaningful chunks using embeddings from Sentence Transformers.

Language: Python - Size: 8.79 KB - Last synced at: 29 days ago - Pushed at: 6 months ago - Stars: 7 - Forks: 0

simon-zerisenay/42_Push_Swap

Pushswap is a 42 project emphasizing efficient sorting by minimizing operations. Participants use a limited set of commands to manipulate stacks and achieve the desired sorted order, showcasing algorithm design and optimization skills while developing problem-solving abilities.

Language: C - Size: 81.1 KB - Last synced at: 4 months ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

DavidMoserAI/AzureDocumentIntelligenceChunker

A lightweight Python library for metadata-rich document chunking in Retrieval-Augmented Generation (RAG) workflows. It leverages Azure AI Document Intelligence to enhance chunking by retaining hierarchical structure, page numbers, and bounding boxes for seamless integration with PDF viewers.

Language: Python - Size: 24.4 KB - Last synced at: 20 days ago - Pushed at: 5 months ago - Stars: 0 - Forks: 1

Mithoon278/OpenMind-AI-GenAI-Project

A compassionate mental health chatbot built using Retrieval-Augmented Generation (RAG). This project leverages advanced natural language processing techniques, including SentenceTransformers, Pinecone for vector storage, and fine-tuned LLaMA 3.3, to provide thoughtful, context-aware, and empathetic responses.

Language: Jupyter Notebook - Size: 2.69 MB - Last synced at: 3 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

mddunlap924/NLP-Essentials-with-Hugging-Face

NLP workflows and practical examples using Hugging Face

Language: Jupyter Notebook - Size: 137 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

ParthaPRay/docling_RAG_langchain_colab

This repo contains code for RAG using Docling in a Colab notebook with LangChain, Milvus, a Hugging Face embedding model, and an LLM

Language: Jupyter Notebook - Size: 20.5 KB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

dominictarro/semchunk-rs

A fast and lightweight Rust library for splitting text into semantically meaningful chunks.

Language: Rust - Size: 16.6 KB - Last synced at: 5 days ago - Pushed at: 5 months ago - Stars: 3 - Forks: 0

GURSV/URL-summ

A URL summarizer, which summarizes the content of a URL with proper formatting. It uses 'sshleifer/distilbart-cnn-12-6', which is a distilled version of the BART model, specifically optimized for text summarization tasks, including CNN summarization.

Language: Python - Size: 112 KB - Last synced at: 2 months ago - Pushed at: 6 months ago - Stars: 4 - Forks: 0

jordicenzano/go-ts-segmenter

Live TS segmenter and HLS manifest creation in Go

Language: Go - Size: 1.63 MB - Last synced at: 2 months ago - Pushed at: over 3 years ago - Stars: 94 - Forks: 13

ParthaPRay/Docling_Colab

This repo contains a Google Colab notebook for using Docling to extract data such as text, images, and tables.

Language: Jupyter Notebook - Size: 697 KB - Last synced at: 4 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

sokratis-xyz/polymath

High-performance Rust web search service (like Perplexity)

Language: Rust - Size: 146 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

remram44/cdchunking-rs

Content-Defined Chunking for Rust

Language: Rust - Size: 43.9 KB - Last synced at: 4 days ago - Pushed at: 6 months ago - Stars: 18 - Forks: 5

skitsanos/streamlit-split-text

Text splitting example using Tiktoken

Language: Python - Size: 4.88 KB - Last synced at: 12 days ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

patelvivekdev/contextual-chunks

Generate contextual chunks for Retrieval-Augmented Generation (RAG) using LLM

Language: TypeScript - Size: 226 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

skerkour/go-benchmarks

Comprehensive and reproducible benchmarks for Go developers and architects.

Language: Go - Size: 40.9 MB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 10 - Forks: 2

i-partalas/industrial-rag-qna-benchmark

Benchmarking the performance of proprietary vs open-source LLMs in industrial QnA tasks using various RAG-based implementations and evaluation metrics.

Language: Python - Size: 1.27 MB - Last synced at: 2 months ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

cckalen/intellichunk

Go-based lightweight RAG / LLM tool with CLI + API

Language: Go - Size: 29.3 KB - Last synced at: 6 months ago - Pushed at: over 1 year ago - Stars: 11 - Forks: 1

jmaczan/bpe-tokenizer

Byte-Pair Encoding tokenizer for training large language models on huge datasets

Language: Python - Size: 108 KB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 6 - Forks: 1

CoCreate-app/CoCreate-rollup

A Rollup integration tool for CoCreate applications, enabling file watching, automated chunking, lazy loading, and file uploading. It leverages CoCreate.config for streamlined project builds and development workflows.

Language: JavaScript - Size: 33.2 KB - Last synced at: 27 days ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

howardyclo/grammar-pattern

Extract and align grammar patterns from English sentences.

Language: Python - Size: 128 KB - Last synced at: about 2 months ago - Pushed at: over 2 years ago - Stars: 54 - Forks: 10

andrew-gordon/Gord0.ChunkyMonkey.CodeGenerator

Gord0.ChunkyMonkey.CodeGenerator is a C# Roslyn code generator that generates code, at build time, to split an object containing collection properties into chunks. It also provides the ability to merge the chunks back into a single object instance.

Language: C# - Size: 935 KB - Last synced at: 23 days ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

acj/file-chunker

Divide a file into evenly sized chunks

Language: Rust - Size: 8.79 KB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

Leo310/rag-chunking-evaluation

Assess the effectiveness of chunking strategies in RAG systems via a custom evaluation framework.

Language: Jupyter Notebook - Size: 4.44 MB - Last synced at: 2 months ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

Ven0maus/FlowVitae

Efficient library for managing 2D static and procedural grids in games.

Language: C# - Size: 597 KB - Last synced at: 26 days ago - Pushed at: 8 months ago - Stars: 8 - Forks: 1

kathleenwest/FileManagerDemo

File Manager: a demo of a WCF self-hosted service and a client "tester" Windows Forms application exchanging files. This project presents a simple File Manager service and client application demonstration. The File Manager is a self-hosted (service host) WCF application launched and managed through a simple console interface. The client "tester" is a Windows Forms application with a simplified GUI for quickly demoing and testing the service.

Language: C# - Size: 14.3 MB - Last synced at: 3 months ago - Pushed at: over 5 years ago - Stars: 5 - Forks: 4

isaka-james/chunks-to-file

A Node.js chunking system

Language: JavaScript - Size: 55.7 KB - Last synced at: 3 months ago - Pushed at: 9 months ago - Stars: 2 - Forks: 0