GitHub topics: text-chunking

Repositories

GregorBiswanger/SemanticChunker.NET

Embedding-driven, context-aware text chunking for Semantic Kernel and RAG workflows in .NET

Language: C# - Size: 485 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 2 - Forks: 0

Besthope-Official/predoc

Document preprocessing service for RAG (Retrieval-Augmented Generation)

Language: Python - Size: 104 KB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 1 - Forks: 1

isaacus-dev/semchunk

A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks.

Language: Python - Size: 128 KB - Last synced at: 16 days ago - Pushed at: about 2 months ago - Stars: 340 - Forks: 19

drittich/SemanticSlicer

🧠✂️ SemanticSlicer — A smart text chunker for LLM-ready documents.

Language: C# - Size: 50.8 KB - Last synced at: 7 days ago - Pushed at: about 1 month ago - Stars: 24 - Forks: 1

jparkerweb/semantic-chunking

🍱 semantic-chunking ⇢ semantically create chunks from large documents for passing to LLM workflows

Language: JavaScript - Size: 8.81 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 102 - Forks: 10

lazyFrogLOL/llmdocparser

A package for parsing PDFs and analyzing their content using LLMs.

Language: Python - Size: 1.21 MB - Last synced at: 21 days ago - Pushed at: 12 months ago - Stars: 271 - Forks: 9

betcorg/llm-text-splitter (fork of golbin/llm-chunk)

A lightweight TypeScript text splitter for RAG applications

Language: TypeScript - Size: 180 KB - Last synced at: 2 days ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

adityapathak-cubastion/cubastion-hr-chatbot

Cubastion's HR chatbot answers queries based on the latest HR documents published by Cubastion's HR team, so a Cubastion employee can resolve a query without combing through the documents themselves. Developed with Python, sentence-transformers, Pinecone, llama3.2, and Streamlit.

Language: Python - Size: 33.4 MB - Last synced at: 4 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 1

philnash/chunkers

An exploration of text splitting and chunking in JavaScript

Language: TypeScript - Size: 15.5 MB - Last synced at: 4 months ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

ChenTaHung/HTML-Text-Parser

This project extracts text from documents and prepares it for processing by large language models (LLMs). It stores and uses text style information to identify and segment content based on potential headers and titles.

Language: HTML - Size: 18.7 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 6 - Forks: 1

Related Keywords

text-chunking (10), chunking (5), llm (4), rag (4), python (3), text-splitter (3), text-splitting (3), nlp (2), semantic-chunking (2), pdf-parser (2), embeddings (2), ai (2), pdfparser (1), chatbots (1), ocr (1), document-analysis (1), cosine-similarity (1), huggingface (1), vector (1), openai (1), llama3 (1), pinecone (1), prompt-engineering (1), sentence-transformers (1), streamlit (1), text-embeddings (1), text-extraction (1), text-generation (1), langchain-js (1), llamaindex (1), data-processing (1), large-language-models (1), llms (1), text-parsing (1), csharp (1), dotnet (1), embedding (1), library (1), semantic-kernel (1), semanticchunker (1), semantickernel (1), slm (1), api (1), document-parser (1), microservice (1), text-embedding (1), yolo (1), isaacus (1), splitting (1), text (1), azure-openai (1), chat-gpt (1), chatgpt (1), chunker (1), gpt (1), gpt-4 (1), langchain (1)
