GitHub topics: unstructured-data
shcherbak-ai/contextgem
ContextGem: Effortless LLM extraction from documents
Language: Python - Size: 11.4 MB - Last synced at: 43 minutes ago - Pushed at: 5 days ago - Stars: 809 - Forks: 53

nuclia/nucliadb
NucliaDB, The AI Search database for RAG
Language: Python - Size: 40.2 MB - Last synced at: about 10 hours ago - Pushed at: about 10 hours ago - Stars: 695 - Forks: 54

neo4j-labs/llm-graph-builder
Neo4j graph construction from unstructured data using LLMs
Language: Jupyter Notebook - Size: 52.8 MB - Last synced at: 32 minutes ago - Pushed at: about 8 hours ago - Stars: 3,452 - Forks: 586

nomic-ai/nomic
Interact, analyze and structure massive text, image, embedding, audio and video datasets
Language: Python - Size: 24.2 MB - Last synced at: about 3 hours ago - Pushed at: about 24 hours ago - Stars: 1,682 - Forks: 186

iterative/dvc
🦉 Data Versioning and ML Experiments
Language: Python - Size: 19.5 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 14,444 - Forks: 1,210

kuzudb/baml-kuzu-demo
Demo of knowledge graph creation and Graph RAG with BAML and Kuzu
Language: Python - Size: 3.46 MB - Last synced at: about 22 hours ago - Pushed at: 2 months ago - Stars: 31 - Forks: 3

instill-ai/console
📺 Instill Console for 🔮 Instill Core: https://github.com/instill-ai/instill-core
Language: TypeScript - Size: 12.9 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 39 - Forks: 10

voxel51/fiftyone
Refine high-quality datasets and visual AI models
Language: Python - Size: 1.92 GB - Last synced at: 1 day ago - Pushed at: 3 days ago - Stars: 9,467 - Forks: 629

instill-ai/model-backend
⇋ A REST/gRPC server for Instill Model API service
Language: Go - Size: 19.9 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 17 - Forks: 7

Toschu95/my-benefit-finder-vienna
My Benefit Finder Vienna is an AI-powered system designed to help individuals in Vienna quickly find and apply for relevant social benefits, grants, and subsidies. Using RAG (Retrieval-Augmented Generation) and a Large Language Model (LLM), this tool provides personalized recommendations based on the latest available data from official sources.
Language: Jupyter Notebook - Size: 637 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 1 - Forks: 0

JSv4/OpenContracts
Enterprise-grade and API-first LLM workspace for unstructured documents, including data extraction, redaction, rights management, prompt playground, and more!
Language: Python - Size: 124 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 854 - Forks: 83

aclai-lab/SoleData.jl
Manage logical datasets!
Language: Julia - Size: 1.88 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 13 - Forks: 2

mitdbg/palimpzest
A System for (Optimized) Semantic Computation
Language: Python - Size: 361 MB - Last synced at: about 6 hours ago - Pushed at: 13 days ago - Stars: 108 - Forks: 20

NanoNets/docext
An on-premises, OCR-free unstructured data extraction tool powered by vision language models.
Language: Python - Size: 1.84 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 120 - Forks: 8

Zipstack/unstract
No-code LLM Platform to launch APIs and ETL Pipelines to structure unstructured documents
Language: Python - Size: 32.2 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 5,182 - Forks: 471

Zipstack/unstract-sdk
A framework for writing Unstract Tools/Apps
Language: Python - Size: 3.42 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 19 - Forks: 1

lotus-data/lotus
LOTUS: A semantic query engine for fast and easy LLM-powered data processing
Language: Python - Size: 1.47 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1,173 - Forks: 100

graphlit/graphlit-mcp-server
Model Context Protocol (MCP) Server for Graphlit Platform
Language: TypeScript - Size: 304 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 244 - Forks: 28

CambioML/any-parser
Accurate, private and configurable document retrieval LLM
Language: Python - Size: 22.2 MB - Last synced at: 1 day ago - Pushed at: 24 days ago - Stars: 123 - Forks: 11

databricks/lilac
Curate better data for LLMs
Language: Python - Size: 37 MB - Last synced at: 6 days ago - Pushed at: about 1 year ago - Stars: 1,033 - Forks: 100

tstanislawek/awesome-document-understanding
A curated list of resources for Document Understanding (DU) topic
Size: 5.56 MB - Last synced at: 6 days ago - Pushed at: almost 2 years ago - Stars: 1,405 - Forks: 160

ScrapeGraphAI/Scrapontologies
Python library for Entities, relationships and schemas extraction from documents
Language: Python - Size: 688 KB - Last synced at: 5 days ago - Pushed at: 5 months ago - Stars: 39 - Forks: 2

Francois-lenne/elt-mp4-quiberon
The goal of this project is to retrieve video from the municipality of Quiberon and detect whether a person appears in it
Language: Python - Size: 38.1 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

amphi-ai/amphi-etl
Visual Data Preparation and Transformation. Low-Code Python-based ETL.
Language: TypeScript - Size: 1.54 MB - Last synced at: 7 days ago - Pushed at: 9 days ago - Stars: 1,052 - Forks: 62

milvus-io/bootcamp
Dealing with all unstructured data, such as reverse image search, audio search, molecular search, video analysis, question and answer systems, NLP, etc.
Language: HTML - Size: 213 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 2,102 - Forks: 626

dingodb/dingo
A multi-modal vector database that supports upserts and vector queries using unified SQL (MySQL-Compatible) on structured and unstructured data, while meeting the requirements of high concurrency and ultra-low latency.
Language: Java - Size: 26.9 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 1,434 - Forks: 246

instill-ai/instill-core
🔮 Instill Core is a full-stack AI infrastructure tool for data, model and pipeline orchestration, designed to streamline every aspect of building versatile AI-first applications
Language: Makefile - Size: 10.8 MB - Last synced at: 6 days ago - Pushed at: 7 days ago - Stars: 2,243 - Forks: 112

instill-ai/.github
🏡 Instill AI organisation profile and default configuration
Size: 52.4 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 1 - Forks: 0

kodexa-ai/kodexa-cli
Command Line Tools for Kodexa
Language: Python - Size: 1.15 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 1 - Forks: 1

RelevanceAI/relevanceai
Home of the AI workforce - Multi-agent system, AI agents & tools
Language: Python - Size: 68.9 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 230 - Forks: 34

instill-ai/cli
⌨️ Instill CLI for 🔮 Instill Core: https://github.com/instill-ai/instill-core
Language: Go - Size: 630 KB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 22 - Forks: 3

instill-ai/pipeline-backend
⇋ A REST/gRPC server for Instill VDP API service
Language: Go - Size: 74 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 26 - Forks: 21

instill-ai/mgmt-backend
⇋ A REST/gRPC server for Instill AI's Management API service
Language: Go - Size: 1.15 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 1 - Forks: 2

Nan-Shen/Precise_RAG
Precisely retrieve information from PDF files
Language: Jupyter Notebook - Size: 1.62 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 0 - Forks: 0

instill-ai/artifact-backend
⇋ A REST/gRPC server for Instill Artifact API service
Language: Go - Size: 1.05 MB - Last synced at: 19 days ago - Pushed at: 20 days ago - Stars: 0 - Forks: 3

garyelephant/pygrok
Python implementation of jordansissel's grok regular expression library
Language: Python - Size: 66.4 KB - Last synced at: 3 days ago - Pushed at: over 1 year ago - Stars: 279 - Forks: 75

EulerSearch/embedding_studio
Embedding Studio is a framework that lets you transform your Vector Database into a feature-rich Search Engine.
Language: Python - Size: 10.2 MB - Last synced at: 20 days ago - Pushed at: 21 days ago - Stars: 380 - Forks: 5

KatelynFaulkner/rsa-unstructured-data-comp
Scripts that compare aggregated cubes with structured monitoring schemes in South Africa
Language: HTML - Size: 13.1 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 0 - Forks: 0

towhee-io/towhee
Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.
Language: Python - Size: 37.2 MB - Last synced at: 20 days ago - Pushed at: 7 months ago - Stars: 3,358 - Forks: 258

instill-ai/helm-charts
⎈ The Helm charts of Instill AI
Size: 205 KB - Last synced at: 27 days ago - Pushed at: 28 days ago - Stars: 2 - Forks: 1

BartJongejan/Bracmat
Programming language for symbolic computation with unusual combination of pattern matching features: Tree patterns, associative patterns and expressions embedded in patterns.
Language: C - Size: 23.9 MB - Last synced at: 4 days ago - Pushed at: about 1 month ago - Stars: 47 - Forks: 5

Renumics/spotlight
Interactively explore unstructured datasets from your dataframe.
Language: TypeScript - Size: 46.8 MB - Last synced at: 30 days ago - Pushed at: 3 months ago - Stars: 1,164 - Forks: 86

yobix-ai/extractous
Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.
Language: Rust - Size: 2.88 MB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 1,051 - Forks: 43

teragrep/rsm_01
Teragrep record schema mapper library for Java
Language: Java - Size: 53.7 KB - Last synced at: 21 days ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 3

osllmai/inDox
The Indox Ecosystem offers integrated AI tools for data workflows. Our four components (IndoxArcg, IndoxMiner, IndoxJudge, and IndoxGen) enhance AI applications with advanced retrieval, extraction, evaluation, and generation capabilities, supporting multiple document formats and LLM providers.
Language: Jupyter Notebook - Size: 106 MB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 20 - Forks: 2

harishdeivanayagam/rowfill
Open-source unstructured data (PDFs, Images, Audiofiles) processing platform built for knowledge workers
Language: TypeScript - Size: 1.2 MB - Last synced at: 24 days ago - Pushed at: about 2 months ago - Stars: 275 - Forks: 14

velocitybolt/open-extract
Structured Data Extractor for AI Agents. Search your documents or the web for specific data and get it back in JSON or Markdown in a single tool call.
Language: Python - Size: 8.91 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 162 - Forks: 13

b-cubed-eu/comp-unstructured-data
Scripts to explore the conditions that determine the reliability of models, trends and status by comparing aggregated cubes with structured monitoring schemes
Language: R - Size: 1.69 MB - Last synced at: 29 days ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 0

THANGGI02/graph-rag
UltraRepo Graph RAG provides AI agents access to massive code, doc, and data repos via Knowledge Graphs (KG). KGs are generated in Neo4j and accessible via FastAPI and vector DBs. Provides AI agents with better accuracy, scalability, and reasoning over large repos.
Language: Jupyter Notebook - Size: 10 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

tiangenglu/data_wrangling
ETL-pipelines for structured and unstructured data, data wrangling worked examples, automatic data workflows
Language: Python - Size: 393 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

Thehousummer233/wikipedia-ai-agent
Wikipedia AI agent research assistant. LangChain's LangGraph's ReAct agent architecture, LLMs (OpenAI, Anthropic, Google), Wikipedia API, RAG with FAISS vector db, semantic chunking, GraphRAG, Streamlit frontend, terminal and web interfaces
Size: 1.95 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

NTDLS/Katzebase
ACID compliant document-based database engine with SQL language, APIs and Management UI.
Language: C# - Size: 33.3 MB - Last synced at: 30 days ago - Pushed at: 2 months ago - Stars: 6 - Forks: 1

AnhDungPham2901/extract_data_from_pdf
Using an LLM to extract unstructured data from PDF files into a structured format
Language: Jupyter Notebook - Size: 217 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

Clarifai/clarifai-python-datautils
Extract, transform, and load unstructured data into Clarifai's AI platform
Language: Python - Size: 1.05 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 6 - Forks: 0

FroCode/Realtime_Streaming_Unstructured-Data
Real-time streaming and processing of unstructured data (Spark, Airflow)
Language: Python - Size: 128 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

libraryofcelsus/LLM_File_Parser
AutoML/Unstructured Data Processing for RAG and LLM Dataset Creation. Current Database Options are: Qdrant or Marqo DB.
Language: Python - Size: 43 KB - Last synced at: 3 days ago - Pushed at: 10 months ago - Stars: 6 - Forks: 1

SupermatAI/supermat
Novel data representation leading to granular citations and higher accuracy
Language: Python - Size: 5.57 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 4 - Forks: 1

kodexa-ai/kodexa
Kodexa Python Client
Language: Python - Size: 10.8 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 4 - Forks: 1

lazyhope/metamodel
Intelligent Schema Designer and Unstructured Data Parser
Language: JavaScript - Size: 164 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 13 - Forks: 0

pintamonas4575/GESTBD-project-MAADM-UPM
Project for the "Management of Massive Data Systems" course in the UPM master's program.
Language: Jupyter Notebook - Size: 1.48 MB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

teragrep/blf_01
Tokenizer for Teragrep
Language: Java - Size: 9.17 MB - Last synced at: 21 days ago - Pushed at: 7 months ago - Stars: 0 - Forks: 4

teragrep/dpf_03
Teragrep Tokenizer for Apache Spark
Language: Scala - Size: 78.1 KB - Last synced at: 21 days ago - Pushed at: 6 months ago - Stars: 0 - Forks: 4

SachinKalsi/html_tag_annotator
A machine learning tool for creating training datasets quickly and easily using a smart Chrome extension
Language: JavaScript - Size: 11.8 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 14 - Forks: 3

drci-foch/BTB_extraction
Transbronchial biopsy document restructuring. Work in progress.
Language: Jupyter Notebook - Size: 93.6 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

nicbet/infozilla
The infoZilla unstructured software engineering data mining tool. It can find and extract source code regions, patches, stack traces, enumerations and itemizations from discussion threads.
Language: Java - Size: 530 KB - Last synced at: about 1 month ago - Pushed at: over 6 years ago - Stars: 15 - Forks: 2

DavidMoserAI/AzureDocumentIntelligenceChunker
A lightweight Python library for metadata-rich document chunking in Retrieval-Augmented Generation (RAG) workflows. It leverages Azure AI Document Intelligence to enhance chunking by retaining hierarchical structure, page numbers, and bounding boxes for seamless integration with PDF viewers.
Language: Python - Size: 24.4 KB - Last synced at: 21 days ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

instill-ai/controller-vdp 📦
🎮 A controller-vdp manages components in Instill VDP
Language: Go - Size: 316 KB - Last synced at: 24 days ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 1

tinaland101/UK-Food-Directory-Project
The core of this project is based on analyzing data from the UK Food Standards Agency. This data includes food hygiene ratings of various establishments across the UK. Based on these ratings, results are selected to highlight popular food choices.
Language: Jupyter Notebook - Size: 16.6 KB - Last synced at: 2 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

Zipstack/unstract-adapters
Unstract's interface to LLMs, Embeddings and VectorDBs.
Language: Python - Size: 632 KB - Last synced at: 10 days ago - Pushed at: 10 months ago - Stars: 18 - Forks: 3

IBM/pixiedust-facebook-analysis 📦
A Jupyter notebook that uses the Watson Visual Recognition and Natural Language Understanding services to enrich Facebook Analytics and uses Cognos Dashboard Embedded to explore and visualize the results in Watson Studio
Language: Jupyter Notebook - Size: 6.67 MB - Last synced at: 7 days ago - Pushed at: 5 months ago - Stars: 44 - Forks: 64

garethcmurphy/Managing-Unstructured-Metadata-at-ESS
What is metadata? A set of data that describes and gives information about other data. It can be classified into separate types: administrative, structural, descriptive, and scientific. Scientific metadata … is often notoriously incomplete. Additional quantities and assumptions necessary to interpret the data may initially only be recorded on scraps of paper, har
Language: CSS - Size: 8.12 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

faisalman/re-parse-js
Compose structured data from unstructured text using regex-based pattern matching, as found in UAParser.js
Language: TypeScript - Size: 31.3 KB - Last synced at: 25 days ago - Pushed at: 5 months ago - Stars: 1 - Forks: 1

ewdlop/X-File
https://en.wikipedia.org/wiki/The_X-Files
Size: 166 KB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

wangxb96/RAG-QA-Generator
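RAG-QA-Generator is an automated knowledge-base construction and management tool for retrieval-augmented generation (RAG) systems. It reads document data, uses large language models to generate high-quality question-answer (QA) pairs, and inserts them into a database, automating the construction and management of a RAG system's knowledge base.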
Language: Python - Size: 1.72 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 62 - Forks: 6

SalmaSalahEldin/RAG-Powered-Educational-Assistant
Size: 54.7 KB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

DerwenAI/cdl2024_masterclass
Connected Data London 2024, ERKG masterclass: how to generate knowledge graphs from structured and unstructured data based on entity resolution (ER) to enhance data quality for the downstream AI applications
Size: 81.1 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

kaloslazo/PyFuseDB
Database system that combines structured data retrieval through inverted indexes with unstructured data (images, audio) search using multidimensional vector embeddings, all within a unified platform.
Language: Python - Size: 631 MB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 3 - Forks: 0

floriancochard/extract-data-from-paper
A tool designed to extract numerical data from scanned historical weather documents.
Language: Python - Size: 151 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 13 - Forks: 2

MoinDalvs/Resume_Screening_and_Parser
Business objective: the document classification solution should significantly reduce manual effort in HRM. It should achieve a higher level of accuracy and automation with minimal human intervention. Sample data set details: resumes and financial documents.
Language: Jupyter Notebook - Size: 95.9 MB - Last synced at: 20 days ago - Pushed at: over 2 years ago - Stars: 7 - Forks: 2

IBM/generate-insights-from-data-formats-with-watson 📦
How do we process data in different formats such as docx and pdf and generate insights to be linked with structured data in a database? This pattern helps establish relations between structured and unstructured data to generate recommendations using Watson NLU & Watson Studio.
Language: Jupyter Notebook - Size: 1.06 MB - Last synced at: 7 days ago - Pushed at: almost 5 years ago - Stars: 14 - Forks: 14

am1tyadav/cosmonaut
Helping you find structure in the cosmos of data.
Language: Python - Size: 83 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

Shivabajelan/uploading_file_to_azure_blob_using_python
In this repository, I will show how we can automate uploading unstructured data such as pdf or png files to Azure Blob using Python.
Size: 28.3 KB - Last synced at: 2 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

shay681/Constructing-Structured-Database-from-Unstructured-Legal-Documents
This project aims to compare 3 methods for transforming unstructured textual content from Hebrew legal documents into structured data
Language: Jupyter Notebook - Size: 68.4 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

hupe1980/go-textractor
📄 Amazon Textract response parser written in Go.
Language: Go - Size: 6.24 MB - Last synced at: 28 days ago - Pushed at: over 1 year ago - Stars: 4 - Forks: 0

ShreyanSimhadri/21BKT0102_ML
LLM Models on Unstructured Data
Language: Python - Size: 6.84 KB - Last synced at: 6 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

krishcy25/SentimentMining-UsingPython-WordCloud-and-TextHero
Sentiment mining on unstructured data. This repository creates a word cloud of the most frequent/significant words, a list of top words by product, and K-Means and PCA plots of reviews grouped by topic category, based on textual analysis of Amazon customer reviews of electronic products.
Language: Jupyter Notebook - Size: 3.85 MB - Last synced at: 8 months ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

DerwenAI/strwythura
How to construct knowledge graphs from unstructured data sources
Language: Jupyter Notebook - Size: 1.22 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 67 - Forks: 6

wasay8/AutomatedGarbageImageClassifier
Implementation of CNN models (ResNet-34 and ResNet-50) to classify garbage images into 6 major categories in support of sustainable waste disposal.
Language: Python - Size: 8.79 KB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

yeisonmontoya1815/Special-Topics-in-Data-Analytics
In my PDD Data Analytics studies at Douglas College, the Special Topics course stands out as a crucial component. This specialized module delves into advanced aspects of data analysis beyond the core curriculum, offering a deep exploration of intricate domains. Through this focused study, I aim to enhance my proficiency in handling complex datasets
Language: Jupyter Notebook - Size: 15.2 MB - Last synced at: about 2 months ago - Pushed at: 9 months ago - Stars: 1 - Forks: 0

SC92113/User-Analytics
My 'Out of PM scopes' data project
Language: Jupyter Notebook - Size: 3.14 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

rririanto/unstructured-demo-streamlit
Extract your docs (CSV, PDF, JSON, HTML, DOCS, Sheets and more) for your own GPT and LLM projects using Unstructured.io via streamlit
Language: Python - Size: 6.84 KB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 8 - Forks: 0

automorphic-ai/trex
Enforce structured output from LLMs 100% of the time
Language: Python - Size: 7.81 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 239 - Forks: 8

dominiksalvet/crypto-addr-extract
Extract cryptocurrency addresses from big datasets
Language: Python - Size: 42 KB - Last synced at: 10 months ago - Pushed at: almost 3 years ago - Stars: 2 - Forks: 0

bhattsahil1/smart-xtractor
Language: Python - Size: 3.45 MB - Last synced at: 10 months ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 1

TuanaCelik/unstructuredio-haystack
💙 Unstructured Data Connectors for Haystack 2.0
Language: Python - Size: 22.5 KB - Last synced at: 7 days ago - Pushed at: over 1 year ago - Stars: 16 - Forks: 2

SAP-archive/hana-structurer-one 📦
SAP HANA Extreme application that analyzes unstructured data (tweets) to retrieve information such as location, people, companies, and also sentiment analysis.
Language: CSS - Size: 3.81 MB - Last synced at: 5 days ago - Pushed at: 10 months ago - Stars: 2 - Forks: 4

nagababumo/Preprocessing-Unstructured-Data-for-LLM-Applications
Language: Jupyter Notebook - Size: 37.1 KB - Last synced at: 2 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 2

tejasshahu/Data_Science_Machine_Learning
This repository is all about Data Science and Machine Learning.
Language: Jupyter Notebook - Size: 33.7 MB - Last synced at: 11 months ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 0

roshni-b/Log-Parser
Modular log parser that parses and processes @nasa's Apache logs.
Language: Python - Size: 30.3 MB - Last synced at: 11 months ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

MohitWani/Unstructured-data-preprocessing-
This repository contains preprocessing of unstructured data such as images, text, and speech.
Language: Jupyter Notebook - Size: 1.76 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0
