Topic: "unstructured-data"
abdollahpour/micro-draft-manager
micro-draft-manager is a microservice that helps you manage unstructured data in your application, with sorting and full-text search
Language: Go - Size: 27.3 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

bengruher/SMS-Spam-Detection
Machine learning task to identify spam SMS messages. The project involves processing noisy, unstructured text and applying other NLP techniques.
Language: Jupyter Notebook - Size: 663 KB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 1

mware-solutions/bigconnect-docs
Documentation for the BigConnect platform
Size: 5.64 MB - Last synced at: over 2 years ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 0

rudrakshsyal/Craigslist-Job-Listing-Transformation-via-Text-Modeling
Improved the quality and presentation of job listings on the Craigslist website by scraping Indeed’s job listings and training on that data, to enhance user experience, drive more traffic, and thus increase revenue
Language: Jupyter Notebook - Size: 4.54 MB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 0

ihaterynn/Docling-Processor
Document Processing Script using Docling
Language: Python - Size: 4.03 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

instill-ai/artifact-backend
⇋ A REST/gRPC server for Instill Artifact API service
Language: Go - Size: 1.32 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 3

Analyst-Lochan/employee-health-analysis
This project showcases a complete data cleaning and basic analytics workflow on a real-world-style employee health dataset, simulating inconsistencies often found in raw data. It includes both uncleaned and cleaned Excel files, plus a pivot-based dashboard to derive insights.
Size: 3.53 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 0 - Forks: 0

lightup-data/lightudq
AI assisted data quality for unstructured data
Language: Python - Size: 1.21 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 0 - Forks: 0

samnaveenkumaroff/CuraOS
CuraOS is a fully modular, AI-powered pipeline that automates the transformation of unstructured multi-page medical records (PDFs, scanned documents) into structured and actionable electronic health records (EHRs).
Language: Python - Size: 896 KB - Last synced at: 22 days ago - Pushed at: 22 days ago - Stars: 0 - Forks: 1

spoortimorabad/Personally-Identifiable-Information-PII-
Personally identifiable information (PII) detection and masking method
Language: Python - Size: 8.13 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

Francois-lenne/elt-mp4-quiberon
The goal of this project is to retrieve video from the municipality of Quiberon and determine whether a person appears in it.
Language: Python - Size: 38.1 KB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

b-cubed-eu/rsa-unstructured-data-comp
Scripts that compare aggregated cubes with structured monitoring schemes in South Africa
Language: R - Size: 13.1 MB - Last synced at: 27 days ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

Nan-Shen/Precise_RAG
Precisely retrieve information from PDF files
Language: Jupyter Notebook - Size: 1.62 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

teragrep/rsm_01
Teragrep record schema mapper library for Java
Language: Java - Size: 53.7 KB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 3

THANGGI02/graph-rag
UltraRepo Graph RAG provides AI agents access to massive code, doc, and data repos via Knowledge Graphs (KG). KGs are generated in Neo4j and accessible via FastAPI and vector DBs. Provides AI agents with better accuracy, scalability, and reasoning over large repos.
Language: Jupyter Notebook - Size: 10 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

tiangenglu/data_wrangling
ETL pipelines for structured and unstructured data, worked data-wrangling examples, and automated data workflows
Language: Python - Size: 393 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

Thehousummer233/wikipedia-ai-agent
Wikipedia AI agent research assistant: LangChain/LangGraph ReAct agent architecture, LLMs (OpenAI, Anthropic, Google), Wikipedia API, RAG with a FAISS vector DB, semantic chunking, GraphRAG, Streamlit frontend, terminal and web interfaces
Size: 1.95 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

AnhDungPham2901/extract_data_from_pdf
Using an LLM to extract unstructured data from PDF files into a structured format
Language: Jupyter Notebook - Size: 217 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

drci-foch/BTB_extraction
Transbronchial biopsy document restructuring. Work in progress.
Language: Jupyter Notebook - Size: 93.6 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

DavidMoserAI/AzureDocumentIntelligenceChunker
A lightweight Python library for metadata-rich document chunking in Retrieval-Augmented Generation (RAG) workflows. It leverages Azure AI Document Intelligence to enhance chunking by retaining hierarchical structure, page numbers, and bounding boxes for seamless integration with PDF viewers.
Language: Python - Size: 24.4 KB - Last synced at: about 2 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 1

garethcmurphy/Managing-Unstructured-Metadata-at-ESS
What is metadata? A set of data that describes and gives information about other data. It can be classified into separate types: administrative, structural, descriptive, and scientific. Scientific metadata … is often notoriously incomplete. Additional quantities and assumptions necessary to interpret the data may initially only be recorded on scraps of paper, har
Language: CSS - Size: 8.12 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

tinaland101/UK-Food-Directory-Project
The core of this project is analyzing data from the UK Food Standards Agency. This data includes food hygiene ratings of various establishments across the UK. Based on these ratings, the results are used to identify popular food choices.
Language: Jupyter Notebook - Size: 16.6 KB - Last synced at: 5 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

SalmaSalahEldin/RAG-Powered-Educational-Assistant
Size: 54.7 KB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

DerwenAI/cdl2024_masterclass
Connected Data London 2024, ERKG masterclass: how to generate knowledge graphs from structured and unstructured data based on entity resolution (ER) to enhance data quality for the downstream AI applications
Size: 81.1 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

am1tyadav/cosmonaut
Helping you find structure in the cosmos of data.
Language: Python - Size: 83 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

pintamonas4575/GESTBD-project-MAADM-UPM
Project for the "Gestión de sistemas de datos masivos" (Big Data Systems Management) course in the UPM master's program.
Language: Jupyter Notebook - Size: 1.48 MB - Last synced at: 4 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

Shivabajelan/uploading_file_to_azure_blob_using_python
This repository shows how to automate uploading unstructured data, such as PDF or PNG files, to Azure Blob Storage using Python.
Size: 28.3 KB - Last synced at: 18 days ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

teragrep/dpf_03
Teragrep Tokenizer for Apache Spark
Language: Scala - Size: 78.1 KB - Last synced at: 3 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 4

shay681/Constructing-Structured-Database-from-Unstructured-Legal-Documents
This project aims to compare three methods for transforming unstructured textual content from Hebrew legal documents into structured data
Language: Jupyter Notebook - Size: 68.4 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

teragrep/blf_01
Tokenizer for Teragrep
Language: Java - Size: 9.17 MB - Last synced at: 3 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 4

SC92113/User-Analytics
My 'Out of PM scopes' data project
Language: Jupyter Notebook - Size: 3.14 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

nagababumo/Preprocessing-Unstructured-Data-for-LLM-Applications
Language: Jupyter Notebook - Size: 37.1 KB - Last synced at: 2 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 2

MohitWani/Unstructured-data-preprocessing-
This repository contains preprocessing of unstructured data, such as images, text, and speech.
Language: Jupyter Notebook - Size: 1.76 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

instill-ai/controller-model
🎮 A controller-model manages components in Instill Model
Language: Go - Size: 351 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 1

NityaVerma19/Cats-vs-Dogs
Classifying 😺 and 🐶 using CNN
Language: Jupyter Notebook - Size: 2.85 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

wasay8/AutomatedGarbageImageClassifier
Implementation of CNN models (ResNet-34 and ResNet-50) to classify garbage images into six major categories by disposability, in support of sustainable development.
Language: Python - Size: 8.79 KB - Last synced at: 4 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

jovezhong/real-time-milvus Fork of bytewax/real-time-milvus
Streaming meets LLM: Real-time Hacker News to Milvus/Zilliz with streaming SQL
Language: Python - Size: 2.27 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

airdac/MUD
Course repository with NLP Python apps. UPC - Master's Degree in Data Science - Mining Unstructured Data - Spring 2024
Language: Jupyter Notebook - Size: 10.7 KB - Last synced at: 5 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

instill-ai/deprecated-vdp
💧 Instill VDP (Versatile Data Pipeline) is an open-source tool to seamlessly integrate AI to process unstructured data in the modern data stack
Language: Makefile - Size: 7.9 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

martinbatek/IC-UDA-Final-Project
Final Project for the Unstructured Data Analysis module in the MSc. Machine Learning and Data Science Course
Language: Jupyter Notebook - Size: 500 MB - Last synced at: 4 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

mazzasaverio/terra-text-processor
A Terraform setup for processing unstructured data on GCP with MongoDB Atlas and Confluent Kafka, featuring serverless, event-driven architecture and Cloud Run integrations.
Language: HCL - Size: 17.6 KB - Last synced at: 5 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

instill-ai/controller-vdp 📦
🎮 A controller-vdp manages components in Instill VDP
Language: Go - Size: 316 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 1

ShreyanSimhadri/21BKT0102_ML
LLM Models on Unstructured Data
Language: Python - Size: 6.84 KB - Last synced at: 9 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

kodexa-ai/kodexa-java
Kodexa Content Model and Client for Java
Language: Java - Size: 18.3 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 1

KamRoki/Deep-Learning-Dog-Breed
Who's a good dog? Who likes ear scratches? Well, it seems those fancy deep neural networks don't have all the answers. However, maybe they can answer that ubiquitous question we all ask when meeting a four-legged stranger: what kind of good pup is that? This notebook builds a multi-class image classifier using TensorFlow 2.0 and TensorFlow Hub.
Language: Jupyter Notebook - Size: 6.1 MB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 1

inuwamobarak/detecting-tables-in-documents
This repository contains code and resources for detecting tables in various types of documents using machine learning and computer vision techniques.
Language: Jupyter Notebook - Size: 1.8 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

janoellerich/RooTri
Language: MATLAB - Size: 124 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

perebaj/parser
Parse unstructured text using the GPT-3 API
Language: Go - Size: 1.75 MB - Last synced at: 2 months ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

instill-ai/controller 📦
🎮 A controller to manage all VDP states
Language: Go - Size: 281 KB - Last synced at: 8 months ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 1

ujunwa-DS/UNSTRUCTURED-DATA-WHATSAPP-DATA-
Unstructured WhatsApp data was cleaned with Python and visualized with Power BI to obtain insights. Libraries such as NumPy, regex, openpyxl, and pandas were used in this project.
Language: Jupyter Notebook - Size: 209 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

instill-ai/metric-backend 📦
⇋ A REST/gRPC server for Instill AI's Metric API service
Size: 0 Bytes - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

ClaudioPoli/JobAds
Management of structured and unstructured data
Language: PLpgSQL - Size: 30.3 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

Mihryam/HealthNews_Tweets-ClusteringToClassification
A machine learning model on clustering of health news tweets from different news sources to extrapolate categories and then use the cluster labels for downstream classification.
Language: Jupyter Notebook - Size: 4.45 MB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

pedrogfleming/Snowflake-Scripts
SQL Scripts related to my learning on the Snowflake data cloud provider
Size: 3.7 MB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

branham-player/indexer
A parser which indexes unstructured collections of data representing William Branham's complete sermon library and structures them for loading into a data ingester
Language: JavaScript - Size: 38.1 KB - Last synced at: over 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

Peteresis/Movies-ETL
ETL (Extract, Transform, Load) Practice. Automate the process of reading new data, processing it, and then loading it into new SQL tables. The code uses Python, RegEx, and a SQL database to build an ETL pipeline for this project.
Language: Jupyter Notebook - Size: 2.99 MB - Last synced at: 10 days ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 1

oypark/Unstructured-data-analysis-Project
Multicampus Project 2: Unstructured Data Analysis (mulcam bigdata project2_unstructured data analysis)
Language: Jupyter Notebook - Size: 19.6 MB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

lilianchi/lost-or-found
A repository with our team's final Python project in MGMT 590 Analyzing Unstructured Data course at Krannert School of Management, Purdue University.
Language: Python - Size: 1.44 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

elalbaicin/progRchives
An R package for scraping and organizing ProgArchives data.
Language: R - Size: 3.49 MB - Last synced at: about 1 year ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

AsishMandoi/quantum-search
A quantum circuit that takes a list of numbers and returns a quantum state which is a superposition of indices of those numbers that follow a given pattern
Language: Jupyter Notebook - Size: 919 KB - Last synced at: over 2 years ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

sdurancmu/disaster_tweets
Multiple approaches to predicting disaster tweets on a Kaggle dataset
Language: Jupyter Notebook - Size: 133 MB - Last synced at: 6 months ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 1

bartczernicki/Documents-Forms
Collection of various documents and forms that can be used by AI services & systems for training
Size: 26.2 MB - Last synced at: over 2 years ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

bhattsahil1/smart-xtractor
Language: Python - Size: 3.45 MB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 1

roshni-b/Log-Parser
Modular log parser that parses @nasa's apache logs and processes them.
Language: Python - Size: 30.3 MB - Last synced at: about 1 year ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

krishcy25/SentimentMining-UsingPython-WordCloud-and-TextHero
Sentiment mining (unstructured data): this repository focuses on creating a word cloud of the most frequent/significant words, a list of top words by product, and K-Means and PCA plots of the reviews grouped by topic category, derived from textual analysis of Amazon customer reviews of electronic products
Language: Jupyter Notebook - Size: 3.85 MB - Last synced at: 11 months ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

as2leung/web_scrape_postal_office_address
A web scraping project that retrieves the post office locations from a search engine result and outputs the data in a cleaned dataframe
Language: Jupyter Notebook - Size: 35.2 KB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 0

rgdeekshith/zero-to-mastery-ml Fork of mrdbourke/zero-to-mastery-ml
All course materials for ZTM ML on Udemy
Size: 129 MB - Last synced at: almost 2 years ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 0

tejasshahu/Data_Science_Machine_Learning
This repository is all about Data Science and Machine Learning.
Language: Jupyter Notebook - Size: 33.7 MB - Last synced at: about 1 year ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

wotchin/PostVector
PostVector: unstructured and vector retrieval database extension to PostgreSQL.
Size: 13.7 KB - Last synced at: over 2 years ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 0

jaydeepdevda/NLP-AccessingTextData
Python code to access large text (at least 10 pages) from a .txt file, an MS Word document, a PDF file, a Wikipedia page, and 500 tweets.
Language: HTML - Size: 750 KB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 1

rosette-api-community/rosette-for-docs
Google Docs add-on offering users the ability to extract entities, translate names, and research entities on wikipedia from within their multilingual document.
Language: JavaScript - Size: 18.6 KB - Last synced at: 5 months ago - Pushed at: about 9 years ago - Stars: 0 - Forks: 1
