Ecosyste.ms: Repos

An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: unstructured-data

Zipstack/unstract-adapters

Unstract's interface to LLMs, Embeddings and VectorDBs.

Language: Python - Size: 585 KB - Last synced: about 1 hour ago - Pushed: about 6 hours ago - Stars: 9 - Forks: 1

instill-ai/pipeline-backend

⇋ A REST/gRPC server for Instill VDP API service

Language: Go - Size: 6.08 MB - Last synced: 30 minutes ago - Pushed: about 3 hours ago - Stars: 15 - Forks: 8

Renumics/spotlight

Interactively explore unstructured datasets from your dataframe.

Language: TypeScript - Size: 45.7 MB - Last synced: about 7 hours ago - Pushed: about 8 hours ago - Stars: 1,016 - Forks: 82

konhay/sector-attention-index

Specifically built for the research proposal: Estimating sector attention index with deep learning methods : example of Chinese stock market, Jan. 4, 2024.

Language: Python - Size: 864 KB - Last synced: about 12 hours ago - Pushed: about 13 hours ago - Stars: 1 - Forks: 0

milvus-io/bootcamp

Dealing with all unstructured data, such as reverse image search, audio search, molecular search, video analysis, question and answer systems, NLP, etc.

Language: HTML - Size: 165 MB - Last synced: about 12 hours ago - Pushed: about 12 hours ago - Stars: 1,639 - Forks: 539

kodexa-ai/kodexa

Kodexa Python Client

Language: Python - Size: 10.3 MB - Last synced: 6 days ago - Pushed: 8 days ago - Stars: 3 - Forks: 1

tstanislawek/awesome-document-understanding

A curated list of resources for Document Understanding (DU) topic

Size: 5.56 MB - Last synced: about 23 hours ago - Pushed: 12 months ago - Stars: 1,131 - Forks: 133

Zipstack/unstract

No-code LLM Platform to launch APIs and ETL Pipelines to structure unstructured documents

Language: Python - Size: 6.98 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 79 - Forks: 8

towhee-io/towhee

Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.

Language: Python - Size: 37.2 MB - Last synced: 6 days ago - Pushed: 4 months ago - Stars: 3,009 - Forks: 238

instill-ai/helm-charts

⎈ The Helm charts of Instill AI

Size: 146 KB - Last synced: 7 days ago - Pushed: 7 days ago - Stars: 2 - Forks: 1

instill-ai/mgmt-backend

⇋ A REST/gRPC server for Instill AI's Management API service

Language: Go - Size: 1.06 MB - Last synced: 13 days ago - Pushed: 15 days ago - Stars: 0 - Forks: 2

esteininger/file-processor

A Python library that uses AI to convert unstructured files (like PDFs, HTML, etc.) into structured data.

Language: Python - Size: 114 KB - Last synced: 8 days ago - Pushed: 8 days ago - Stars: 0 - Forks: 0

instill-ai/controller-model

🎮 A controller-model manages components in Instill Model

Language: Go - Size: 347 KB - Last synced: 9 days ago - Pushed: 9 days ago - Stars: 0 - Forks: 1

NTDLS/NTDLS.Katzebase.Server

ACID compliant JSON document-based database engine with SQL language, APIs and GUI.

Language: C# - Size: 29.2 MB - Last synced: 10 days ago - Pushed: 11 days ago - Stars: 4 - Forks: 1

RelevanceAI/relevanceai

Home of the AI workforce - Multi-agent system, AI agents & tools

Language: Python - Size: 68.2 MB - Last synced: 9 days ago - Pushed: 3 months ago - Stars: 100 - Forks: 17

instill-ai/cli

📺 Instill AI's official command line tool

Language: Go - Size: 678 KB - Last synced: 10 days ago - Pushed: 10 days ago - Stars: 21 - Forks: 3

instill-ai/artifact-backend

⇋ A REST/gRPC server for Instill Artifact API service

Language: Go - Size: 184 KB - Last synced: 13 days ago - Pushed: 20 days ago - Stars: 0 - Forks: 0

alexandreLamarre/Fission

Data analytics & Structured streaming optimized for the Edge

Language: Rust - Size: 31.3 KB - Last synced: 11 days ago - Pushed: 12 days ago - Stars: 1 - Forks: 0

NityaVerma19/Cats-vs-Dogs

Classifying ๐Ÿ˜บ and ๐Ÿถ using CNN

Language: Jupyter Notebook - Size: 2.85 MB - Last synced: 12 days ago - Pushed: 12 days ago - Stars: 0 - Forks: 0

instill-ai/instill-core

🔮 Instill Core is an open-source no-/low-code data, model, and pipeline orchestration platform

Language: Makefile - Size: 8.92 MB - Last synced: 13 days ago - Pushed: 18 days ago - Stars: 1,875 - Forks: 80

instill-ai/console

⛅ Versatile Data Pipeline (VDP) console website

Language: TypeScript - Size: 7.64 MB - Last synced: 13 days ago - Pushed: 13 days ago - Stars: 25 - Forks: 9

instill-ai/model-backend

⇋ A REST/gRPC server for Instill Model API service

Language: JavaScript - Size: 7.28 MB - Last synced: 13 days ago - Pushed: 20 days ago - Stars: 14 - Forks: 6

instill-ai/deprecated-model

โš—๏ธ Instill Model contains components for AI model orchestration

Language: Makefile - Size: 6.06 MB - Last synced: 13 days ago - Pushed: about 2 months ago - Stars: 20 - Forks: 4

instill-ai/deprecated-core

🔮 Instill Core contains components for supporting Instill VDP and Instill Model

Language: Makefile - Size: 1.25 MB - Last synced: 13 days ago - Pushed: 3 months ago - Stars: 13 - Forks: 4

instill-ai/controller 📦

🎮 A controller to manage all VDP states

Language: Go - Size: 281 KB - Last synced: 13 days ago - Pushed: 11 months ago - Stars: 0 - Forks: 1

instill-ai/connector-backend 📦

⇋ A REST/gRPC server for Instill AI's data connector service

Language: JavaScript - Size: 1.63 MB - Last synced: 13 days ago - Pushed: 6 months ago - Stars: 3 - Forks: 3

instill-ai/.github

๐Ÿก Instill AI organisation profile and default configuration

Size: 50.8 MB - Last synced: 13 days ago - Pushed: 2 months ago - Stars: 1 - Forks: 1

elalbaicin/progRchives

An R package for scraping and organizing ProgArchives data.

Language: R - Size: 3.49 MB - Last synced: 14 days ago - Pushed: over 2 years ago - Stars: 0 - Forks: 0

Menziess/Databook

Data Engineering knowledge as a readable tutorial (collaboratively).

Size: 2.44 MB - Last synced: 16 days ago - Pushed: over 5 years ago - Stars: 4 - Forks: 1

garyelephant/pygrok

Python implementation of jordansissel's grok regular expression library

Language: Python - Size: 66.4 KB - Last synced: 8 days ago - Pushed: 6 months ago - Stars: 273 - Forks: 76

kodexa-ai/kodexa-cli

Command Line Tools for Kodexa

Language: Python - Size: 918 KB - Last synced: 6 days ago - Pushed: 8 days ago - Stars: 0 - Forks: 1

Zipstack/unstract-sdk

A framework for writing Unstract Tools/Apps

Language: Python - Size: 1.69 MB - Last synced: 22 days ago - Pushed: 22 days ago - Stars: 5 - Forks: 0

perebaj/parser

Parse unstructured text using the GPT-3 API

Language: Go - Size: 1.75 MB - Last synced: 24 days ago - Pushed: 8 months ago - Stars: 0 - Forks: 0

jovezhong/real-time-milvus Fork of bytewax/real-time-milvus

Streaming meets LLM: Real-time Hacker News to Milvus/Zilliz with streaming SQL

Language: Python - Size: 2.27 MB - Last synced: 26 days ago - Pushed: 27 days ago - Stars: 0 - Forks: 0

lilacai/lilac

Curate better data for LLMs

Language: Python - Size: 37 MB - Last synced: 25 days ago - Pushed: about 2 months ago - Stars: 814 - Forks: 68

nomic-ai/nomic

Interact, analyze and structure massive text, image, embedding, audio and video datasets

Language: Python - Size: 23.8 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 984 - Forks: 134

nuclia/nucliadb

NucliaDB, The AI Search database for RAG

Language: Python - Size: 34 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 569 - Forks: 45

dingodb/dingo

A multi-modal vector database that supports upserts and vector queries using unified SQL (MySQL-Compatible) on structured and unstructured data, while meeting the requirements of high concurrency and ultra-low latency.

Language: Java - Size: 19.6 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 315 - Forks: 110

voxel51/fiftyone

The open-source tool for building high-quality datasets and computer vision models

Language: Python - Size: 1.29 GB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 6,627 - Forks: 487

DataCanvasIO/dingo Fork of dingodb/dingo

A multi-modal vector database that supports upserts and vector queries using unified SQL (MySQL-Compatible) on structured and unstructured data, while meeting the requirements of high concurrency and ultra-low latency.

Language: Java - Size: 18.9 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 7 - Forks: 2

IBM/generate-insights-from-data-formats-with-watson

How do we process data in different formats like docx, pdf, etc. and generate insights to be linked with structured data in a database? This pattern helps in establishing relations between structured & unstructured data to generate recommendations using Watson NLU & Watson Studio.

Language: Jupyter Notebook - Size: 1.06 MB - Last synced: about 1 month ago - Pushed: almost 4 years ago - Stars: 13 - Forks: 16

yeisonmontoya1815/Special-Topics-in-Data-Analytics-CSIS-4260-002

In my PDD Data Analytics studies at Douglas College, the Special Topics course stands out as a crucial component. This specialized module delves into advanced aspects of data analysis beyond the core curriculum, offering a deep exploration of intricate domains. Through this focused study, I aim to enhance my proficiency in handling complex datasets

Language: Jupyter Notebook - Size: 14.3 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 1 - Forks: 0

rudrakshsyal/Craigslist-Job-Listing-Transformation-via-Text-Modeling

Improved the quality and presentation of job listings on the Craigslist website by scraping and training on data from Indeed's job listings, to enhance user experience, drive more traffic, and thus increase revenue

Language: Jupyter Notebook - Size: 4.54 MB - Last synced: 2 months ago - Pushed: over 5 years ago - Stars: 1 - Forks: 0

EulerSearch/embedding_studio

Embedding Studio is a framework which allows you to transform your Vector Database into a feature-rich Search Engine.

Language: Python - Size: 10.2 MB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 355 - Forks: 4

drci-foch/BTB_extraction

Transbronchial Biopsy Document restructuration. Work in progress.

Language: Jupyter Notebook - Size: 93.5 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 0 - Forks: 0

IBM/pixiedust-facebook-analysis 📦

A Jupyter notebook that uses the Watson Visual Recognition and Natural Language Understanding services to enrich Facebook Analytics and uses Cognos Dashboard Embedded to explore and visualize the results in Watson Studio

Language: Jupyter Notebook - Size: 7.99 MB - Last synced: about 1 month ago - Pushed: over 3 years ago - Stars: 43 - Forks: 64

instill-ai/deprecated-vdp

💧 Instill VDP (Versatile Data Pipeline) is an open-source tool to seamlessly integrate AI to process unstructured data in the modern data stack

Language: Makefile - Size: 7.9 MB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 0 - Forks: 0

NTDLS/NTDLS.Katzebase.SQLServerMigration

Tool for exporting data from SQL Server to Katzebase server. Katzebase is an ACID-compliant JSON document-based database engine with SQL language, APIs and GUI.

Language: C# - Size: 9.04 MB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 0 - Forks: 0

BartJongejan/Bracmat

Programming language for symbolic computation with unusual combination of pattern matching features: Tree patterns, associative patterns and expressions embedded in patterns.

Language: C - Size: 22.5 MB - Last synced: 2 months ago - Pushed: 3 months ago - Stars: 47 - Forks: 6

instill-ai/controller-vdp 📦

🎮 A controller-vdp manages components in Instill VDP

Language: Go - Size: 316 KB - Last synced: 13 days ago - Pushed: 5 months ago - Stars: 0 - Forks: 1

hupe1980/go-textractor

📄 Amazon Textract response parser written in Go.

Language: Go - Size: 6.24 MB - Last synced: 16 days ago - Pushed: 4 months ago - Stars: 0 - Forks: 0

martinbatek/IC-UDA-Final-Project

Final Project for the Unstructured Data Analysis module in the MSc. Machine Learning and Data Science Course

Language: Jupyter Notebook - Size: 500 MB - Last synced: 3 months ago - Pushed: 4 months ago - Stars: 0 - Forks: 0

mazzasaverio/terra-text-processor

A Terraform setup for processing unstructured data on GCP with MongoDB Atlas and Confluent Kafka, featuring serverless, event-driven architecture and Cloud Run integrations.

Language: HCL - Size: 14.6 KB - Last synced: 5 months ago - Pushed: 5 months ago - Stars: 0 - Forks: 0

DrShreyan/LLM-Chatbot-Models

LLM Models on Unstructured Data

Language: Python - Size: 6.84 KB - Last synced: 5 months ago - Pushed: 5 months ago - Stars: 0 - Forks: 0

oypark/Unstructured-data-analysis-Project

๋ฉ€์บ  ํ”„๋กœ์ ํŠธ2_๋น„์ •ํ˜• ๋ฐ์ดํ„ฐ ๋ถ„์„(mulcam bigdata project2_unstructured data analysis)

Language: Jupyter Notebook - Size: 19.6 MB - Last synced: 5 months ago - Pushed: about 2 years ago - Stars: 0 - Forks: 0

aclai-lab/SoleData.jl

Manage unstructured and multimodal datasets!

Language: Julia - Size: 1.11 MB - Last synced: 7 days ago - Pushed: 7 days ago - Stars: 11 - Forks: 0

kodexa-ai/kodexa-java

Kodexa Content Model and Client for Java

Language: Java - Size: 18.3 MB - Last synced: 5 months ago - Pushed: 6 months ago - Stars: 0 - Forks: 1

mkirslis/Warship-Data

Generates a CSV file of warship data from Wikipedia.

Language: Python - Size: 155 KB - Last synced: 6 months ago - Pushed: 6 months ago - Stars: 1 - Forks: 0

TuanaCelik/unstructuredio-haystack

💙 Unstructured Data Connectors for Haystack 2.0

Language: Python - Size: 22.5 KB - Last synced: 5 months ago - Pushed: 8 months ago - Stars: 14 - Forks: 0

automorphic-ai/trex

Intelligently transform unstructured to structured data

Language: Python - Size: 36.1 KB - Last synced: 6 months ago - Pushed: 8 months ago - Stars: 215 - Forks: 9

ClaudioPoli/JobAds

Management of structured and unstructured data

Language: PLpgSQL - Size: 30.3 KB - Last synced: 7 months ago - Pushed: about 1 year ago - Stars: 0 - Forks: 0

KamRoki/Deep-Learning-Dog-Breed

Who's a good dog? Who likes ear scratches? Well, it seems those fancy deep neural networks don't have all the answers. However, maybe they can answer that ubiquitous question we all ask when meeting a four-legged stranger: what kind of good pup is that? This notebook builds a multi-class image classifier using TensorFlow 2.0 and TensorFlow Hub.

Language: Jupyter Notebook - Size: 6.1 MB - Last synced: 4 months ago - Pushed: 7 months ago - Stars: 0 - Forks: 1

yrnigam/Named-Entity-Recognition-NER-using-LSTMs

Named Entity Recognition (NER) using LSTMs with Keras

Language: Jupyter Notebook - Size: 3.78 MB - Last synced: 8 months ago - Pushed: almost 4 years ago - Stars: 3 - Forks: 6

janoellerich/RooTri

Language: MATLAB - Size: 124 KB - Last synced: 8 months ago - Pushed: 8 months ago - Stars: 0 - Forks: 0

Mihryam/HealthNews_Tweets-ClusteringToClassification

A machine learning model on clustering of health news tweets from different news sources to extrapolate categories and then use the cluster labels for downstream classification.

Language: Jupyter Notebook - Size: 4.45 MB - Last synced: 8 months ago - Pushed: about 1 year ago - Stars: 0 - Forks: 0

mkearney/wibble

Web Data Frames

Language: R - Size: 497 KB - Last synced: 3 months ago - Pushed: about 5 years ago - Stars: 12 - Forks: 0

faisalman/re-parse-js

Compose structured data from unstructured text using regex-based pattern matching

Language: TypeScript - Size: 21.5 KB - Last synced: 12 days ago - Pushed: 8 months ago - Stars: 0 - Forks: 0

ash-0521/Abandoned-Object-Detection-in-crowded-environment-using-MATLAB

Trained MATLAB models for 82% precision/80% recall, optimized with blob analysis for 25% performance boost. User-friendly alarm system with 500+ engaged users.

Size: 682 KB - Last synced: 4 months ago - Pushed: 9 months ago - Stars: 1 - Forks: 0

inuwamobarak/detecting-tables-in-documents

This repository contains code and resources for detecting tables in various types of documents using machine learning and computer vision techniques.

Language: Jupyter Notebook - Size: 1.8 MB - Last synced: 8 months ago - Pushed: 8 months ago - Stars: 0 - Forks: 0

saranpal/Spark-RDD-Set-Top-Box-Data-Analysis

Spark RDD transformation and action, process unstructured data

Language: Scala - Size: 654 KB - Last synced: 9 months ago - Pushed: over 5 years ago - Stars: 3 - Forks: 3

SAP-samples/hana-structurer-one

SAP HANA Extreme application that analyzes unstructured data (tweets) to retrieve information such as location, people, companies, and also sentiment analysis.

Language: CSS - Size: 3.81 MB - Last synced: 9 months ago - Pushed: almost 2 years ago - Stars: 1 - Forks: 4

ttariqaziz/statistical_modeling_matlab

Highlights of my research work in MATLAB, statistical modeling of the unstructured raw data from GPS satellites for several years. Data modeling and processing, followed by different residual plots including trends and root mean square. In the end, the result was compared with independent data set models for validation purposes. The results were also presented at a European conference.

Size: 10.8 MB - Last synced: 9 months ago - Pushed: 9 months ago - Stars: 1 - Forks: 0

abdollahpour/micro-draft-manager

micro-draft-manager is a microservice that helps you to manage unstructured data in your application with sorting and full-text search

Language: Go - Size: 27.3 KB - Last synced: 10 months ago - Pushed: over 2 years ago - Stars: 1 - Forks: 0

jaydeepdevda/NLP-AccessingTextData

Python code to access large text (at least 10 pages) from a .txt file, MS Word document, PDF file, Wikipedia page, or 500 tweets.

Language: HTML - Size: 750 KB - Last synced: 10 months ago - Pushed: over 5 years ago - Stars: 0 - Forks: 1

aws-samples/content-repository-with-multilingual-search

Code and walkthrough to build an end-to-end content repository for unstructured data with multilingual semantic search and dynamic access control.

Language: TypeScript - Size: 3.18 MB - Last synced: 10 months ago - Pushed: 10 months ago - Stars: 0 - Forks: 2

instill-ai/metric-backend 📦

⇋ A REST/gRPC server for Instill AI's Metric API service

Size: 0 Bytes - Last synced: 13 days ago - Pushed: about 1 year ago - Stars: 0 - Forks: 0

aws-samples/content-repository-with-dynamic-access-control

Code and walkthrough to build an end-to-end content repository for unstructured data with dynamic access control.

Language: TypeScript - Size: 1000 KB - Last synced: about 1 year ago - Pushed: about 1 year ago - Stars: 5 - Forks: 1

adansons/base

Adansons Base is a data programming tool for error-analysis of training results. It organizes metadata of unstructured data and creates and organizes datasets. It makes dataset creation more effective and helps to find low-quality data by using the training results and improves AI performance.

Language: Jupyter Notebook - Size: 12.8 MB - Last synced: 18 days ago - Pushed: almost 2 years ago - Stars: 28 - Forks: 3

SachinKalsi/html_tag_annotator

A Machine Learning tool to create the training dataset very quickly & easily by using a smart chrome extension

Language: JavaScript - Size: 11.8 MB - Last synced: 12 months ago - Pushed: over 1 year ago - Stars: 11 - Forks: 2

MoinDalvs/Resume_Screening_and_Parser

Business objective: the document classification solution should significantly reduce manual human effort in HRM. It should achieve a higher level of accuracy and automation with minimal human intervention. Sample data set details: resumes and financial documents.

Language: Jupyter Notebook - Size: 95.9 MB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 3 - Forks: 1

boomalope/ltb

Code for my working paper: The Winners and Losers of Rental Tribunals (February 14, 2022). Available at SSRN: https://ssrn.com/abstract=4029114

Language: HTML - Size: 69.7 MB - Last synced: 6 months ago - Pushed: about 2 years ago - Stars: 1 - Forks: 0

chaitjo/knowledge-graphs

Building Knowledge Graphs from Unstructured Text

Language: Jupyter Notebook - Size: 42.9 MB - Last synced: about 1 year ago - Pushed: about 4 years ago - Stars: 11 - Forks: 6

pedrogfleming/Snowflake-Scripts

SQL Scripts related to my learning on the Snowflake data cloud provider

Size: 3.7 MB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0

ujunwa-DS/UNSTRUCTURED-DATA-WHATSAPP-DATA-

Unstructured WhatsApp data was cleaned with Python and visualized with Power BI to obtain insights. Libraries such as NumPy, regex, openpyxl and pandas were used in this project.

Language: Jupyter Notebook - Size: 209 KB - Last synced: 9 months ago - Pushed: about 1 year ago - Stars: 0 - Forks: 0

thu-west/AnnotationTool

An Annotation Tool Designed for Unstructured Health Data

Language: Java - Size: 13.8 MB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 5 - Forks: 4

pradeepdev-1995/Index-based-semantic-similarity-unstructured-data-search

Unstructured data refers to information that is not organised using a predetermined data model or schema and cannot be stored in a conventional relational database system. There are several methods for searching unstructured data semantically, that is, by taking into account the actual context and meaning of the sentences. One of the best is the index-based approach.

Language: Jupyter Notebook - Size: 249 KB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 2 - Forks: 0

kuyio/infozilla

The infoZilla unstructured software engineering data mining tool. It can find and extract source code regions, patches, stack traces, enumerations and itemizations from discussion threads.

Language: Java - Size: 530 KB - Last synced: about 1 year ago - Pushed: over 5 years ago - Stars: 13 - Forks: 1

adisorbo/NEON_tool

NEON mines rules for detecting natural language patterns in informal software documents. The inferred rules can be used for identifying and extracting relevant information embedded in unstructured texts.

Language: Java - Size: 68.8 MB - Last synced: about 1 year ago - Pushed: over 2 years ago - Stars: 1 - Forks: 5

jostmey/dkm

Dynamic Kernel Matching (DKM) for Classifying Data with Non-conforming Features

Language: HTML - Size: 7.15 MB - Last synced: about 1 year ago - Pushed: over 2 years ago - Stars: 94 - Forks: 5

maithilish/gotz

Gotz - Heavy duty ETL to automate data extraction from tons of HTML pages

Language: Java - Size: 1.41 MB - Last synced: about 1 year ago - Pushed: almost 2 years ago - Stars: 8 - Forks: 0

MohitWani/Unstructured-data-preprocessing-

This repository contains preprocessing of unstructured data, such as images, text and speech.

Language: Jupyter Notebook - Size: 1.76 MB - Last synced: about 1 year ago - Pushed: about 2 years ago - Stars: 0 - Forks: 0

Peteresis/Movies-ETL

ETL (Extract, Transform, Load) Practice. Automate the process of reading new data, processing it, and then loading it into new SQL tables. The code uses Python, RegEx, and a SQL database to build an ETL pipeline for this project.

Language: Jupyter Notebook - Size: 2.99 MB - Last synced: about 1 year ago - Pushed: about 2 years ago - Stars: 0 - Forks: 0

wotchin/PostVector

PostVector: unstructured and vector retrieval database extension to PostgreSQL.

Size: 13.7 KB - Last synced: about 1 year ago - Pushed: almost 5 years ago - Stars: 0 - Forks: 0

lilianchi/lost-or-found

A repository with our team's final Python project in MGMT 590 Analyzing Unstructured Data course at Krannert School of Management, Purdue University.

Language: Python - Size: 1.44 MB - Last synced: 6 months ago - Pushed: about 2 years ago - Stars: 0 - Forks: 0

bengruher/SMS-Spam-Detection

Machine learning task to identify spam SMS messages. Project involves processing of noisy unstructured text and other NLP techniques.

Language: Jupyter Notebook - Size: 663 KB - Last synced: about 1 year ago - Pushed: over 2 years ago - Stars: 1 - Forks: 1

AsishMandoi/quantum-search

A quantum circuit that takes a list of numbers and returns a quantum state which is a superposition of indices of those numbers that follow a given pattern

Language: Jupyter Notebook - Size: 919 KB - Last synced: about 1 year ago - Pushed: over 2 years ago - Stars: 0 - Forks: 0

bartczernicki/Documents-Forms

Collection of various documents and forms that can be used by AI services & systems for training

Size: 26.2 MB - Last synced: about 1 year ago - Pushed: about 3 years ago - Stars: 0 - Forks: 0

as2leung/web_scrape_postal_office_address

A web scraping project that retrieves the post office locations from a search engine result and outputs the data in a cleaned dataframe

Language: Jupyter Notebook - Size: 35.2 KB - Last synced: almost 1 year ago - Pushed: almost 4 years ago - Stars: 0 - Forks: 0

malfusion/sentiment-keyword-extraction

Multi-Pipeline Keyword Extractor and Word Cloud Visualizer for Sentiment Analysis tasks

Language: Java - Size: 6.76 MB - Last synced: about 1 year ago - Pushed: about 2 years ago - Stars: 1 - Forks: 0

roshni-b/Log-Parser

Modular log parser that parses @nasa's apache logs and processes them.

Language: Python - Size: 30.3 MB - Last synced: about 1 year ago - Pushed: over 3 years ago - Stars: 0 - Forks: 0