Topic: "ingestion-pipeline"
bruin-data/ingestr
ingestr is a CLI tool to copy data between any databases with a single command seamlessly.
Language: Python - Size: 170 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 3,220 - Forks: 106

opensemanticsearch/open-semantic-etl
Python-based open source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (entity extraction and named entity recognition), and data enrichment (annotation) pipelines, plus an ingestor to a Solr or Elasticsearch index and a linked data graph database
Language: Python - Size: 615 KB - Last synced at: 4 months ago - Pushed at: almost 3 years ago - Stars: 268 - Forks: 72

AstraBert/ingest-anything
From data to vector database effortlessly
Language: Python - Size: 6.65 MB - Last synced at: 1 day ago - Pushed at: 4 months ago - Stars: 79 - Forks: 12

KnudsenMorten/AzLogDcrIngestPS
AzLogDcrIngestPS - Unleashing the power of Log Ingestion API with Azure LogAnalytics custom table v2, Azure Data Collection Rules and Azure Data Ingestion Pipeline
Language: PowerShell - Size: 23 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 31 - Forks: 0

Morphl-AI/MorphL-Model-User-Search-Intent
Google Cloud Storage connector, pre-processor and model for predicting user search intent based on keywords
Language: Python - Size: 70.3 KB - Last synced at: 4 months ago - Pushed at: almost 6 years ago - Stars: 25 - Forks: 4

Morphl-AI/MorphL-Model-Publishers-Churning-Users
Google Analytics connector, pre-processor and model for predicting churning users for digital publishers.
Language: Python - Size: 212 KB - Last synced at: 4 months ago - Pushed at: over 6 years ago - Stars: 10 - Forks: 6

Clarifai/clarifai-python-datautils
Extract, transform, and load unstructured data into Clarifai's AI platform
Language: Python - Size: 1.01 MB - Last synced at: 10 days ago - Pushed at: 2 months ago - Stars: 6 - Forks: 0

akshaybahadur21/Emancipitaion-of-Apache-Spark
My experiments with Apache Spark for Humans ⭐
Language: Java - Size: 12.6 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 5 - Forks: 6

anhtuan284/chest-xray-multi-disease
Multi-disease segmentation of chest X-rays using YOLO, DenseNet121, and CoAtNet models
Language: Jupyter Notebook - Size: 114 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 4 - Forks: 0

tmcgrath/cassandra-ingest
DataStax or Cassandra Ingest from Relational Databases with StreamSets
Language: PLSQL - Size: 12.3 MB - Last synced at: almost 2 years ago - Pushed at: over 6 years ago - Stars: 4 - Forks: 13

CyberCRI/welearn-datastack
Data stack for WeLearn LPI projects. This pipeline can collect, vectorize and store data from various sources.
Language: HTML - Size: 3.12 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 3 - Forks: 0

xinmiao14/opensky-flight-pipeline
Real-time flight data fetching, cleaning, and analytics API using FastAPI, Pandas, PostgreSQL, and Python.
Language: Python - Size: 1.12 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 3 - Forks: 0

azuregig/work_with_OrdnanceSurvey_data
Sample Azure Data Factory pipeline for ingesting Data Packages directly from the Download API of the Ordnance Survey Data Hub into Azure Storage.
Size: 2.21 MB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 3 - Forks: 3

siddharth271101/Stock-Exchange-Analysis
A data pipeline that uses Sqoop to ingest data from SQL Server into a Hive table, with Hive used for feature engineering and analysis.
Language: Shell - Size: 14.5 MB - Last synced at: over 2 years ago - Pushed at: over 5 years ago - Stars: 2 - Forks: 1

rachita27/AUTOMATING
Automating the ingestion of Excel files into Azure Data Studio (SQL Server)
Language: Jupyter Notebook - Size: 13.2 MB - Last synced at: 4 months ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

mbsuraj/postgresql_ingestion_script
Ingest data of any format into a PostgreSQL database
Language: Python - Size: 15.6 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 1

rohitshubham/Cloud-pipeline
A real-life end-to-end cloud sub-system scenario
Language: Python - Size: 294 KB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 1 - Forks: 0

bari-data-dev/python-sql-datawarehouse-project
Building a modern data warehouse with Python and SQL, including ETL pre-processes (Python), ETL processes (SQL), data modeling, analytics, and professional documentation.
Language: Python - Size: 5.64 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 0 - Forks: 0

akram0zaki/breach-ingestor
A resilient, prefix-sharded ingestion pipeline for large static breach dumps (e.g. AntiPublic), optimized for low-resource environments (e.g., Raspberry Pi + NAS/SSD).
Language: JavaScript - Size: 46.9 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

garethcmurphy/SciCat-Data-Ingestion-with-TypeScript
SciCat Data Ingestion with TypeScript. This repository provides a TypeScript-based tool for importing and ingesting data into SciCat, the science data catalog used at the European Spallation Source (ESS). Features: Data Ingestion: automates data import into SciCat. TypeScript Implementation: Ensures ty…
Language: TypeScript - Size: 15.6 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

InspiredcL/data-science-on-gcp (fork of GoogleCloudPlatform/data-science-on-gcp)
Source code: flight analysis based on the work of Valliappa Lakshmanan.
Language: Jupyter Notebook - Size: 12.5 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

amyth-singh/multinational-retail-data-centralisation
The multinational retail data centralisation project is a data warehousing project that focuses on ingesting data from disparate sources to create a centralised warehouse.
Language: Python - Size: 981 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

ds-rafaelfelippe/DataIngestionPython
Mini project developed for the Non-Relational Databases course of the graduate program in Data Science and Machine Learning at PUC Campinas.
Language: Jupyter Notebook - Size: 1.06 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

Charanaicore/multinational-retail-data-centralisation
The multinational retail data centralisation project is a data warehousing project that focuses on ingesting data from disparate sources to create a centralised warehouse.
Language: Python - Size: 20.5 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

alanzhaonys/workmail-intercepter-excel-to-csv
Transform Excel attachments from incoming AWS WorkMail emails to CSV and save them to an S3 bucket
Language: Python - Size: 367 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

Ludovik99/Analysis-of-Gas-Stations-with-Apache-Spark
This repository simulates a consultancy project for Repsol and contains both the code notebook and the analysis.
Language: Jupyter Notebook - Size: 13 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

Sqooba/mssql-to-avro-with-spark
Apache Spark example that reads from MSSQL and converts the data to Avro format.
Language: Java - Size: 9.77 KB - Last synced at: over 2 years ago - Pushed at: almost 7 years ago - Stars: 0 - Forks: 0

vivek-bombatkar/Graph-Datastructure-for-Movielens-dataset
Language: Jupyter Notebook - Size: 726 KB - Last synced at: 3 months ago - Pushed at: almost 7 years ago - Stars: 0 - Forks: 0

fnldesign/crypthobot-ingestion
An automated cryptocurrency bot
Size: 14.6 KB - Last synced at: over 2 years ago - Pushed at: almost 7 years ago - Stars: 0 - Forks: 0
