Ecosyste.ms: Repos
An open API service providing repository metadata for many open source software ecosystems.
GitHub topics: data-pipelines
dagster-io/dagster
An orchestration platform for the development, production, and observation of data assets.
Language: Python - Size: 952 MB - Last synced: about 3 hours ago - Pushed: about 4 hours ago - Stars: 10,320 - Forks: 1,283
unicef/magasin
Cloud native open-source end-to-end data / AI / ML platform
Language: Mustache - Size: 18.6 MB - Last synced: about 4 hours ago - Pushed: about 1 month ago - Stars: 4 - Forks: 2
dataform-co/dataform
Dataform is a framework for managing SQL based data operations in BigQuery
Language: TypeScript - Size: 15.7 MB - Last synced: about 3 hours ago - Pushed: about 4 hours ago - Stars: 793 - Forks: 146
mycelial/mycelial
Move your data with ease.
Language: Rust - Size: 1.51 MB - Last synced: about 1 hour ago - Pushed: about 5 hours ago - Stars: 70 - Forks: 9
brunocampos01/data-engineering
Language: Python - Size: 165 MB - Last synced: about 15 hours ago - Pushed: about 15 hours ago - Stars: 11 - Forks: 2
artie-labs/transfer
Database replication platform that leverages change data capture. Stream production data from databases to your data warehouse (Snowflake, BigQuery, Redshift) in real-time.
Language: Go - Size: 11.3 MB - Last synced: about 21 hours ago - Pushed: about 21 hours ago - Stars: 536 - Forks: 24
elementary-data/elementary
The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
Language: HTML - Size: 192 MB - Last synced: 26 days ago - Pushed: 26 days ago - Stars: 1,725 - Forks: 144
tuva-health/tuva
Main repo including core data model, data marts, reference data, terminology, and the clinical concept library
Size: 23.2 MB - Last synced: about 21 hours ago - Pushed: about 22 hours ago - Stars: 153 - Forks: 30
terrytangyuan/awesome-kubeflow
A curated list of awesome projects and resources related to Kubeflow (a CNCF incubating project)
Size: 234 KB - Last synced: 3 days ago - Pushed: 13 days ago - Stars: 181 - Forks: 15
infinyon/fluvio
Lean and mean distributed stream processing system written in Rust and WebAssembly.
Language: Rust - Size: 22.9 MB - Last synced: 5 months ago - Pushed: 5 months ago - Stars: 2,284 - Forks: 166
AnthonyByansi/Airflow-Data-Pipeline-Automation
Automate your data pipelines using Apache Airflow with this ready-to-use DAG for data integration, ETL and workflow automation.
Language: Python - Size: 15.6 KB - Last synced: 3 days ago - Pushed: 3 days ago - Stars: 7 - Forks: 0
Galileo-Galilei/kedro-pandera
A kedro plugin to use pandera in your kedro projects
Language: Python - Size: 213 KB - Last synced: 3 days ago - Pushed: 4 days ago - Stars: 30 - Forks: 2
apicrafter/datacrafter
NoSQL extract, transform, load (ETL) toolkit with Python
Language: Python - Size: 453 KB - Last synced: 4 days ago - Pushed: 4 days ago - Stars: 11 - Forks: 3
bruin-data/bruin
Bruin is a data pipeline tool that is designed to be easy-to-use. It allows building data pipelines using SQL and Python, and has built-in data quality checks.
Language: Go - Size: 22.8 MB - Last synced: 27 days ago - Pushed: 28 days ago - Stars: 46 - Forks: 1
meltano/meltano
Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
Language: Python - Size: 135 MB - Last synced: 9 days ago - Pushed: 10 days ago - Stars: 1,598 - Forks: 143
conductor-sdk/conductor-python
Conductor OSS SDK for Python programming language
Language: Python - Size: 1.28 MB - Last synced: 7 days ago - Pushed: 8 days ago - Stars: 50 - Forks: 25
goto/optimus Fork of raystack/optimus
Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management.
Language: Go - Size: 26.3 MB - Last synced: about 10 hours ago - Pushed: 1 day ago - Stars: 3 - Forks: 1
rafaelvargas/bytebridge
A data tool designed to move data seamlessly between various sources and destinations.
Language: Python - Size: 46.9 KB - Last synced: 8 days ago - Pushed: 9 days ago - Stars: 0 - Forks: 1
aquemy/DOLAP_2019_supplementary_material
Supplementary material for DOLAP 2019 submission
Size: 5.04 MB - Last synced: 9 days ago - Pushed: over 5 years ago - Stars: 1 - Forks: 0
cybergeekgyan/Data-Engineering-Portfolio
Data Engineering portfolio projects, resources used to study data tools...
Language: Jupyter Notebook - Size: 2.92 MB - Last synced: 9 days ago - Pushed: about 2 months ago - Stars: 1 - Forks: 0
dataflint/spark
Performance Observability for Apache Spark
Language: TypeScript - Size: 18.6 MB - Last synced: 10 days ago - Pushed: 10 days ago - Stars: 125 - Forks: 9
data-engineering-community/data-engineering-wiki
The best place to learn data engineering. Built and maintained by the data engineering community.
Language: CSS - Size: 7.59 MB - Last synced: 9 days ago - Pushed: about 1 month ago - Stars: 1,032 - Forks: 103
kevin-hanselman/dud
A lightweight CLI tool for versioning data alongside source code and building data pipelines.
Language: Go - Size: 3.31 MB - Last synced: 10 days ago - Pushed: 11 days ago - Stars: 166 - Forks: 6
recap-build/recap
Work with your web service, database, and streaming schemas in a single format.
Language: Python - Size: 1.41 MB - Last synced: 7 days ago - Pushed: about 1 month ago - Stars: 306 - Forks: 24
infiniflow/ragflow
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
Language: Python - Size: 19.4 MB - Last synced: 13 days ago - Pushed: 14 days ago - Stars: 5,952 - Forks: 499
KayvanShah1/usc-dsci560-dspp-sp24
USC DSCI 560 - Data Science Professional Practicum - Spring 2024 - Prof. Young Cho
Language: Python - Size: 50.1 MB - Last synced: 12 days ago - Pushed: 16 days ago - Stars: 0 - Forks: 0
mage-ai/mage-ai
🧙 Build, run, and manage data pipelines for integrating and transforming data.
Language: Python - Size: 170 MB - Last synced: 27 days ago - Pushed: 27 days ago - Stars: 6,940 - Forks: 616
dataplane-app/dataplane
Dataplane is an Airflow inspired unified data platform with additional data mesh and RPA capability to automate, schedule and design data pipelines and workflows. Dataplane is written in Golang with a React front end.
Language: JavaScript - Size: 274 MB - Last synced: 10 days ago - Pushed: 4 months ago - Stars: 184 - Forks: 30
kiwicom/terraform-provider-montecarlo
This open-source Terraform provider enables users to seamlessly integrate the Monte Carlo data reliability platform into their infrastructure as code (IaC) workflows.
Language: Go - Size: 230 KB - Last synced: 4 days ago - Pushed: 4 days ago - Stars: 8 - Forks: 0
apache/dolphinscheduler
Apache DolphinScheduler is the modern data orchestration platform. Agile to create high performance workflow with low-code
Language: Java - Size: 199 MB - Last synced: 18 days ago - Pushed: 18 days ago - Stars: 11,997 - Forks: 4,411
tuva-health/tuva_demo
A starter dbt project and synthetic claims dataset for trying out the Tuva Project.
Size: 1.98 MB - Last synced: 10 days ago - Pushed: 10 days ago - Stars: 12 - Forks: 6
AiDAPT-A/VisArchPy
Pipelines for the extraction and processing of visuals from PDFs
Language: Python - Size: 3.78 MB - Last synced: 16 days ago - Pushed: 16 days ago - Stars: 3 - Forks: 1
kestra-io/examples
Best practices for data workflows, integrations with the Modern Data Stack (MDS), Infrastructure as Code (IaC), Cloud Provider Services
Language: HCL - Size: 1.92 MB - Last synced: 16 days ago - Pushed: 16 days ago - Stars: 9 - Forks: 3
BogdanFloris/detecting-and-addressing-change
Code for my Master Thesis: How to detect and address changes in machine learning based data pipelines
Language: Python - Size: 151 KB - Last synced: 17 days ago - Pushed: 10 months ago - Stars: 3 - Forks: 0
giacbrd/SmartPipeline
A framework for rapid development of robust data pipelines following a simple design pattern
Language: Python - Size: 393 KB - Last synced: 14 days ago - Pushed: 2 months ago - Stars: 22 - Forks: 2
mpolinowski/apache-airflow-intro
Introduction to Apache Airflow
Language: Python - Size: 9.77 KB - Last synced: 20 days ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0
CofluxLabs/coflux
Open-source workflow engine. Orchestrate and observe computational workflows defined in plain Python. Suitable for data pipelines, background tasks, chat bots.
Language: Elixir - Size: 3.61 MB - Last synced: 17 days ago - Pushed: 20 days ago - Stars: 4 - Forks: 0
DidactHQ/didact
The open source, standalone, fullstack .NET job orchestrator that we've been missing.
Size: 14.6 KB - Last synced: 22 days ago - Pushed: 6 months ago - Stars: 30 - Forks: 0
elementary-data/dbt-data-reliability
dbt package that is part of Elementary, the dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
Language: Python - Size: 7.47 MB - Last synced: 26 days ago - Pushed: 26 days ago - Stars: 338 - Forks: 76
Unstructured-IO/unstructured
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
Language: HTML - Size: 124 MB - Last synced: 24 days ago - Pushed: 25 days ago - Stars: 5,819 - Forks: 424
opendatadiscovery/odd-platform
First open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.
Language: Java - Size: 28.1 MB - Last synced: 28 days ago - Pushed: 28 days ago - Stars: 1,104 - Forks: 91
tsdat/tsdat
Time series data utilities for declaratively applying standardization, Q/C, and transformations to datastreams.
Language: Python - Size: 144 MB - Last synced: 28 days ago - Pushed: 28 days ago - Stars: 11 - Forks: 7
CogStack/CogStack-NiFi
Building data processing pipelines for documents processing with NLP using Apache NiFi and related services
Language: Python - Size: 74.9 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 31 - Forks: 16
vmware/versatile-data-kit
One framework to develop, deploy and operate data workflows with Python and SQL.
Language: Python - Size: 109 MB - Last synced: 27 days ago - Pushed: 28 days ago - Stars: 409 - Forks: 54
apache/airflow
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Language: Python - Size: 264 MB - Last synced: 27 days ago - Pushed: 27 days ago - Stars: 34,343 - Forks: 13,504
SciPhi-AI/R2R
The framework for fast development and deployment of RAG backends.
Language: Python - Size: 19.5 MB - Last synced: 30 days ago - Pushed: 30 days ago - Stars: 1,103 - Forks: 92
smart-data-lake/smart-data-lake
Smart Automation Tool for building modern Data Lakes and Data Pipelines
Language: Scala - Size: 36.2 MB - Last synced: 28 days ago - Pushed: 28 days ago - Stars: 92 - Forks: 21
orchest/orchest
Build data pipelines, the easy way 🛠️
Language: TypeScript - Size: 27.2 MB - Last synced: 26 days ago - Pushed: 11 months ago - Stars: 4,019 - Forks: 251
bakdata/streams-explorer
Explore Apache Kafka data pipelines in Kubernetes.
Language: Python - Size: 3.87 MB - Last synced: 2 days ago - Pushed: about 1 month ago - Stars: 44 - Forks: 4
linkedin/Hoptimator
Multi-hop declarative data pipelines
Language: Java - Size: 332 KB - Last synced: 25 days ago - Pushed: about 1 month ago - Stars: 74 - Forks: 12
combust/mleap
MLeap: Deploy ML Pipelines to Production
Language: Scala - Size: 3.32 MB - Last synced: 9 days ago - Pushed: 6 months ago - Stars: 1,494 - Forks: 313
beneath-hq/beneath
Beneath is a serverless real-time data platform ⚡️
Language: Go - Size: 11 MB - Last synced: 18 days ago - Pushed: about 2 years ago - Stars: 81 - Forks: 9
DidactHQ/didact-engine
The REST API and execution engine for the Didact Platform.
Language: C# - Size: 238 KB - Last synced: 22 days ago - Pushed: about 2 months ago - Stars: 44 - Forks: 0
glassflow/cli
GlassFlow CLI to create and manage data pipelines
Language: Shell - Size: 20.5 KB - Last synced: 4 days ago - Pushed: 4 days ago - Stars: 6 - Forks: 0
DidactHQ/didact-ui
The VueJS single-page app dashboard for the Didact Platform.
Language: Vue - Size: 764 KB - Last synced: 22 days ago - Pushed: 25 days ago - Stars: 11 - Forks: 0
GoogleCloudPlatform/public-datasets-pipelines
Cloud-native, data onboarding architecture for Google Cloud Datasets
Language: Python - Size: 7.12 MB - Last synced: 23 days ago - Pushed: 24 days ago - Stars: 136 - Forks: 61
iesahin/xvc
A robust (🐢) and fast (🐇) MLOps tool for managing data and pipelines in Rust (🦀)
Language: Rust - Size: 5.12 MB - Last synced: about 23 hours ago - Pushed: 1 day ago - Stars: 22 - Forks: 0
fmind/mlops-python-package
Kickstart your MLOps initiative with a flexible, robust, and productive Python package.
Language: Jupyter Notebook - Size: 1.26 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 206 - Forks: 24
srenegado/paintings-data
A Python ETL pipeline with a Postgres data warehouse for modeling art inventory.
Language: Python - Size: 528 KB - Last synced: 29 days ago - Pushed: 29 days ago - Stars: 0 - Forks: 0
DataCater/datacater 📦
The developer-friendly ETL platform for transforming data in real-time. Based on Apache Kafka® and Kubernetes®.
Language: JavaScript - Size: 4.08 MB - Last synced: 17 days ago - Pushed: 9 months ago - Stars: 81 - Forks: 3
datajoint/datajoint-python
Relational data pipelines for the science lab
Language: Python - Size: 16.1 MB - Last synced: 29 days ago - Pushed: 29 days ago - Stars: 161 - Forks: 82
marcio-azevedo/fsharp-data-processing-pipeline
Provides an extensible solution for creating Data Processing Pipelines in F#.
Language: F# - Size: 352 KB - Last synced: 12 days ago - Pushed: about 6 years ago - Stars: 15 - Forks: 1
AnanthaRajuC/DataPractitioner
Data Practitioner
Language: Python - Size: 1010 KB - Last synced: 24 days ago - Pushed: 2 months ago - Stars: 3 - Forks: 0
koolreport/core
An Open Source PHP Reporting Framework that helps you to write perfect data reports or to construct awesome dashboards in PHP. Working great with all PHP versions from 5.6 to latest 8.0. Fully compatible with all kinds of MVC frameworks like Laravel, CodeIgniter, Symfony.
Language: PHP - Size: 2.56 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 151 - Forks: 34
Multiwoven/multiwoven-server
The backend control plane for Multiwoven, built using Ruby on Rails & Temporal.
Language: Ruby - Size: 8.2 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 11 - Forks: 4
todofixthis/filters
What if we took the UNIX philosophy and applied it to input validation?
Language: Python - Size: 553 KB - Last synced: 1 day ago - Pushed: 7 months ago - Stars: 1 - Forks: 3
rcgsheffield/airbods
AIRBODS data pipelines and storage
Language: Python - Size: 262 KB - Last synced: about 1 month ago - Pushed: about 2 months ago - Stars: 0 - Forks: 0
tuva-health/FHIR_inferno
Connector that loads FHIR r4 USCDIv3 JSON data from local file storage into the Tuva common data model in Snowflake.
Language: Python - Size: 82 KB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 13 - Forks: 7
raystack/optimus
Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management.
Language: Go - Size: 13.2 MB - Last synced: 2 months ago - Pushed: 6 months ago - Stars: 735 - Forks: 153
KyleZrey/data-pipeline
Creation of data pipeline using Jupyter Notebook, PostgreSQL, and Apache Airflow.
Language: Jupyter Notebook - Size: 9.74 MB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 0 - Forks: 0
Snehil-Shah/Seismic-Alerts-Streamer
A Realtime Seismic Logging & Alerts Service with Live Monitoring & Email Alerts made using Kafka Data Pipelines, all Dockerized & Deployment Ready!
Language: Java - Size: 11.1 MB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 2 - Forks: 0
arakat-community/arakat 📦
ARAKAT - Big Data Analysis and Business Intelligence Application Development Platform
Language: Python - Size: 31.6 MB - Last synced: 2 months ago - Pushed: almost 3 years ago - Stars: 26 - Forks: 21
mackelab/epiphyte
Python toolkit for working with high-dimensional neural data recorded during naturalistic, continuous stimuli @a-darcher @rachrapp
Language: Jupyter Notebook - Size: 191 MB - Last synced: about 1 month ago - Pushed: 2 months ago - Stars: 3 - Forks: 1
leotech-dev/leoflow
A set of plugins (mappers, sinks, etc.) for Numaflow pipelines
Language: Go - Size: 11.7 KB - Last synced: 3 months ago - Pushed: 5 months ago - Stars: 2 - Forks: 0
allamiro/Data-Pipelines
Everything about designing, installing, and implementing data pipelines, including Kafka, ZooKeeper, and Hadoop. If you enjoy my content, please consider supporting what I do. Thank you.
Language: Jinja - Size: 4.45 MB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 1 - Forks: 0
itsame-mcl/data-pypeline
Pure Python 3 data wrangling tools with support for pipelines
Language: Python - Size: 24.9 MB - Last synced: 3 months ago - Pushed: about 1 year ago - Stars: 2 - Forks: 1
DataDrivenGit/Music-Streaming-App-using-AWS-ETL
Implemented a data warehouse and data lake on AWS, with data modeling in Postgres and Apache Cassandra. Also used Apache Airflow to create a data pipeline.
Language: Jupyter Notebook - Size: 725 KB - Last synced: about 1 month ago - Pushed: almost 4 years ago - Stars: 4 - Forks: 3
jmoussa/go-sentitweet
CLI Application holding a sentiment analysis data (Twitter tweets) pipeline with its own Web API to query results in the database. Written entirely in Go.
Language: Go - Size: 13.4 MB - Last synced: 4 months ago - Pushed: about 2 years ago - Stars: 1 - Forks: 1
zkan/introduction-to-data-pipelines-and-apache-airflow
Introduction to Data Pipelines and Apache Airflow
Language: Python - Size: 134 KB - Last synced: 26 days ago - Pushed: about 2 months ago - Stars: 3 - Forks: 9
Sibusiso-Gumede/supermarket-scraper
A data extraction program that is a component of an ETL data pipeline. The program scrapes product promotion data from supermarket websites.
Language: Python - Size: 465 KB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 0 - Forks: 0
patterns-app/patterns-devkit
Data pipelines from re-usable components
Language: Python - Size: 1.75 MB - Last synced: 28 days ago - Pushed: about 1 year ago - Stars: 106 - Forks: 5
mxagar/data_engineering_guide
Personal notes on the IBM Data Engineering Certificate as well as other sources focusing on AWS.
Size: 2.93 KB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 0 - Forks: 0
mdh266/AirflowDataPipeline
Example of an ETL Pipeline using Airflow
Language: Python - Size: 14.6 KB - Last synced: 24 days ago - Pushed: over 6 years ago - Stars: 31 - Forks: 19
MattTriano/analytics_data_where_house
An analytics engineering sandbox focusing on real estate prices in Cook County, IL
Language: Python - Size: 15.7 MB - Last synced: 4 months ago - Pushed: 7 months ago - Stars: 7 - Forks: 0
thecodemancer/Apache-Beam
🔥👨‍💻 Build big data pipelines with Apache Beam in any language and run them via Spark, Flink, or GCP (Google Cloud Dataflow).
Language: Jupyter Notebook - Size: 321 KB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 0 - Forks: 0
tara-nguyen/modern-data-architecture
Follow along with materials in the book "Modern Data Architectures with Python: A practical guide to building and deploying data pipelines, data warehouses and data lakes" (Lipp, 2023)
Language: Jupyter Notebook - Size: 33.2 KB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 0 - Forks: 0
AnthonyByansi/Rust-Exploratorium
Master Rust programming with this comprehensive roadmap! Explore fundamental and advanced concepts, code examples, and resources.
Language: Rust - Size: 38.1 KB - Last synced: 3 months ago - Pushed: 7 months ago - Stars: 8 - Forks: 0
larribas/dagger
Define sophisticated data pipelines with Python and run them on different distributed systems (such as Argo Workflows).
Language: Python - Size: 9.99 MB - Last synced: 5 days ago - Pushed: about 2 months ago - Stars: 13 - Forks: 5
tuva-health/medicare_cclf_connector
This connector is a dbt project that maps Medicare CCLF claims data to the Tuva Input Layer.
Size: 1010 KB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 12 - Forks: 12
vanderschaarlab/temporai-mivdp
TemporAI-MIVDP: Adaptation of MIMIC-IV-Data-Pipeline for TemporAI
Language: Python - Size: 1.85 MB - Last synced: 5 months ago - Pushed: 5 months ago - Stars: 1 - Forks: 0
anna-geller/kestra-ci-cd
CI/CD repository template to automate deployments of your production flows
Language: HCL - Size: 96.7 KB - Last synced: 5 months ago - Pushed: 5 months ago - Stars: 4 - Forks: 2
minyansh7/DisasterResponseProject
A web application that classifies a large set of messages into 36 categories to be sent to the relevant disaster relief agencies, and helps disaster workers classify new messages.
Language: Jupyter Notebook - Size: 37.8 MB - Last synced: 6 months ago - Pushed: almost 3 years ago - Stars: 0 - Forks: 0
opendatadiscovery/odd-collector-gcp 📦
Open-source GCP metadata collector based on ODD Specification
Language: Python - Size: 188 KB - Last synced: 4 months ago - Pushed: 8 months ago - Stars: 4 - Forks: 0
rcorrero/light-pipe
A high-level syntax for data pipelines, designed to make pipeline development quick and painless.
Language: Python - Size: 1.5 MB - Last synced: 13 days ago - Pushed: 11 months ago - Stars: 3 - Forks: 1
StrictlySkyler/harbormaster Fork of luzlab/harbormaster-apache 📦
A framework for microservices
Language: JavaScript - Size: 1.82 MB - Last synced: 26 days ago - Pushed: 6 months ago - Stars: 3 - Forks: 5
electronick1/stepist
Framework for data processing
Language: Python - Size: 865 KB - Last synced: 6 days ago - Pushed: over 4 years ago - Stars: 27 - Forks: 5
Elkinmt19/airflow-master
This is a repo that was created to learn more about Airflow and develop awesome data engineering projects.
Language: Python - Size: 3.33 MB - Last synced: 7 months ago - Pushed: 7 months ago - Stars: 4 - Forks: 3
tuva-health/medicare_lds_connector
Maps Medicare LDS claims data to the Tuva Input Layer so you can easily run the Tuva Project.
Size: 664 KB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 7 - Forks: 4
shravan-kuchkula/udacity-data-eng-proj-1
Developed a data pipeline to automate data warehouse ETL by building custom airflow operators that handle the extraction, transformation, validation and loading of data from S3 -> Redshift -> S3
Language: Python - Size: 3.47 MB - Last synced: 7 months ago - Pushed: over 2 years ago - Stars: 88 - Forks: 58
projectmesadata/cropyield
Creates a data pipeline from the Famine Land Data Assimilation DataSet (FLDAS) to seed model terrain and assess the potential crop yield for a variety of crops.
Language: Jupyter Notebook - Size: 134 MB - Last synced: 8 months ago - Pushed: 8 months ago - Stars: 3 - Forks: 3