Topic: "data-pipelines"
apache/airflow
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Language: Python - Size: 353 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 39,617 - Forks: 14,888

pathwaycom/pathway
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
Language: Python - Size: 132 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 24,499 - Forks: 356

apache/dolphinscheduler
Apache DolphinScheduler is the modern data orchestration platform. Agile creation of high-performance workflows with low code.
Language: Java - Size: 209 MB - Last synced at: 5 days ago - Pushed at: 7 days ago - Stars: 13,412 - Forks: 4,752

dagster-io/dagster
An orchestration platform for the development, production, and observation of data assets.
Language: Python - Size: 1.26 GB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 12,944 - Forks: 1,649

Unstructured-IO/unstructured
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
Language: HTML - Size: 192 MB - Last synced at: 3 days ago - Pushed at: 14 days ago - Stars: 10,915 - Forks: 907

mage-ai/mage-ai
🧙 Build, run, and manage data pipelines for integrating and transforming data.
Language: Python - Size: 233 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 8,240 - Forks: 831

infinyon/fluvio
Lean and mean distributed stream processing system written in Rust and WebAssembly. Alternative to Kafka + Flink in one.
Language: Rust - Size: 34.2 MB - Last synced at: about 13 hours ago - Pushed at: about 14 hours ago - Stars: 4,606 - Forks: 512

orchest/orchest
Build data pipelines, the easy way 🛠️
Language: TypeScript - Size: 27.2 MB - Last synced at: 11 days ago - Pushed at: almost 2 years ago - Stars: 4,116 - Forks: 261

StructuredLabs/preswald
Preswald is a framework for building and deploying interactive data apps, internal tools, and dashboards with Python. With one command, you can launch, share, and deploy locally or in the cloud, turning Python scripts into powerful shareable apps.
Language: Python - Size: 79.8 MB - Last synced at: about 7 hours ago - Pushed at: about 7 hours ago - Stars: 3,170 - Forks: 629

elementary-data/elementary
The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
Language: HTML - Size: 205 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 2,047 - Forks: 182

meltano/meltano
Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
Language: Python - Size: 140 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 2,029 - Forks: 175

ucbepic/docetl
A system for agentic LLM-powered data processing and ETL
Language: Python - Size: 51.5 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 1,763 - Forks: 170

data-engineering-community/data-engineering-wiki
The best place to learn data engineering. Built and maintained by the data engineering community.
Language: CSS - Size: 7.83 MB - Last synced at: 10 days ago - Pushed at: 15 days ago - Stars: 1,632 - Forks: 183

combust/mleap
MLeap: Deploy ML Pipelines to Production
Language: Scala - Size: 3.33 MB - Last synced at: 6 months ago - Pushed at: 10 months ago - Stars: 1,503 - Forks: 312

opendatadiscovery/odd-platform
First open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.
Language: Java - Size: 27.9 MB - Last synced at: 10 days ago - Pushed at: 2 months ago - Stars: 1,306 - Forks: 122

feldera/feldera
The Feldera Incremental Computation Engine
Language: Rust - Size: 167 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1,291 - Forks: 58

fmind/mlops-python-package
Kickstart your MLOps initiative with a flexible, robust, and productive Python package.
Language: Jupyter Notebook - Size: 3.22 MB - Last synced at: 8 days ago - Pushed at: 11 days ago - Stars: 1,225 - Forks: 187

pyper-dev/pyper
Concurrent Python made simple
Language: Python - Size: 462 KB - Last synced at: 17 days ago - Pushed at: 3 months ago - Stars: 1,185 - Forks: 24

yobix-ai/extractous
Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.
Language: Rust - Size: 2.88 MB - Last synced at: 10 days ago - Pushed at: 4 months ago - Stars: 1,051 - Forks: 43

bruin-data/bruin
Build data pipelines with SQL and Python, ingest data from different sources, add quality checks, and build end-to-end flows.
Language: Go - Size: 68.8 MB - Last synced at: about 7 hours ago - Pushed at: about 7 hours ago - Stars: 917 - Forks: 34

dataform-co/dataform
Dataform is a framework for managing SQL-based data operations in BigQuery
Language: TypeScript - Size: 16.5 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 898 - Forks: 176

raystack/optimus
Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management.
Language: Go - Size: 12.2 MB - Last synced at: 14 days ago - Pushed at: 11 months ago - Stars: 748 - Forks: 155

artie-labs/transfer
Database replication platform that leverages change data capture. Stream production data from databases to your data warehouse (Snowflake, BigQuery, Redshift, Databricks) in real-time.
Language: Go - Size: 3.85 MB - Last synced at: about 9 hours ago - Pushed at: about 9 hours ago - Stars: 648 - Forks: 32

elementary-data/dbt-data-reliability
dbt package that is part of Elementary, the dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
Language: Python - Size: 7.73 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 430 - Forks: 103

vmware/versatile-data-kit
One framework to develop, deploy and operate data workflows with Python and SQL.
Language: Python - Size: 109 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 409 - Forks: 54

gabledata/recap
Work with your web service, database, and streaming schemas in a single format.
Language: Python - Size: 1.42 MB - Last synced at: 4 days ago - Pushed at: 18 days ago - Stars: 343 - Forks: 26

tuva-health/tuva
Main repo including core data model, data marts, data quality tests, and terminology sets.
Language: Shell - Size: 43.7 MB - Last synced at: about 12 hours ago - Pushed at: about 13 hours ago - Stars: 249 - Forks: 83

dataflint/spark
Performance Observability for Apache Spark
Language: TypeScript - Size: 18.8 MB - Last synced at: 9 days ago - Pushed at: 16 days ago - Stars: 246 - Forks: 25

dataplane-app/dataplane
Dataplane is an Airflow-inspired unified data platform with additional data mesh and RPA capability to automate, schedule, and design data pipelines and workflows. Dataplane is written in Golang with a React front end.
Language: JavaScript - Size: 281 MB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 227 - Forks: 33

terrytangyuan/awesome-kubeflow
A curated list of awesome projects and resources related to Kubeflow (a CNCF incubating project)
Size: 164 KB - Last synced at: 11 days ago - Pushed at: 23 days ago - Stars: 207 - Forks: 17

kevin-hanselman/dud
A lightweight CLI tool for versioning data alongside source code and building data pipelines.
Language: Go - Size: 3.42 MB - Last synced at: 6 days ago - Pushed at: 3 months ago - Stars: 206 - Forks: 8

datajoint/datajoint-python
Relational data pipelines for the science lab
Language: Python - Size: 19.1 MB - Last synced at: 7 days ago - Pushed at: 20 days ago - Stars: 175 - Forks: 86

koolreport/core
An Open Source PHP Reporting Framework that helps you to write perfect data reports or to construct awesome dashboards in PHP. Works with all PHP versions from 5.6 to the latest 8.0. Fully compatible with all kinds of MVC frameworks like Laravel, CodeIgniter, Symfony.
Language: PHP - Size: 2.61 MB - Last synced at: 8 days ago - Pushed at: 20 days ago - Stars: 168 - Forks: 34

realize-engineering/pipebird
Pipebird is open source infrastructure for securely sharing data with customers.
Language: TypeScript - Size: 1.91 MB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 168 - Forks: 7

GoogleCloudPlatform/public-datasets-pipelines
Cloud-native, data onboarding architecture for Google Cloud Datasets
Language: Python - Size: 6.64 MB - Last synced at: 9 days ago - Pushed at: 2 months ago - Stars: 160 - Forks: 70

smart-data-lake/smart-data-lake
Smart Automation Tool for building modern Data Lakes and Data Pipelines
Language: Scala - Size: 42.9 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 121 - Forks: 22

linkedin/Hoptimator
Multi-hop declarative data pipelines
Language: Java - Size: 798 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 112 - Forks: 11

patterns-app/patterns-devkit
Data pipelines from re-usable components
Language: Python - Size: 1.75 MB - Last synced at: 9 days ago - Pushed at: about 2 years ago - Stars: 108 - Forks: 5

mycelial/mycelial
Move your data with ease.
Language: Rust - Size: 2.21 MB - Last synced at: 10 days ago - Pushed at: 7 months ago - Stars: 106 - Forks: 9

mitdbg/palimpzest
A System for (Optimized) Semantic Computation
Language: Python - Size: 361 MB - Last synced at: 3 days ago - Pushed at: 13 days ago - Stars: 97 - Forks: 19

DidactHQ/didact
The open core .NET job orchestrator that we've been missing
Size: 28.3 KB - Last synced at: 3 days ago - Pushed at: about 2 months ago - Stars: 94 - Forks: 0

shravan-kuchkula/udacity-data-eng-proj-1
Developed a data pipeline to automate data warehouse ETL by building custom Airflow operators that handle the extraction, transformation, validation, and loading of data from S3 -> Redshift -> S3
Language: Python - Size: 3.47 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 88 - Forks: 58

beneath-hq/beneath 📦
Beneath is a serverless real-time data platform ⚡️
Language: Go - Size: 11 MB - Last synced at: 3 days ago - Pushed at: about 3 years ago - Stars: 84 - Forks: 10

DataCater/datacater 📦
The developer-friendly ETL platform for transforming data in real-time. Based on Apache Kafka® and Kubernetes®.
Language: JavaScript - Size: 4.08 MB - Last synced at: 10 months ago - Pushed at: over 1 year ago - Stars: 81 - Forks: 3

DidactHQ/didact-engine
The REST API and execution engine for the Didact Platform.
Language: C# - Size: 261 KB - Last synced at: 3 days ago - Pushed at: 24 days ago - Stars: 72 - Forks: 3

conductor-sdk/conductor-python
Conductor OSS SDK for Python programming language
Language: Python - Size: 1.34 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 71 - Forks: 34

minhadona/data_engineer_interview_challenges
Found a data engineering challenge or participated in a selection process? Share with us!
Language: Python - Size: 7.35 MB - Last synced at: 5 days ago - Pushed at: over 2 years ago - Stars: 65 - Forks: 12

immu0001/Udacity-Data-Engineer-nanodegree
Classwork projects and homework completed for the Udacity Data Engineering Nanodegree
Language: Jupyter Notebook - Size: 101 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 64 - Forks: 71

DrDroidLab/kenobi
Easiest way to monitor asynchronous data pipelines
Language: Python - Size: 2.47 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 55 - Forks: 4

KentHsu/Udacity-Data-Engineering-Nanodgree
Udacity Data Engineering Nanodegree Program
Language: Jupyter Notebook - Size: 2.12 MB - Last synced at: 12 days ago - Pushed at: about 4 years ago - Stars: 52 - Forks: 59

siyul-park/uniflow
A high-performance, extremely flexible, and easily extensible universal workflow engine.
Language: Go - Size: 2.66 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 51 - Forks: 5

iesahin/xvc
A robust (🐢) and fast (🐇) MLOps tool for managing data and pipelines in Rust (🦀)
Language: Rust - Size: 6.5 MB - Last synced at: 7 days ago - Pushed at: 8 days ago - Stars: 51 - Forks: 1

CogStack/CogStack-NiFi
Building data processing pipelines for document processing with NLP using Apache NiFi and related services
Language: Python - Size: 74.4 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 47 - Forks: 20

bakdata/streams-explorer
Explore Apache Kafka data pipelines in Kubernetes.
Language: Python - Size: 3.63 MB - Last synced at: 11 days ago - Pushed at: about 2 months ago - Stars: 45 - Forks: 5

DanilBaibak/ml-in-production
Practical use cases showing how to make your machine learning pipelines robust and reliable using Apache Airflow.
Language: Python - Size: 143 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 43 - Forks: 19

flipkart-incubator/spark-transformers
Spark-Transformers: Library for exporting Apache Spark MLlib models to use them in any Java application with no other dependencies.
Language: Java - Size: 609 KB - Last synced at: 13 days ago - Pushed at: over 7 years ago - Stars: 40 - Forks: 29

Galileo-Galilei/kedro-pandera
A kedro plugin to use pandera in your kedro projects
Language: Python - Size: 208 KB - Last synced at: 15 days ago - Pushed at: 6 months ago - Stars: 35 - Forks: 4

mdh266/AirflowDataPipeline
Example of an ETL Pipeline using Airflow
Language: Python - Size: 14.6 KB - Last synced at: 16 days ago - Pushed at: over 7 years ago - Stars: 34 - Forks: 21

Tanguy9862/Space-App
A Dash application visualizing humanity's journey into space with data from over 7,000 launches and key milestones, from Sputnik to Mars rovers. Built on scalable data pipelines and deployed on GCP, the app offers real-time updates and interactive insights into space exploration history.
Language: Python - Size: 802 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 29 - Forks: 7

montara-io/dbt-command-center
Never sift through endless dbt™ logs again. dbt Command Center is a free, open-source, local web application that provides a user-friendly interface to monitor and manage dbt runs.
Language: TypeScript - Size: 3.55 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 28 - Forks: 0

arakat-community/arakat 📦
ARAKAT - Big Data Analysis and Business Intelligence Application Development Platform
Language: Python - Size: 31.6 MB - Last synced at: 5 months ago - Pushed at: over 3 years ago - Stars: 27 - Forks: 21

electronick1/stepist 📦
Framework for data processing
Language: Python - Size: 865 KB - Last synced at: 8 days ago - Pushed at: over 5 years ago - Stars: 27 - Forks: 5

kestra-io/examples
Best practices for data workflows, integrations with the Modern Data Stack (MDS), Infrastructure as Code (IaC), Cloud Provider Services
Language: HCL - Size: 3.28 MB - Last synced at: about 20 hours ago - Pushed at: 29 days ago - Stars: 25 - Forks: 9

tuva-health/FHIR_inferno
Connector that loads FHIR r4 USCDIv3 JSON data from local file storage into the Tuva common data model in Snowflake.
Language: Python - Size: 174 KB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 24 - Forks: 8

giacbrd/SmartPipeline
A framework for rapid development of robust data pipelines following a simple design pattern
Language: Python - Size: 393 KB - Last synced at: 12 months ago - Pushed at: about 1 year ago - Stars: 22 - Forks: 2

DidactHQ/didact-ui
The web dashboard for the Didact Platform.
Language: C# - Size: 664 KB - Last synced at: 3 days ago - Pushed at: 26 days ago - Stars: 21 - Forks: 1

pachyderm/neon-workshop
A Pachyderm deep learning tutorial for conference workshops
Language: Python - Size: 56.6 KB - Last synced at: about 1 year ago - Pushed at: over 7 years ago - Stars: 19 - Forks: 6

tuva-health/demo
A starter dbt project and synthetic claims dataset for trying out the Tuva Project.
Size: 3.53 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 18 - Forks: 25

RiveryIO/rivery_cli
Rivery CLI
Language: Python - Size: 625 KB - Last synced at: 1 day ago - Pushed at: over 1 year ago - Stars: 18 - Forks: 2

confluentinc/learn-kafka-courses
Learn the basics of Apache Kafka® from leaders in the Kafka community with these video courses covering the Kafka ecosystem and hands-on exercises.
Language: Shell - Size: 41 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 17 - Forks: 77

larribas/dagger
Define sophisticated data pipelines with Python and run them on different distributed systems (such as Argo Workflows).
Language: Python - Size: 9.97 MB - Last synced at: 4 days ago - Pushed at: 11 months ago - Stars: 17 - Forks: 7

adilkhash/apache-airflow-intro
Language: Python - Size: 9.77 KB - Last synced at: 16 days ago - Pushed at: almost 2 years ago - Stars: 17 - Forks: 4

ipeluffo/airflow-on-kubernetes
Source code for guide to run Apache Airflow on Kubernetes
Language: Python - Size: 7.81 KB - Last synced at: 9 days ago - Pushed at: about 5 years ago - Stars: 17 - Forks: 13

marcio-azevedo/fsharp-data-processing-pipeline
Provides an extensible solution for creating Data Processing Pipelines in F#.
Language: F# - Size: 352 KB - Last synced at: 9 days ago - Pushed at: about 7 years ago - Stars: 15 - Forks: 1

tuva-health/medicare_cclf_connector
This connector is a dbt project that maps Medicare CCLF claims data to the Tuva Input Layer.
Size: 1.02 MB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 13 - Forks: 16

tsdat/tsdat
Framework for standardizing, transforming, and applying quality checks to time series data.
Language: Python - Size: 146 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 13 - Forks: 8

anna-geller/kestra-ci-cd
CI/CD repository template to automate deployments of your production flows
Language: HCL - Size: 104 KB - Last synced at: 19 days ago - Pushed at: 10 months ago - Stars: 12 - Forks: 5

ketgo/marshmallow-pyspark
Marshmallow serializer integration with pyspark
Language: Python - Size: 63.5 KB - Last synced at: 28 days ago - Pushed at: over 1 year ago - Stars: 12 - Forks: 4

dushyantkhosla/airflow4ds
Using Apache Airflow to author, run and monitor complex data pipelines.
Language: Jupyter Notebook - Size: 22.5 MB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 12 - Forks: 2

alireza-heidarii/Real-Time-Data-Cleaning-Pipeline-for-Medical-and-Healthcare-Data
A real-time data cleaning pipeline for medical and healthcare data using Apache Spark, SparkNLP, Spark Streaming, and Kafka.
Language: Python - Size: 11.7 KB - Last synced at: 1 day ago - Pushed at: about 1 month ago - Stars: 11 - Forks: 0

brunocampos01/data-engineering
Language: Python - Size: 165 MB - Last synced at: 7 days ago - Pushed at: 3 months ago - Stars: 11 - Forks: 2

apicrafter/datacrafter
NoSQL extract, transform, load (ETL) toolkit with Python
Language: Python - Size: 480 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 11 - Forks: 3

tuva-health/medicare_lds_connector
Maps Medicare LDS claims data to the Tuva Input Layer so you can easily run the Tuva Project.
Size: 688 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 10 - Forks: 5

MattTriano/analytics_data_where_house
An analytics engineering sandbox focusing on real estate prices in Cook County, IL
Language: Python - Size: 17.3 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 9 - Forks: 0

TextQLLabs/dbt-documentor
✍️ dbt doc generator for advanced data teams
Language: F# - Size: 187 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 9 - Forks: 1

kiwicom/terraform-provider-montecarlo
This open-source Terraform provider enables users to seamlessly integrate the Monte Carlo data reliability platform into their infrastructure as code (IaC) workflows.
Language: Go - Size: 249 KB - Last synced at: 1 day ago - Pushed at: about 1 month ago - Stars: 8 - Forks: 2

mackelab/epiphyte
Python toolkit for working with high-dimensional neural data recorded during naturalistic, continuous stimuli @a-darcher @rachrapp
Language: Jupyter Notebook - Size: 190 MB - Last synced at: about 16 hours ago - Pushed at: 7 months ago - Stars: 8 - Forks: 1

AnthonyByansi/Airflow-Data-Pipeline-Automation
Automate your data pipelines using Apache Airflow with this ready-to-use DAG for data integration, ETL and workflow automation.
Size: 60 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 8 - Forks: 0

AnthonyByansi/Rust-Exploratorium
🚀 Master Rust programming with this comprehensive roadmap! Explore fundamental and advanced concepts, code examples, and resources.
Language: Rust - Size: 38.1 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 8 - Forks: 0

goto/optimus (fork of raystack/optimus)
Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management.
Language: Go - Size: 33.1 MB - Last synced at: about 24 hours ago - Pushed at: 1 day ago - Stars: 7 - Forks: 4

CofluxLabs/coflux
Open-source workflow engine. Orchestrate and observe computational workflows defined in plain Python. Suitable for data pipelines, background tasks, etc.
Language: Elixir - Size: 4.06 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 7 - Forks: 1

metaheed/kolle
Business model representation automation
Language: Shell - Size: 154 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 7 - Forks: 1

aredier/chariots
versioned machine learning pipelines
Language: Python - Size: 653 KB - Last synced at: 2 days ago - Pushed at: over 2 years ago - Stars: 7 - Forks: 1

simplybusiness/code-first-pipelines
A code-first way to define Ploomber pipelines
Language: Python - Size: 285 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 7 - Forks: 0

glassflow/cli
GlassFlow CLI to create and manage real-time data pipelines
Language: Shell - Size: 15.6 KB - Last synced at: 16 days ago - Pushed at: 5 months ago - Stars: 6 - Forks: 0

unicef/magasin
Cloud native open-source end-to-end data / AI / ML platform
Language: Mustache - Size: 21.7 MB - Last synced at: about 13 hours ago - Pushed at: 3 months ago - Stars: 5 - Forks: 3

Snehil-Shah/Seismic-Alerts-Streamer
A Realtime Seismic Logging & Alerts Service with Live Monitoring & Email Alerts made using Kafka Data Pipelines, all Dockerized & Deployment Ready!
Language: Java - Size: 12.5 MB - Last synced at: 12 days ago - Pushed at: 11 months ago - Stars: 5 - Forks: 0

JennyferWAN/Coursera_IBM_Data_Engineering
IBM Data Engineering - Master SQL, RDBMS, ETL, Data Warehousing, NoSQL, Big Data and Spark with hands-on job-ready skills
Language: Jupyter Notebook - Size: 3.24 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 5 - Forks: 1

BinariesGoalls/Udacity-Data-Engineering-Nanodegree
This is a repository to hold the files and notebooks produced throughout my Udacity Data Engineering Nanodegree program.
Language: PLpgSQL - Size: 109 MB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 5 - Forks: 2

tuva-health/provider
A dbt project that transforms messy public provider datasets into usable data for the Tuva Project.
Size: 26.4 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 4 - Forks: 4
