An open API service providing repository metadata for many open source software ecosystems.

Topic: "data-pipelines"

apache/airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

Language: Python - Size: 353 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 39,617 - Forks: 14,888

pathwaycom/pathway

Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.

Language: Python - Size: 132 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 24,499 - Forks: 356

apache/dolphinscheduler

Apache DolphinScheduler is the modern data orchestration platform. Agile to create high-performance workflows with low code.

Language: Java - Size: 209 MB - Last synced at: 5 days ago - Pushed at: 7 days ago - Stars: 13,412 - Forks: 4,752

dagster-io/dagster

An orchestration platform for the development, production, and observation of data assets.

Language: Python - Size: 1.26 GB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 12,944 - Forks: 1,649

Unstructured-IO/unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

Language: HTML - Size: 192 MB - Last synced at: 3 days ago - Pushed at: 14 days ago - Stars: 10,915 - Forks: 907

mage-ai/mage-ai

🧙 Build, run, and manage data pipelines for integrating and transforming data.

Language: Python - Size: 233 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 8,240 - Forks: 831

infinyon/fluvio

Lean and mean distributed stream processing system written in Rust and WebAssembly. Alternative to Kafka + Flink in one.

Language: Rust - Size: 34.2 MB - Last synced at: about 13 hours ago - Pushed at: about 14 hours ago - Stars: 4,606 - Forks: 512

orchest/orchest

Build data pipelines, the easy way 🛠️

Language: TypeScript - Size: 27.2 MB - Last synced at: 11 days ago - Pushed at: almost 2 years ago - Stars: 4,116 - Forks: 261

StructuredLabs/preswald

Preswald is a framework for building and deploying interactive data apps, internal tools, and dashboards with Python. With one command, you can launch, share, and deploy locally or in the cloud, turning Python scripts into powerful shareable apps.

Language: Python - Size: 79.8 MB - Last synced at: about 7 hours ago - Pushed at: about 7 hours ago - Stars: 3,170 - Forks: 629

elementary-data/elementary

The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.

Language: HTML - Size: 205 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 2,047 - Forks: 182

meltano/meltano

Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.

Language: Python - Size: 140 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 2,029 - Forks: 175

ucbepic/docetl

A system for agentic LLM-powered data processing and ETL

Language: Python - Size: 51.5 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 1,763 - Forks: 170

data-engineering-community/data-engineering-wiki

The best place to learn data engineering. Built and maintained by the data engineering community.

Language: CSS - Size: 7.83 MB - Last synced at: 10 days ago - Pushed at: 15 days ago - Stars: 1,632 - Forks: 183

combust/mleap

MLeap: Deploy ML Pipelines to Production

Language: Scala - Size: 3.33 MB - Last synced at: 6 months ago - Pushed at: 10 months ago - Stars: 1,503 - Forks: 312

opendatadiscovery/odd-platform

First open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.

Language: Java - Size: 27.9 MB - Last synced at: 10 days ago - Pushed at: 2 months ago - Stars: 1,306 - Forks: 122

feldera/feldera

The Feldera Incremental Computation Engine

Language: Rust - Size: 167 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1,291 - Forks: 58

fmind/mlops-python-package

Kickstart your MLOps initiative with a flexible, robust, and productive Python package.

Language: Jupyter Notebook - Size: 3.22 MB - Last synced at: 8 days ago - Pushed at: 11 days ago - Stars: 1,225 - Forks: 187

pyper-dev/pyper

Concurrent Python made simple

Language: Python - Size: 462 KB - Last synced at: 17 days ago - Pushed at: 3 months ago - Stars: 1,185 - Forks: 24

yobix-ai/extractous

Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.

Language: Rust - Size: 2.88 MB - Last synced at: 10 days ago - Pushed at: 4 months ago - Stars: 1,051 - Forks: 43

bruin-data/bruin

Build data pipelines with SQL and Python, ingest data from different sources, add quality checks, and build end-to-end flows.

Language: Go - Size: 68.8 MB - Last synced at: about 7 hours ago - Pushed at: about 7 hours ago - Stars: 917 - Forks: 34

dataform-co/dataform

Dataform is a framework for managing SQL based data operations in BigQuery

Language: TypeScript - Size: 16.5 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 898 - Forks: 176

raystack/optimus

Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management.

Language: Go - Size: 12.2 MB - Last synced at: 14 days ago - Pushed at: 11 months ago - Stars: 748 - Forks: 155

artie-labs/transfer

Database replication platform that leverages change data capture. Stream production data from databases to your data warehouse (Snowflake, BigQuery, Redshift, Databricks) in real-time.

Language: Go - Size: 3.85 MB - Last synced at: about 9 hours ago - Pushed at: about 9 hours ago - Stars: 648 - Forks: 32

elementary-data/dbt-data-reliability

dbt package that is part of Elementary, the dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.

Language: Python - Size: 7.73 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 430 - Forks: 103

vmware/versatile-data-kit

One framework to develop, deploy and operate data workflows with Python and SQL.

Language: Python - Size: 109 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 409 - Forks: 54

gabledata/recap

Work with your web service, database, and streaming schemas in a single format.

Language: Python - Size: 1.42 MB - Last synced at: 4 days ago - Pushed at: 18 days ago - Stars: 343 - Forks: 26

tuva-health/tuva

Main repo including core data model, data marts, data quality tests, and terminology sets.

Language: Shell - Size: 43.7 MB - Last synced at: about 12 hours ago - Pushed at: about 13 hours ago - Stars: 249 - Forks: 83

dataflint/spark

Performance Observability for Apache Spark

Language: TypeScript - Size: 18.8 MB - Last synced at: 9 days ago - Pushed at: 16 days ago - Stars: 246 - Forks: 25

dataplane-app/dataplane

Dataplane is an Airflow inspired unified data platform with additional data mesh and RPA capability to automate, schedule and design data pipelines and workflows. Dataplane is written in Golang with a React front end.

Language: JavaScript - Size: 281 MB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 227 - Forks: 33

terrytangyuan/awesome-kubeflow

A curated list of awesome projects and resources related to Kubeflow (a CNCF incubating project)

Size: 164 KB - Last synced at: 11 days ago - Pushed at: 23 days ago - Stars: 207 - Forks: 17

kevin-hanselman/dud

A lightweight CLI tool for versioning data alongside source code and building data pipelines.

Language: Go - Size: 3.42 MB - Last synced at: 6 days ago - Pushed at: 3 months ago - Stars: 206 - Forks: 8

datajoint/datajoint-python

Relational data pipelines for the science lab

Language: Python - Size: 19.1 MB - Last synced at: 7 days ago - Pushed at: 20 days ago - Stars: 175 - Forks: 86

koolreport/core

An open source PHP reporting framework that helps you write data reports or build dashboards in PHP. Works with all PHP versions from 5.6 to 8.0. Fully compatible with MVC frameworks such as Laravel, CodeIgniter, and Symfony.

Language: PHP - Size: 2.61 MB - Last synced at: 8 days ago - Pushed at: 20 days ago - Stars: 168 - Forks: 34

realize-engineering/pipebird

Pipebird is open source infrastructure for securely sharing data with customers.

Language: TypeScript - Size: 1.91 MB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 168 - Forks: 7

GoogleCloudPlatform/public-datasets-pipelines

Cloud-native, data onboarding architecture for Google Cloud Datasets

Language: Python - Size: 6.64 MB - Last synced at: 9 days ago - Pushed at: 2 months ago - Stars: 160 - Forks: 70

smart-data-lake/smart-data-lake

Smart Automation Tool for building modern Data Lakes and Data Pipelines

Language: Scala - Size: 42.9 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 121 - Forks: 22

linkedin/Hoptimator

Multi-hop declarative data pipelines

Language: Java - Size: 798 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 112 - Forks: 11

patterns-app/patterns-devkit

Data pipelines from re-usable components

Language: Python - Size: 1.75 MB - Last synced at: 9 days ago - Pushed at: about 2 years ago - Stars: 108 - Forks: 5

mycelial/mycelial

Move your data with ease.

Language: Rust - Size: 2.21 MB - Last synced at: 10 days ago - Pushed at: 7 months ago - Stars: 106 - Forks: 9

mitdbg/palimpzest

A System for (Optimized) Semantic Computation

Language: Python - Size: 361 MB - Last synced at: 3 days ago - Pushed at: 13 days ago - Stars: 97 - Forks: 19

DidactHQ/didact

The open core .NET job orchestrator that we've been missing

Size: 28.3 KB - Last synced at: 3 days ago - Pushed at: about 2 months ago - Stars: 94 - Forks: 0

shravan-kuchkula/udacity-data-eng-proj-1

Developed a data pipeline to automate data warehouse ETL by building custom Airflow operators that handle the extraction, transformation, validation and loading of data from S3 -> Redshift -> S3

Language: Python - Size: 3.47 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 88 - Forks: 58

beneath-hq/beneath 📦

Beneath is a serverless real-time data platform ⚡️

Language: Go - Size: 11 MB - Last synced at: 3 days ago - Pushed at: about 3 years ago - Stars: 84 - Forks: 10

DataCater/datacater 📦

The developer-friendly ETL platform for transforming data in real-time. Based on Apache Kafka® and Kubernetes®.

Language: JavaScript - Size: 4.08 MB - Last synced at: 10 months ago - Pushed at: over 1 year ago - Stars: 81 - Forks: 3

DidactHQ/didact-engine

The REST API and execution engine for the Didact Platform.

Language: C# - Size: 261 KB - Last synced at: 3 days ago - Pushed at: 24 days ago - Stars: 72 - Forks: 3

conductor-sdk/conductor-python

Conductor OSS SDK for Python programming language

Language: Python - Size: 1.34 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 71 - Forks: 34

minhadona/data_engineer_interview_challenges

Found a data engineering challenge or participated in a selection process? Share with us!

Language: Python - Size: 7.35 MB - Last synced at: 5 days ago - Pushed at: over 2 years ago - Stars: 65 - Forks: 12

immu0001/Udacity-Data-Engineer-nanodegree

Classwork projects and homework done through the Udacity Data Engineering Nanodegree

Language: Jupyter Notebook - Size: 101 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 64 - Forks: 71

DrDroidLab/kenobi

Easiest way to monitor asynchronous data pipelines

Language: Python - Size: 2.47 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 55 - Forks: 4

KentHsu/Udacity-Data-Engineering-Nanodgree

Udacity Data Engineering Nanodegree Program

Language: Jupyter Notebook - Size: 2.12 MB - Last synced at: 12 days ago - Pushed at: about 4 years ago - Stars: 52 - Forks: 59

siyul-park/uniflow

A high-performance, extremely flexible, and easily extensible universal workflow engine.

Language: Go - Size: 2.66 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 51 - Forks: 5

iesahin/xvc

A robust (🐢) and fast (🐇) MLOps tool for managing data and pipelines in Rust (🦀)

Language: Rust - Size: 6.5 MB - Last synced at: 7 days ago - Pushed at: 8 days ago - Stars: 51 - Forks: 1

CogStack/CogStack-NiFi

Building data processing pipelines for document processing with NLP using Apache NiFi and related services

Language: Python - Size: 74.4 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 47 - Forks: 20

bakdata/streams-explorer

Explore Apache Kafka data pipelines in Kubernetes.

Language: Python - Size: 3.63 MB - Last synced at: 11 days ago - Pushed at: about 2 months ago - Stars: 45 - Forks: 5

DanilBaibak/ml-in-production

Practical use cases showing how to make your machine learning pipelines robust and reliable using Apache Airflow.

Language: Python - Size: 143 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 43 - Forks: 19

flipkart-incubator/spark-transformers

Spark-Transformers: Library for exporting Apache Spark MLLIB models to use them in any Java application with no other dependencies.

Language: Java - Size: 609 KB - Last synced at: 13 days ago - Pushed at: over 7 years ago - Stars: 40 - Forks: 29

Galileo-Galilei/kedro-pandera

A kedro plugin to use pandera in your kedro projects

Language: Python - Size: 208 KB - Last synced at: 15 days ago - Pushed at: 6 months ago - Stars: 35 - Forks: 4

mdh266/AirflowDataPipeline

Example of an ETL Pipeline using Airflow

Language: Python - Size: 14.6 KB - Last synced at: 16 days ago - Pushed at: over 7 years ago - Stars: 34 - Forks: 21

Tanguy9862/Space-App

A Dash application visualizing humanity's journey into space with data from over 7,000 launches and key milestones, from Sputnik to Mars rovers. Built on scalable data pipelines and deployed on GCP, the app offers real-time updates and interactive insights into space exploration history.

Language: Python - Size: 802 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 29 - Forks: 7

montara-io/dbt-command-center

Never sift through endless dbt™ logs again. dbt Command Center is a free, open-source, local web application that provides a user-friendly interface to monitor and manage dbt runs.

Language: TypeScript - Size: 3.55 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 28 - Forks: 0

arakat-community/arakat 📦

ARAKAT - Big Data Analysis and Business Intelligence Application Development Platform

Language: Python - Size: 31.6 MB - Last synced at: 5 months ago - Pushed at: over 3 years ago - Stars: 27 - Forks: 21

electronick1/stepist 📦

Framework for data processing

Language: Python - Size: 865 KB - Last synced at: 8 days ago - Pushed at: over 5 years ago - Stars: 27 - Forks: 5

kestra-io/examples

Best practices for data workflows, integrations with the Modern Data Stack (MDS), Infrastructure as Code (IaC), Cloud Provider Services

Language: HCL - Size: 3.28 MB - Last synced at: about 20 hours ago - Pushed at: 29 days ago - Stars: 25 - Forks: 9

tuva-health/FHIR_inferno

Connector that loads FHIR r4 USCDIv3 JSON data from local file storage into the Tuva common data model in Snowflake.

Language: Python - Size: 174 KB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 24 - Forks: 8

giacbrd/SmartPipeline

A framework for rapid development of robust data pipelines following a simple design pattern

Language: Python - Size: 393 KB - Last synced at: 12 months ago - Pushed at: about 1 year ago - Stars: 22 - Forks: 2

DidactHQ/didact-ui

The web dashboard for the Didact Platform.

Language: C# - Size: 664 KB - Last synced at: 3 days ago - Pushed at: 26 days ago - Stars: 21 - Forks: 1

pachyderm/neon-workshop

A Pachyderm deep learning tutorial for conference workshops

Language: Python - Size: 56.6 KB - Last synced at: about 1 year ago - Pushed at: over 7 years ago - Stars: 19 - Forks: 6

tuva-health/demo

A starter dbt project and synthetic claims dataset for trying out the Tuva Project.

Size: 3.53 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 18 - Forks: 25

RiveryIO/rivery_cli

Rivery CLI

Language: Python - Size: 625 KB - Last synced at: 1 day ago - Pushed at: over 1 year ago - Stars: 18 - Forks: 2

confluentinc/learn-kafka-courses

Learn the basics of Apache Kafka® from leaders in the Kafka community with these video courses covering the Kafka ecosystem and hands-on exercises.

Language: Shell - Size: 41 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 17 - Forks: 77

larribas/dagger

Define sophisticated data pipelines with Python and run them on different distributed systems (such as Argo Workflows).

Language: Python - Size: 9.97 MB - Last synced at: 4 days ago - Pushed at: 11 months ago - Stars: 17 - Forks: 7

adilkhash/apache-airflow-intro

Language: Python - Size: 9.77 KB - Last synced at: 16 days ago - Pushed at: almost 2 years ago - Stars: 17 - Forks: 4

ipeluffo/airflow-on-kubernetes

Source code for guide to run Apache Airflow on Kubernetes

Language: Python - Size: 7.81 KB - Last synced at: 9 days ago - Pushed at: about 5 years ago - Stars: 17 - Forks: 13

marcio-azevedo/fsharp-data-processing-pipeline

Provides an extensible solution for creating Data Processing Pipelines in F#.

Language: F# - Size: 352 KB - Last synced at: 9 days ago - Pushed at: about 7 years ago - Stars: 15 - Forks: 1

tuva-health/medicare_cclf_connector

This connector is a dbt project that maps Medicare CCLF claims data to the Tuva Input Layer.

Size: 1.02 MB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 13 - Forks: 16

tsdat/tsdat

Framework for standardizing, transforming, and applying quality checks to time series data.

Language: Python - Size: 146 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 13 - Forks: 8

anna-geller/kestra-ci-cd

CI/CD repository template to automate deployments of your production flows

Language: HCL - Size: 104 KB - Last synced at: 19 days ago - Pushed at: 10 months ago - Stars: 12 - Forks: 5

ketgo/marshmallow-pyspark

Marshmallow serializer integration with pyspark

Language: Python - Size: 63.5 KB - Last synced at: 28 days ago - Pushed at: over 1 year ago - Stars: 12 - Forks: 4

dushyantkhosla/airflow4ds

Using Apache Airflow to author, run and monitor complex data pipelines.

Language: Jupyter Notebook - Size: 22.5 MB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 12 - Forks: 2

alireza-heidarii/Real-Time-Data-Cleaning-Pipeline-for-Medical-and-Healthcare-Data

A real-time data cleaning pipeline for medical and healthcare data using Apache Spark, SparkNLP, Spark Streaming, and Kafka.

Language: Python - Size: 11.7 KB - Last synced at: 1 day ago - Pushed at: about 1 month ago - Stars: 11 - Forks: 0

brunocampos01/data-engineering

Language: Python - Size: 165 MB - Last synced at: 7 days ago - Pushed at: 3 months ago - Stars: 11 - Forks: 2

apicrafter/datacrafter

NoSQL extract, transform, load (ETL) toolkit with Python

Language: Python - Size: 480 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 11 - Forks: 3

tuva-health/medicare_lds_connector

Maps Medicare LDS claims data to the Tuva Input Layer so you can easily run the Tuva Project.

Size: 688 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 10 - Forks: 5

MattTriano/analytics_data_where_house

An analytics engineering sandbox focusing on real estate prices in Cook County, IL

Language: Python - Size: 17.3 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 9 - Forks: 0

TextQLLabs/dbt-documentor

✍️ dbt doc generator for advanced data teams

Language: F# - Size: 187 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 9 - Forks: 1

kiwicom/terraform-provider-montecarlo

This open-source Terraform provider enables users to seamlessly integrate the Monte Carlo data reliability platform into their infrastructure as code (IaC) workflows.

Language: Go - Size: 249 KB - Last synced at: 1 day ago - Pushed at: about 1 month ago - Stars: 8 - Forks: 2

mackelab/epiphyte

Python toolkit for working with high-dimensional neural data recorded during naturalistic, continuous stimuli @a-darcher @rachrapp

Language: Jupyter Notebook - Size: 190 MB - Last synced at: about 16 hours ago - Pushed at: 7 months ago - Stars: 8 - Forks: 1

AnthonyByansi/Airflow-Data-Pipeline-Automation

Automate your data pipelines using Apache Airflow with this ready-to-use DAG for data integration, ETL and workflow automation.

Size: 60 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 8 - Forks: 0

AnthonyByansi/Rust-Exploratorium

🚀 Master Rust programming with this comprehensive roadmap! Explore fundamental and advanced concepts, code examples, and resources.

Language: Rust - Size: 38.1 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 8 - Forks: 0

goto/optimus (fork of raystack/optimus)

Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management.

Language: Go - Size: 33.1 MB - Last synced at: about 24 hours ago - Pushed at: 1 day ago - Stars: 7 - Forks: 4

CofluxLabs/coflux

Open-source workflow engine. Orchestrate and observe computational workflows defined in plain Python. Suitable for data pipelines, background tasks, etc.

Language: Elixir - Size: 4.06 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 7 - Forks: 1

metaheed/kolle

Business model representation automation

Language: Shell - Size: 154 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 7 - Forks: 1

aredier/chariots

versioned machine learning pipelines

Language: Python - Size: 653 KB - Last synced at: 2 days ago - Pushed at: over 2 years ago - Stars: 7 - Forks: 1

simplybusiness/code-first-pipelines

A code-first way to define Ploomber pipelines

Language: Python - Size: 285 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 7 - Forks: 0

glassflow/cli

GlassFlow CLI to create and manage real-time data pipelines

Language: Shell - Size: 15.6 KB - Last synced at: 16 days ago - Pushed at: 5 months ago - Stars: 6 - Forks: 0

unicef/magasin

Cloud native open-source end-to-end data / AI / ML platform

Language: Mustache - Size: 21.7 MB - Last synced at: about 13 hours ago - Pushed at: 3 months ago - Stars: 5 - Forks: 3

Snehil-Shah/Seismic-Alerts-Streamer

A Realtime Seismic Logging & Alerts Service with Live Monitoring & Email Alerts made using Kafka Data Pipelines, all Dockerized & Deployment Ready!

Language: Java - Size: 12.5 MB - Last synced at: 12 days ago - Pushed at: 11 months ago - Stars: 5 - Forks: 0

JennyferWAN/Coursera_IBM_Data_Engineering

IBM Data Engineering - Master SQL, RDBMS, ETL, Data Warehousing, NoSQL, Big Data and Spark with hands-on job-ready skills

Language: Jupyter Notebook - Size: 3.24 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 5 - Forks: 1

BinariesGoalls/Udacity-Data-Engineering-Nanodegree

This is a repository to hold the files and notebooks produced throughout my Udacity Data Engineering Nanodegree program.

Language: PLpgSQL - Size: 109 MB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 5 - Forks: 2

tuva-health/provider

A dbt project that transforms messy public provider datasets into usable data for the Tuva Project.

Size: 26.4 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 4 - Forks: 4