Ecosyste.ms: Repos

An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: data-lineage

tuva-health/tuva

Main repo including core data model, data marts, reference data, terminology, and the clinical concept library

Size: 22.7 MB - Last synced: about 7 hours ago - Pushed: about 8 hours ago - Stars: 154 - Forks: 30

sergiomoraes/sergiomoraesblog

On this site I share personal thoughts about data, data governance, data quality, metadata, and side projects.

Language: Jupyter Notebook - Size: 87.1 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 1 - Forks: 0

reata/sqllineage

SQL Lineage Analysis Tool powered by Python

Language: Python - Size: 9.11 MB - Last synced: 2 days ago - Pushed: 13 days ago - Stars: 1,145 - Forks: 209

elementary-data/elementary

The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.

Language: HTML - Size: 192 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 1,725 - Forks: 144

brunocampos01/pyssas 📦

Automated build and deployment to SQL Server Analysis Services (SSAS) with Python.

Language: Python - Size: 315 KB - Last synced: 11 days ago - Pushed: over 2 years ago - Stars: 9 - Forks: 2

open-metadata/OpenMetadata

Open Standard for Metadata. A Single place to Discover, Collaborate and Get your data right.

Language: TypeScript - Size: 1.3 GB - Last synced: 16 days ago - Pushed: 16 days ago - Stars: 4,168 - Forks: 837

grai-io/grai-core

Language: Python - Size: 119 MB - Last synced: about 11 hours ago - Pushed: about 12 hours ago - Stars: 270 - Forks: 20

maropu/spark-sql-flow-plugin

Visualize column-level data lineage in Spark SQL

Language: Scala - Size: 705 MB - Last synced: 11 days ago - Pushed: about 2 years ago - Stars: 80 - Forks: 15

data-drift/data-drift

Metrics Observability & Troubleshooting

Language: HTML - Size: 11.7 MB - Last synced: 19 days ago - Pushed: 3 months ago - Stars: 299 - Forks: 11

tuva-health/tuva_demo

A starter dbt project and synthetic claims dataset for trying out the Tuva Project.

Size: 1.98 MB - Last synced: 17 days ago - Pushed: 18 days ago - Stars: 12 - Forks: 6

MarquezProject/marquez

Collect, aggregate, and visualize a data ecosystem's metadata

Language: Java - Size: 44.8 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 1,613 - Forks: 287

badoo/exasol-data-lineage

Exasol data lineage scripts

Language: Python - Size: 22.5 KB - Last synced: 28 days ago - Pushed: almost 3 years ago - Stars: 6 - Forks: 3

elementary-data/dbt-data-reliability

dbt package that is part of Elementary, the dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.

Language: Python - Size: 7.47 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 338 - Forks: 76

GitDataAI/jiaozifs

A Git-like version control file system for data lineage & data collaboration.

Language: Go - Size: 1.66 MB - Last synced: about 1 month ago - Pushed: about 2 months ago - Stars: 41 - Forks: 2

opendatadiscovery/odd-platform

First open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.

Language: Java - Size: 28.1 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 1,104 - Forks: 91

vmware/versatile-data-kit

One framework to develop, deploy and operate data workflows with Python and SQL.

Language: Python - Size: 109 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 409 - Forks: 54

finos/waltz

Enterprise Information Service

Language: Java - Size: 55.8 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 163 - Forks: 126

google/grizzly

End-to-end DataOps platform deployed by Terraform.

Language: Python - Size: 112 MB - Last synced: 9 days ago - Pushed: 16 days ago - Stars: 56 - Forks: 10

slidoapp/dbt-superset-lineage

Make dbt docs and Apache Superset talk to one another

Language: Python - Size: 1.84 MB - Last synced: 4 days ago - Pushed: 30 days ago - Stars: 128 - Forks: 14

tokern/data-lineage

Generate and Visualize Data Lineage from query history

Language: Python - Size: 2.46 MB - Last synced: about 1 month ago - Pushed: 10 months ago - Stars: 295 - Forks: 41

Tinkoff/data-detective 📦

Data catalog for everything in your company

Language: Python - Size: 8.99 MB - Last synced: 3 months ago - Pushed: 12 months ago - Stars: 45 - Forks: 13

IBM/multi-data-lineage-capture-py

IBM Multi-Lineage Data System

Language: Python - Size: 237 KB - Last synced: 30 days ago - Pushed: about 1 year ago - Stars: 6 - Forks: 7

GoogleCloudPlatform/bigquery-data-lineage

Reference implementation for real-time Data Lineage tracking for BigQuery using Audit Logs, ZetaSQL and Dataflow.

Language: Java - Size: 405 KB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 132 - Forks: 37

tosh2230/stairlight

A data lineage tool that detects table dependencies from rendered SQL statements.

Language: Python - Size: 2.37 MB - Last synced: 2 days ago - Pushed: about 1 month ago - Stars: 26 - Forks: 1

tuva-health/medicare_cclf_connector

This connector is a dbt project that maps Medicare CCLF claims data to the Tuva Input Layer.

Size: 1010 KB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 12 - Forks: 12

pi2schema/pi2schema

Describe your Data Protection rules and Personal Identifying Information as part of your schema

Language: Java - Size: 528 KB - Last synced: 11 days ago - Pushed: 17 days ago - Stars: 9 - Forks: 2

tuva-health/medicare_lds_connector

Maps Medicare LDS claims data to the Tuva Input Layer so you can easily run the Tuva Project.

Size: 664 KB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 7 - Forks: 4

ahussein/ckanext-datalineage

A CKAN extension for providing and visualizing data lineage

Language: JavaScript - Size: 6.08 MB - Last synced: 10 months ago - Pushed: over 6 years ago - Stars: 0 - Forks: 0

tuva-health/provider

A dbt project that transforms messy public provider datasets into usable data for the Tuva Project.

Size: 15.6 KB - Last synced: 9 months ago - Pushed: 9 months ago - Stars: 1 - Forks: 1

aws-samples/document-processing-pipeline-for-regulated-industries

A boilerplate solution for processing image and PDF documents for regulated industries, with lineage and pipeline operations metadata services.

Language: Python - Size: 11.4 MB - Last synced: 12 months ago - Pushed: over 2 years ago - Stars: 50 - Forks: 12

thestyleofme/data-lineage-parent

Data lineage for Hive, Sqoop, HBase, Spark, and others: events are sent to Kafka, then parsed and processed, with Neo4j used to generate the lineage.

Language: Java - Size: 277 KB - Last synced: about 1 year ago - Pushed: almost 3 years ago - Stars: 61 - Forks: 36

datascalehq/datascale

We help data teams ensure the quality of their SQL code and establish the traceability of their data.

Size: 1.95 KB - Last synced: 4 months ago - Pushed: about 1 year ago - Stars: 1 - Forks: 0

GuinsooLab/darkseal

A Single place to Discover, Collaborate, and Get your data right

Language: TypeScript - Size: 272 MB - Last synced: 4 months ago - Pushed: about 1 year ago - Stars: 14 - Forks: 6

tosh2230/stairlight-app

A web application that renders a table dependency graph with tosh2230/stairlight, using Graphviz, Streamlit and Google Cloud Run.

Language: Python - Size: 800 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 5 - Forks: 0

StatCan/pachyderm 📦

Data Lineage with End-to-End Pipelines on Kubernetes

Size: 3.91 KB - Last synced: about 1 year ago - Pushed: over 3 years ago - Stars: 0 - Forks: 0

miotech/kun-scheduler

A workflow scheduler that understands both your data and metadata.

Language: Java - Size: 63.8 MB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 25 - Forks: 5

tomaztk/SQLServer-Data-Lineage

Data Lineage for Microsoft SQL Server, Azure SQL Server and Azure Synapse

Language: TSQL - Size: 86.9 KB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 11 - Forks: 6

metastore-developers/metastore

Metastore Python SDK. Feature store and data catalog for machine learning.

Language: Python - Size: 302 KB - Last synced: 17 days ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0

michaelosthege/gittrail

Context manager for enforcing links between data pipeline outputs and git history.

Language: Python - Size: 41 KB - Last synced: 6 days ago - Pushed: about 2 years ago - Stars: 1 - Forks: 0

AbdullahMu/Data-Pipelines-with-Airflow

Schedule, automate, and monitor data pipelines using Apache Airflow. Run data quality checks, track data lineage, and work with data pipelines in production.

Language: Python - Size: 52.7 KB - Last synced: about 1 year ago - Pushed: over 3 years ago - Stars: 1 - Forks: 1

bballamudi/multi-data-lineage-capture-py (fork of IBM/multi-data-lineage-capture-py)

IBM Multi-Lineage Data System

Size: 83 KB - Last synced: 11 months ago - Pushed: over 3 years ago - Stars: 0 - Forks: 0

dmartinpro/spuristo

Data Lineage

Language: Java - Size: 130 KB - Last synced: about 1 year ago - Pushed: about 6 years ago - Stars: 0 - Forks: 0