An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: data-lineage

montara-io/dbt-command-center

Never sift through endless dbt™ logs again. dbt Command Center is a free, open-source, local web application that provides a user-friendly interface to monitor and manage dbt runs.

Language: TypeScript - Size: 3.55 MB - Last synced at: about 16 hours ago - Pushed at: about 17 hours ago - Stars: 28 - Forks: 0

GitDataAI/jzfs

A Git-like Version Control File System for AI & Data Product Management.

Language: Rust - Size: 3.09 MB - Last synced at: about 19 hours ago - Pushed at: about 20 hours ago - Stars: 105 - Forks: 11

open-metadata/OpenMetadata

OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column-level lineage, and seamless team collaboration.

Language: TypeScript - Size: 1.74 GB - Last synced at: 1 day ago - Pushed at: 3 days ago - Stars: 6,484 - Forks: 1,202

laminlabs/lamindb

A data framework for biology.

Language: Python - Size: 7.57 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 159 - Forks: 15

finos/waltz

Enterprise Information Service

Language: Java - Size: 56.4 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 195 - Forks: 129

elementary-data/elementary

The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.

Language: HTML - Size: 205 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 2,047 - Forks: 182

elementary-data/dbt-data-reliability

dbt package that is part of Elementary, the dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.

Language: Python - Size: 7.73 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 430 - Forks: 103

sergiomoraes/sergiomoraesblog

On this site I share personal thoughts about data, data governance, data quality, metadata, and side projects.

Language: Jupyter Notebook - Size: 87.1 MB - Last synced at: 3 days ago - Pushed at: 6 days ago - Stars: 1 - Forks: 0

tuva-health/tuva

Main repo including core data model, data marts, data quality tests, and terminology sets.

Language: Shell - Size: 41.4 MB - Last synced at: 9 days ago - Pushed at: 10 days ago - Stars: 247 - Forks: 79

opendatadiscovery/odd-platform

The first open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.

Language: Java - Size: 27.9 MB - Last synced at: 9 days ago - Pushed at: 2 months ago - Stars: 1,306 - Forks: 122

GoogleCloudPlatform/bigquery-data-lineage 📦

Reference implementation for real-time Data Lineage tracking for BigQuery using Audit Logs, ZetaSQL and Dataflow.

Language: Java - Size: 356 KB - Last synced at: 1 day ago - Pushed at: 11 months ago - Stars: 143 - Forks: 41

MarquezProject/marquez

Collect, aggregate, and visualize a data ecosystem's metadata

Language: Java - Size: 51.4 MB - Last synced at: 12 days ago - Pushed at: 20 days ago - Stars: 1,890 - Forks: 340

maropu/spark-sql-flow-plugin

Visualize column-level data lineage in Spark SQL

Language: Scala - Size: 705 MB - Last synced at: 6 days ago - Pushed at: almost 3 years ago - Stars: 91 - Forks: 17

grai-io/grai-core

Language: Python - Size: 121 MB - Last synced at: 7 days ago - Pushed at: 19 days ago - Stars: 303 - Forks: 21

tuva-health/medicare_cclf_connector

This connector is a dbt project that maps Medicare CCLF claims data to the Tuva Input Layer.

Size: 1.02 MB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 13 - Forks: 16

Seigr-lab/Seigr-EcoSystem

Inspired by biomimicry, Seigr is a Symbiotic Environment of Interconnected Generative Records. This decentralized network enables secure data capsules, adaptive encoding, and dynamic traceability across decentralized storage layers, including IPFS. Anchored by the .seigr protocol, it fosters resilient, modular, and sustainable data management.

Language: Python - Size: 3.49 MB - Last synced at: 2 days ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

pi2schema/pi2schema

Describe your Data Protection rules and Personal Identifying Information as part of your schema

Language: Java - Size: 546 KB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 10 - Forks: 2

slidoapp/dbt-superset-lineage

Make dbt docs and Apache Superset talk to one another

Language: Python - Size: 1.85 MB - Last synced at: 10 days ago - Pushed at: 3 months ago - Stars: 142 - Forks: 19

tokern/data-lineage

Generate and Visualize Data Lineage from query history

Language: Python - Size: 2.46 MB - Last synced at: 17 days ago - Pushed at: over 1 year ago - Stars: 322 - Forks: 46

google/grizzly

End-to-end DataOps platform deployed by Terraform.

Language: Python - Size: 113 MB - Last synced at: 1 day ago - Pushed at: about 1 month ago - Stars: 66 - Forks: 10

tuva-health/demo

A starter dbt project and synthetic claims dataset for trying out the Tuva Project.

Size: 3.53 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 18 - Forks: 25

badoo/exasol-data-lineage

Exasol data lineage scripts

Language: Python - Size: 22.5 KB - Last synced at: 15 days ago - Pushed at: over 3 years ago - Stars: 7 - Forks: 3

tuva-health/provider

A dbt project that transforms messy public provider datasets into usable data for the Tuva Project.

Size: 26.4 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 4 - Forks: 4

tomaztk/SQLServer-Data-Lineage

Data Lineage for Microsoft SQL Server, Azure SQL Server and Azure Synapse

Language: TSQL - Size: 86.9 KB - Last synced at: 12 days ago - Pushed at: over 2 years ago - Stars: 18 - Forks: 8

TraceSQL/tracesql-py

Python client for TraceSQL lineage analyzer

Language: Python - Size: 67.4 KB - Last synced at: 6 days ago - Pushed at: 4 months ago - Stars: 4 - Forks: 0

JBris/marquez-test

Testing a Docker deployment of Marquez and OpenLineage

Language: Shell - Size: 19.5 KB - Last synced at: 22 days ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

JBris/openmetadata-test

Testing a Docker deployment of OpenMetadata for S3 data ingestion

Size: 12.7 KB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0

srijan-singh/neo4j-lineage

Holistic approach for understanding Neo4j and Data Lineage

Language: Java - Size: 60.5 KB - Last synced at: 23 days ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

tuva-health/medicare_lds_connector

Maps Medicare LDS claims data to the Tuva Input Layer so you can easily run the Tuva Project.

Size: 688 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 10 - Forks: 5

reata/sqllineage

SQL Lineage Analysis Tool powered by Python

Language: Python - Size: 10.1 MB - Last synced at: 6 months ago - Pushed at: 7 months ago - Stars: 1,322 - Forks: 240

data-drift/data-drift

Metrics Observability & Troubleshooting

Language: HTML - Size: 11.7 MB - Last synced at: 6 months ago - Pushed at: about 1 year ago - Stars: 320 - Forks: 11

tosh2230/stairlight

A data lineage tool that detects table dependencies from rendered SQL statements.

Language: Python - Size: 2.42 MB - Last synced at: 7 months ago - Pushed at: 8 months ago - Stars: 27 - Forks: 1

tosh2230/stairlight-app

A web application that renders table dependency graphs with tosh2230/stairlight, using Graphviz, Streamlit, and Google Cloud Run.

Language: Python - Size: 1.05 MB - Last synced at: 17 days ago - Pushed at: 10 months ago - Stars: 5 - Forks: 0

vmware/versatile-data-kit

One framework to develop, deploy and operate data workflows with Python and SQL.

Language: Python - Size: 109 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 409 - Forks: 54

SuperThinking/Data_Lineage_Visualization-Application

Helps you to visualize data lineage. Highly customizable.

Language: JavaScript - Size: 557 KB - Last synced at: about 2 months ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

Tinkoff/data-detective 📦

Data catalog for everything in your company

Language: Python - Size: 8.99 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 45 - Forks: 13

IBM/multi-data-lineage-capture-py

IBM Multi-Lineage Data System

Language: Python - Size: 246 KB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 6 - Forks: 8

ahussein/ckanext-datalineage

A CKAN extension for providing and visualizing data lineage

Language: JavaScript - Size: 6.08 MB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

aws-samples/document-processing-pipeline-for-regulated-industries

A boilerplate solution for processing image and PDF documents for regulated industries, with lineage and pipeline operations metadata services.

Language: Python - Size: 11.4 MB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 50 - Forks: 12

thestyleofme/data-lineage-parent

Data lineage for Hive, Sqoop, HBase, Spark, and others: events are sent to Kafka, then parsed and processed, with Neo4j used to generate the lineage.

Language: Java - Size: 277 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 61 - Forks: 36

brunocampos01/pyssas 📦

Automated build and deployment to SQL Server Analysis Services (SSAS) with Python.

Language: Python - Size: 315 KB - Last synced at: 3 months ago - Pushed at: over 3 years ago - Stars: 9 - Forks: 2

GuinsooLab/darkseal

A single place to discover, collaborate, and get your data right

Language: TypeScript - Size: 272 MB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 14 - Forks: 6

StatCan/pachyderm 📦

Data Lineage with End-to-End Pipelines on Kubernetes

Size: 3.91 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

miotech/kun-scheduler

A workflow scheduler that understands both your data and metadata.

Language: Java - Size: 63.8 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 25 - Forks: 5

metastore-developers/metastore

Metastore Python SDK. Feature store and data catalog for machine learning.

Language: Python - Size: 302 KB - Last synced at: 8 days ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

michaelosthege/gittrail

Context manager for enforcing links between data pipeline outputs and git history.

Language: Python - Size: 41 KB - Last synced at: about 11 hours ago - Pushed at: about 3 years ago - Stars: 1 - Forks: 0

AbdullahMu/Data-Pipelines-with-Airflow

Schedule, automate, and monitor data pipelines using Apache Airflow. Run data quality checks, track data lineage, and work with data pipelines in production.

Language: Python - Size: 52.7 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 1

bballamudi/multi-data-lineage-capture-py (fork of IBM/multi-data-lineage-capture-py)

IBM Multi-Lineage Data System

Size: 83 KB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

dmartinpro/spuristo

Data Lineage

Language: Java - Size: 130 KB - Last synced at: about 2 years ago - Pushed at: about 7 years ago - Stars: 0 - Forks: 0