GitHub topics: data-lineage
montara-io/dbt-command-center
Never sift through endless dbt™ logs again. dbt Command Center is a free, open-source, local web application that provides a user-friendly interface to monitor and manage dbt runs.
Language: TypeScript - Size: 3.55 MB - Last synced at: about 16 hours ago - Pushed at: about 17 hours ago - Stars: 28 - Forks: 0

GitDataAI/jzfs
A Git-like Version Control File System for AI & Data Product Management.
Language: Rust - Size: 3.09 MB - Last synced at: about 19 hours ago - Pushed at: about 20 hours ago - Stars: 105 - Forks: 11

open-metadata/OpenMetadata
OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.
Language: TypeScript - Size: 1.74 GB - Last synced at: 1 day ago - Pushed at: 3 days ago - Stars: 6,484 - Forks: 1,202

laminlabs/lamindb
A data framework for biology.
Language: Python - Size: 7.57 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 159 - Forks: 15

finos/waltz
Enterprise Information Service
Language: Java - Size: 56.4 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 195 - Forks: 129

elementary-data/elementary
The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
Language: HTML - Size: 205 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 2,047 - Forks: 182

elementary-data/dbt-data-reliability
dbt package that is part of Elementary, the dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
Language: Python - Size: 7.73 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 430 - Forks: 103

sergiomoraes/sergiomoraesblog
On this site I share personal thoughts about data, data governance, data quality, metadata, and side projects.
Language: Jupyter Notebook - Size: 87.1 MB - Last synced at: 3 days ago - Pushed at: 6 days ago - Stars: 1 - Forks: 0

tuva-health/tuva
Main repo including core data model, data marts, data quality tests, and terminology sets.
Language: Shell - Size: 41.4 MB - Last synced at: 9 days ago - Pushed at: 10 days ago - Stars: 247 - Forks: 79

opendatadiscovery/odd-platform
First open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.
Language: Java - Size: 27.9 MB - Last synced at: 9 days ago - Pushed at: 2 months ago - Stars: 1,306 - Forks: 122

GoogleCloudPlatform/bigquery-data-lineage 📦
Reference implementation for real-time Data Lineage tracking for BigQuery using Audit Logs, ZetaSQL and Dataflow.
Language: Java - Size: 356 KB - Last synced at: 1 day ago - Pushed at: 11 months ago - Stars: 143 - Forks: 41

MarquezProject/marquez
Collect, aggregate, and visualize a data ecosystem's metadata
Language: Java - Size: 51.4 MB - Last synced at: 12 days ago - Pushed at: 20 days ago - Stars: 1,890 - Forks: 340

maropu/spark-sql-flow-plugin
Visualize column-level data lineage in Spark SQL
Language: Scala - Size: 705 MB - Last synced at: 6 days ago - Pushed at: almost 3 years ago - Stars: 91 - Forks: 17

grai-io/grai-core
Language: Python - Size: 121 MB - Last synced at: 7 days ago - Pushed at: 19 days ago - Stars: 303 - Forks: 21

tuva-health/medicare_cclf_connector
This connector is a dbt project that maps Medicare CCLF claims data to the Tuva Input Layer.
Size: 1.02 MB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 13 - Forks: 16

Seigr-lab/Seigr-EcoSystem
Inspired by biomimicry, Seigr is a Symbiotic Environment of Interconnected Generative Records. This decentralized network enables secure data capsules, adaptive encoding, and dynamic traceability across decentralized storage layers, including IPFS. Anchored by the .seigr protocol, it fosters resilient, modular, and sustainable data management.
Language: Python - Size: 3.49 MB - Last synced at: 2 days ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

pi2schema/pi2schema
Describe your Data Protection rules and Personal Identifying Information as part of your schema
Language: Java - Size: 546 KB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 10 - Forks: 2

slidoapp/dbt-superset-lineage
Make dbt docs and Apache Superset talk to one another
Language: Python - Size: 1.85 MB - Last synced at: 10 days ago - Pushed at: 3 months ago - Stars: 142 - Forks: 19

tokern/data-lineage
Generate and Visualize Data Lineage from query history
Language: Python - Size: 2.46 MB - Last synced at: 17 days ago - Pushed at: over 1 year ago - Stars: 322 - Forks: 46

google/grizzly
End-to-end DataOps platform deployed by Terraform.
Language: Python - Size: 113 MB - Last synced at: 1 day ago - Pushed at: about 1 month ago - Stars: 66 - Forks: 10

tuva-health/demo
A starter dbt project and synthetic claims dataset for trying out the Tuva Project.
Size: 3.53 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 18 - Forks: 25

badoo/exasol-data-lineage
Exasol data lineage scripts
Language: Python - Size: 22.5 KB - Last synced at: 15 days ago - Pushed at: over 3 years ago - Stars: 7 - Forks: 3

tuva-health/provider
A dbt project that transforms messy public provider datasets into usable data for the Tuva Project.
Size: 26.4 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 4 - Forks: 4

tomaztk/SQLServer-Data-Lineage
Data Lineage for Microsoft SQL Server, Azure SQL Server and Azure Synapse
Language: TSQL - Size: 86.9 KB - Last synced at: 12 days ago - Pushed at: over 2 years ago - Stars: 18 - Forks: 8

TraceSQL/tracesql-py
Python client for TraceSQL lineage analyzer
Language: Python - Size: 67.4 KB - Last synced at: 6 days ago - Pushed at: 4 months ago - Stars: 4 - Forks: 0

JBris/marquez-test
Testing a Docker deployment of Marquez and OpenLineage
Language: Shell - Size: 19.5 KB - Last synced at: 22 days ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

JBris/openmetadata-test
Testing a Docker deployment of OpenMetadata for S3 data ingestion
Size: 12.7 KB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0

srijan-singh/neo4j-lineage
Holistic approach for understanding Neo4j and Data Lineage
Language: Java - Size: 60.5 KB - Last synced at: 23 days ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

tuva-health/medicare_lds_connector
Maps Medicare LDS claims data to the Tuva Input Layer so you can easily run the Tuva Project.
Size: 688 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 10 - Forks: 5

reata/sqllineage
SQL Lineage Analysis Tool powered by Python
Language: Python - Size: 10.1 MB - Last synced at: 6 months ago - Pushed at: 7 months ago - Stars: 1,322 - Forks: 240

data-drift/data-drift
Metrics Observability & Troubleshooting
Language: HTML - Size: 11.7 MB - Last synced at: 6 months ago - Pushed at: about 1 year ago - Stars: 320 - Forks: 11

tosh2230/stairlight
A data lineage tool that detects table dependencies from rendered SQL statements.
Language: Python - Size: 2.42 MB - Last synced at: 7 months ago - Pushed at: 8 months ago - Stars: 27 - Forks: 1

tosh2230/stairlight-app
A web application rendering table dependency graph with tosh2230/stairlight, using Graphviz, Streamlit and Google Cloud Run.
Language: Python - Size: 1.05 MB - Last synced at: 17 days ago - Pushed at: 10 months ago - Stars: 5 - Forks: 0

vmware/versatile-data-kit
One framework to develop, deploy and operate data workflows with Python and SQL.
Language: Python - Size: 109 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 409 - Forks: 54

SuperThinking/Data_Lineage_Visualization-Application
Helps you to visualize data lineage. Highly customizable.
Language: JavaScript - Size: 557 KB - Last synced at: about 2 months ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

Tinkoff/data-detective 📦
Data catalog for everything in your company
Language: Python - Size: 8.99 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 45 - Forks: 13

IBM/multi-data-lineage-capture-py
IBM Multi-Lineage Data System
Language: Python - Size: 246 KB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 6 - Forks: 8

ahussein/ckanext-datalineage
A CKAN extension for providing and visualizing data lineage
Language: JavaScript - Size: 6.08 MB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

aws-samples/document-processing-pipeline-for-regulated-industries
A boilerplate solution for processing image and PDF documents for regulated industries, with lineage and pipeline operations metadata services.
Language: Python - Size: 11.4 MB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 50 - Forks: 12

thestyleofme/data-lineage-parent
Data lineage for Hive, Sqoop, HBase, Spark, and others: events are sent to Kafka, then parsed and processed, with Neo4j used to generate the lineage.
Language: Java - Size: 277 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 61 - Forks: 36

brunocampos01/pyssas 📦
Automated build and deployment to SQL Server Analysis Services (SSAS) with Python.
Language: Python - Size: 315 KB - Last synced at: 3 months ago - Pushed at: over 3 years ago - Stars: 9 - Forks: 2

GuinsooLab/darkseal
A Single place to Discover, Collaborate, and Get your data right
Language: TypeScript - Size: 272 MB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 14 - Forks: 6

StatCan/pachyderm 📦
Data Lineage with End-to-End Pipelines on Kubernetes
Size: 3.91 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

miotech/kun-scheduler
A workflow scheduler that understands both your data and metadata.
Language: Java - Size: 63.8 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 25 - Forks: 5

metastore-developers/metastore
Metastore Python SDK. Feature store and data catalog for machine learning.
Language: Python - Size: 302 KB - Last synced at: 8 days ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

michaelosthege/gittrail
Context manager for enforcing links between data pipeline outputs and git history.
Language: Python - Size: 41 KB - Last synced at: about 11 hours ago - Pushed at: about 3 years ago - Stars: 1 - Forks: 0

AbdullahMu/Data-Pipelines-with-Airflow
Schedule, automate, and monitor data pipelines using Apache Airflow. Run data quality checks, track data lineage, and work with data pipelines in production.
Language: Python - Size: 52.7 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 1

bballamudi/multi-data-lineage-capture-py (fork of IBM/multi-data-lineage-capture-py)
IBM Multi-Lineage Data System
Size: 83 KB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

dmartinpro/spuristo
Data Lineage
Language: Java - Size: 130 KB - Last synced at: about 2 years ago - Pushed at: about 7 years ago - Stars: 0 - Forks: 0
