An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: data-quality-monitoring

sodadata/soda-core

⚡ Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io

Language: Python - Size: 4.13 MB - Last synced at: about 7 hours ago - Pushed at: about 8 hours ago - Stars: 2,125 - Forks: 237

realdatadriven/etlx

This project is an ETL (Extract, Transform, Load) Framework powered by DuckDB, designed to seamlessly integrate and process data from diverse sources. It leverages Markdown as a configuration medium, where YAML blocks define metadata for each data source, and embedded SQL blocks specify the extraction, transformation, and loading logic.

Language: Go - Size: 2.97 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 8 - Forks: 1

datafold/data-diff 📦

Compare tables within or across databases

Language: Python - Size: 3.98 MB - Last synced at: about 6 hours ago - Pushed at: about 1 year ago - Stars: 2,972 - Forks: 288

C-20-s/Data-Risk-Monitoring

This project helps detect and log mismatches in student academic records (entry, dropout, GPA issues) using Python and SQL.

Language: Python - Size: 0 Bytes - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1 - Forks: 0

databrickslabs/dqx

Databricks framework to validate Data Quality of pySpark DataFrames

Language: Python - Size: 2.75 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 278 - Forks: 41

dqops/dqo

Data Quality and Observability platform for the whole data lifecycle, from profiling new data sources to full automation with Data Observability. Configure data quality checks from the UI or in YAML files, and let DQOps run the checks daily to detect data quality issues.

Language: Java - Size: 91.1 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 151 - Forks: 29

ms32035/inspector

Source-available data quality tool

Language: Python - Size: 1.81 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 1 - Forks: 0

open-metadata/openmetadata-site

Open Standard for Metadata. A Single place to Discover, Collaborate and Get your data right.

Language: TypeScript - Size: 54.6 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 14 - Forks: 11

Arize-ai/client_python

A Python library to send data to Arize AI!

Language: Python - Size: 51.7 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 55 - Forks: 17

Indexical-Metrics-Measure-Advisory/watchmen

Watchmen Platform is a low-code data platform for data pipelines, metadata management, analysis, indicator objective analysis, and quality management

Language: TypeScript - Size: 20.6 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 7 - Forks: 3

re-data/re-data

re_data - fix data issues before your users & CEO would discover them 😊

Language: HTML - Size: 76.5 MB - Last synced at: 1 day ago - Pushed at: about 1 year ago - Stars: 1,560 - Forks: 122

datachecks/dcs-core

Open Source Data Quality Monitoring.

Language: Python - Size: 4.39 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 155 - Forks: 23

sodadata/soda-github-action

⚡ Prevent downstream data quality issues by integrating the Soda Library into your CI/CD pipeline.

Language: Python - Size: 47.9 KB - Last synced at: 7 days ago - Pushed at: 9 months ago - Stars: 14 - Forks: 0

Swiple/swiple

Swiple enables you to easily observe, understand, validate and improve the quality of your data

Language: Python - Size: 186 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 84 - Forks: 11

Bilpapster/stream-DaQ

A highly-configurable, real-time data quality monitoring tool designed for streaming data

Language: Python - Size: 32.7 MB - Last synced at: 22 days ago - Pushed at: 22 days ago - Stars: 8 - Forks: 0

datavane/datavines

Know your data better! Datavines is a next-gen data observability platform that supports metadata management and data quality.

Language: Java - Size: 22.8 MB - Last synced at: 29 days ago - Pushed at: about 1 month ago - Stars: 620 - Forks: 184

DataBridgeTech/dbqctl

DataBridge Quality Control

Language: Go - Size: 122 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

weiser-ai/weiser-ai

Data Quality made simple.

Language: Python - Size: 858 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

ubisoft/mobydq

🐳 Tool to automate data quality checks on data pipelines

Language: Vue - Size: 188 MB - Last synced at: 6 days ago - Pushed at: almost 3 years ago - Stars: 255 - Forks: 62

FelisPimeja/openstreetmap_water

First attempt at validating the quality of OpenStreetMap watercourses (Russia only for now)

Language: SQL - Size: 1.79 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

qalita-io/packs

Qalita Public Packs

Language: Python - Size: 1.58 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

ArseniiGav/DINAMO

Dynamic and INterpretable Anomaly MOnitoring for Large-Scale Particle Physics Experiments

Language: Python - Size: 37.1 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

Richardbnk/GCP_Data_Quality

A class designed to facilitate the creation and management of data quality monitoring processes, ensuring efficient and reliable data validation and maintenance.

Language: Python - Size: 22.5 KB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

meteoswiss-mdr/pyrad

Python Radar Data Processing

Language: Python - Size: 89.9 MB - Last synced at: about 2 months ago - Pushed at: over 4 years ago - Stars: 55 - Forks: 21

hms-dbmi/EHRtemporalVariability

R package for delineating temporal dataset shifts in Electronic Health Records

Language: HTML - Size: 11.7 MB - Last synced at: 4 days ago - Pushed at: about 1 year ago - Stars: 17 - Forks: 8

DP6/penguin-datalayer-collect

A data layer quality monitoring and validation module, this solution is part of the Raft Suite ecosystem.

Language: HCL - Size: 2.07 MB - Last synced at: 2 days ago - Pushed at: over 2 years ago - Stars: 21 - Forks: 4

astutic/Acharya

A Data Centric NER annotation tool for your Named Entity Recognition projects

Size: 11.3 MB - Last synced at: 6 months ago - Pushed at: about 1 year ago - Stars: 45 - Forks: 3

PeerNova-Solutions/cuneiformsf-reports-datahealth

... Cuneiform for Salesforce reporting library focusing on CRM Data Health. 110+ Data Health reports spanning 17 categories. 100% free to Cuneiform for Salesforce customers.

Size: 13.6 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 1 - Forks: 0

Technological-Unemployment/ZenDesk-Ticket-Generator-and-Uploader-with-attachments

Uses SQL to generate charts by customer name, then creates tickets and uploads the charts to Zendesk.

Language: Python - Size: 35.2 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

flaviaouyang/molly

Monitor the quality of time series data in your SQL database

Language: Python - Size: 502 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

ataustin/flyover

Visually compare distributions in data sets

Language: R - Size: 4.95 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 0

qalita-io/data-quality-platform

Data quality made simple

Size: 1000 Bytes - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

yu-iskw/dbt-artifacts-loader 📦

Load dbt artifacts uploaded to GCS to BigQuery in order to track historical dbt results

Language: Python - Size: 1.12 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 17 - Forks: 0

Hyhyhyhyhyhyh/Django-Data-quality-system

Data governance and data quality checking/monitoring platform (Django+jQuery+MySQL)

Language: Python - Size: 19 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 175 - Forks: 73

Indexical-Metrics-Measure-Advisory/watchmen-matryoshka-doll

Watchmen Platform is a low-code data platform for data pipelines, metadata management, analysis, and quality management

Language: Python - Size: 2.16 MB - Last synced at: over 1 year ago - Pushed at: about 3 years ago - Stars: 130 - Forks: 21

datagovs/datagovs

Democratize data analysis and insights for non-SQL users

Size: 14.6 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 1

Arize-ai/client_java

Java client to interact with Arize API

Language: Java - Size: 418 KB - Last synced at: 3 months ago - Pushed at: 9 months ago - Stars: 7 - Forks: 0

baligoyem/dataqtor

🔍Your Data Quality Detector / Gain insight into your data and get it ready for use before you start working with it 💡📊🛠💎

Language: Python - Size: 9.43 MB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 14 - Forks: 6

seedatnabeel/Data-SUITE

Data-SUITE: Data-centric identification of in-distribution incongruous examples (ICML 2022)

Language: Jupyter Notebook - Size: 4.22 MB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 5 - Forks: 4

curie-data-factory/health-data-metrics

Health Data Metrics (HDM), a data quality assessment application.

Language: PHP - Size: 4.71 MB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 9 - Forks: 1

Pawsanie/PySpark_universal_dq_report

The script reads the dataset at the given path and selects the columns passed as arguments for the specified dates, then saves the report to the specified HDFS path.

Language: Python - Size: 25.4 KB - Last synced at: 9 days ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 2

pilillo/gilberto

Language: Shell - Size: 199 KB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 3 - Forks: 0

lisehr/dq-meerkat

Automated Continuous Data Quality Measurement

Language: TypeScript - Size: 677 MB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 8 - Forks: 3

varun-vasudevan/CDRS-India

Dataset curated for evaluating the quality of COVID-19 data (surveillance, vaccination monitoring, bed availability) reporting across India.

Size: 131 KB - Last synced at: over 2 years ago - Pushed at: almost 3 years ago - Stars: 3 - Forks: 1

chetnachaudhari/PySpark_Helpers

A library of helpful pyspark functions

Language: Python - Size: 10.7 KB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

mynttt/dqgui

DQGUI is an IDE written in JavaFX for the IQM4HD DSL (Domain Specific Language)

Language: Java - Size: 6.63 MB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

bballamudi/mobydq Fork of ubisoft/mobydq

🐳 Tool to automate data quality checks on data pipelines

Size: 188 MB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 0

Related Keywords
data-quality-monitoring (47), data-quality (32), data-quality-checks (20), data-engineering (8), dataquality (8), data-observability (8), data-profiling (8), data-science (8), python (7), data-quality-measurement (6), data-quality-assessment (6), data-reliability (5), data-validation (5), data-monitoring (5), dbt (5), data (4), data-visualization (4), monitoring (4), database (4), data-testing (4), spark (4), data-governance (4), data-pipeline (3), machine-learning (3), visualization (3), data-analysis (3), data-quality-monitor (3), data-quality-report (3), snowflake (3), datatesting (3), google-cloud-platform (2), model-monitoring (2), data-ops (2), model-explainability (2), data-quality-testing (2), datascience (2), hacktoberfest (2), pipeline (2), metadata (2), metrics (2), ml-observability (2), ai-monitoring (2), ai-observability (2), ai-roi (2), drift-monitoring (2), ml-monitoring (2), data-unit-tests (2), pipeline-testing (2), python-3 (2), etl (2), validation (2), charts (2), python3 (2), data-quality-framework (2), data-centric (2), dataengineering (2), mysql (2), data-centric-ai (2), postgres (2), postgresql (2), data-labeling (2), sql (2), performance-monitoring (2), mlops (2), model-performance-management (2), dataops (2), pyspark (2), gtm-server-side (1), salesforce (1), soql (1), charts-generator (1), zendesk (1), zendesk-tickets (1), r-package (1), data-quality-analysis (1), bigquery (1), data-management (1), marketing-automation (1), penguin-datalayer (1), raft-suite (1), tealium (1), ai (1), annota (1), annotation-processing (1), annotation-tool (1), data-labeling-tools (1), datasets (1), named-entity-recognition (1), natural-language-processing (1), ner (1), text-annotation (1), text-annotation-tool (1), apex (1), crm (1), data-cloud (1), data-health (1), profiling-data (1), reports (1), gcp (1), hadoop-hdfs (1)