Ecosyste.ms: Repos

An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: data-quality-checks

dqops/dqo

Data Quality and Observability platform for the whole data lifecycle, from profiling new data sources to full automation with Data Observability. Configure data quality checks from the UI or in YAML files, and let DQOps run them daily to detect data quality issues.

Language: Java - Size: 71.9 MB - Last synced: about 5 hours ago - Pushed: about 5 hours ago - Stars: 56 - Forks: 12

mfcabrera/hooqu

hooqu is a library built on top of Pandas-like DataFrames for defining "unit tests for data". It is a spiritual port of Apache Deequ to Python.

Language: Python - Size: 208 KB - Last synced: 1 day ago - Pushed: 1 day ago - Stars: 25 - Forks: 1

sodadata/soda-core

⚡ Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io

Language: Python - Size: 2.65 MB - Last synced: 1 day ago - Pushed: 1 day ago - Stars: 1,773 - Forks: 185

AKSW/RDFUnit

An RDF Unit Testing Suite

Language: Java - Size: 8.54 MB - Last synced: about 14 hours ago - Pushed: 8 months ago - Stars: 150 - Forks: 42

canimus/cuallee

Possibly the fastest DataFrame-agnostic quality check library in town.

Language: Python - Size: 1.72 MB - Last synced: 3 days ago - Pushed: 3 days ago - Stars: 110 - Forks: 12

rickyschools/dltflow

A library for authoring DLT pipelines via meta-programming patterns and deploying to Databricks workspaces.

Language: Python - Size: 1.69 MB - Last synced: 5 days ago - Pushed: 5 days ago - Stars: 0 - Forks: 0

sumanthprabhu/DQC-Toolkit

Data quality checks to curate noisy labels in the data

Language: Python - Size: 794 KB - Last synced: 5 days ago - Pushed: 5 days ago - Stars: 0 - Forks: 0

re-data/re-data

re_data - fix data issues before your users & CEO discover them 😊

Language: HTML - Size: 76.5 MB - Last synced: about 15 hours ago - Pushed: 15 days ago - Stars: 1,527 - Forks: 120

Swiple/swiple

Swiple enables you to easily observe, understand, validate and improve the quality of your data

Language: Python - Size: 122 MB - Last synced: 13 days ago - Pushed: 13 days ago - Stars: 78 - Forks: 10

open-metadata/OpenMetadata

Open Standard for Metadata. A Single place to Discover, Collaborate and Get your data right.

Language: TypeScript - Size: 1.3 GB - Last synced: 13 days ago - Pushed: 13 days ago - Stars: 4,168 - Forks: 837

emmaarenas/data-quality-analysis

A collection of Jupyter Notebooks, in both English and Spanish, dedicated to performing data quality analysis using the R programming language.

Language: HTML - Size: 968 KB - Last synced: 15 days ago - Pushed: 16 days ago - Stars: 0 - Forks: 0

polyaxon/traceml

Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon.

Language: Python - Size: 118 MB - Last synced: 29 days ago - Pushed: about 1 month ago - Stars: 492 - Forks: 43

JoanyMarino/RPackages4DQA

Collection of R scripts to test packages for conducting data quality assessments

Language: HTML - Size: 62.8 MB - Last synced: 20 days ago - Pushed: 20 days ago - Stars: 5 - Forks: 2

PovertyAction/high-frequency-checks

A Stata template for running high frequency checks of incoming research data at Innovations for Poverty Action

Language: Stata - Size: 13.2 MB - Last synced: 15 days ago - Pushed: 15 days ago - Stars: 75 - Forks: 52

scienxlab/redflag

Safety net for machine learning pipelines. Plays nice with sklearn and pandas.

Language: Python - Size: 10.7 MB - Last synced: 22 days ago - Pushed: 22 days ago - Stars: 19 - Forks: 6

zqtzt/CODCQC

An open-source Python interface for the quality control of in-situ ocean observations

Language: Python - Size: 1.77 MB - Last synced: 17 days ago - Pushed: over 1 year ago - Stars: 6 - Forks: 1

google/data-quality-monitor

Data Quality Monitor (DQM) - Continuously validate your data with easy, customizable rules.

Language: TypeScript - Size: 1 MB - Last synced: 6 days ago - Pushed: 8 days ago - Stars: 26 - Forks: 8

sodadata/soda-github-action

⚡ Prevent downstream data quality issues by integrating the Soda Library into your CI/CD pipeline.

Language: Python - Size: 39.1 KB - Last synced: 21 days ago - Pushed: 6 months ago - Stars: 11 - Forks: 0

ms32035/inspector

Source-available data quality tool

Language: Python - Size: 1.28 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 1 - Forks: 0

evidentlyai/ml_observability_course

Free, open-source ML observability course for data scientists and ML engineers. Learn how to monitor and debug your ML models in production.

Language: Jupyter Notebook - Size: 25.2 MB - Last synced: 7 days ago - Pushed: 5 months ago - Stars: 56 - Forks: 18

Seddryck/NBi

NBi is a testing framework (an add-on to NUnit) for Business Intelligence and Data Access. Its main goal is to let users create tests declaratively using an XML syntax. With NBi, you don't need to write C# or Java code to specify your tests, nor do you need Visual Studio or Eclipse to compile your test suite. Just create an XML file and let the framework interpret it and run your tests. The framework is designed as an add-on to NUnit but can easily be ported to other testing frameworks.

Language: C# - Size: 15.8 MB - Last synced: 14 days ago - Pushed: 28 days ago - Stars: 106 - Forks: 37

zy969/streaming-data-quality-validation

Real-time streaming data quality validation project using NYC Taxi Rides datasets, leveraging Kafka, Flink, and StreamDQ.

Language: Java - Size: 88.7 MB - Last synced: 18 days ago - Pushed: about 1 month ago - Stars: 0 - Forks: 0

mathewsrc/ETL-Chicago-Cafe-Permits

This ETL (Extract, Transform, Load) project employs several Python libraries, including Airflow, Soda, Polars, YData Profiling, DuckDB, Requests, Loguru, and Google Cloud, to streamline the extraction, transformation, and loading of CSV datasets from the U.S. government's data repository at https://catalog.data.gov.

Language: HTML - Size: 42.3 MB - Last synced: 7 days ago - Pushed: 5 months ago - Stars: 3 - Forks: 0

medizininformatik-initiative/kerndatensatzmodul-metadaten-datenqualitaet

This repository specifies methods and procedures for addressing data quality questions.

Size: 1000 Bytes - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 0 - Forks: 0

DP6/penguin-datalayer

Assisted crawler for validating objects sent to the data layer (Data Layer)

Language: JavaScript - Size: 1.01 MB - Last synced: about 2 months ago - Pushed: about 1 year ago - Stars: 7 - Forks: 5

qalita-io/packs

Qalita Public Packs

Language: Python - Size: 1.56 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 1 - Forks: 0

ubisoft/mobydq

🐳 Tool to automate data quality checks on data pipelines

Language: Vue - Size: 188 MB - Last synced: 2 months ago - Pushed: over 1 year ago - Stars: 233 - Forks: 56

anilkulkarni87/databricks_notebooks

A collection of Databricks notebooks for testing and learning

Language: HTML - Size: 4.22 MB - Last synced: 7 days ago - Pushed: about 2 years ago - Stars: 3 - Forks: 1

socialpoint-labs/sqlbucket 📦

Lightweight library to write, orchestrate and test your SQL ETL. Writing ETL with data integrity in mind.

Language: Python - Size: 463 KB - Last synced: 12 days ago - Pushed: 4 months ago - Stars: 71 - Forks: 7

BertrandKafando/data-guardian-api

Backend for dataguadian Pro: a database profiling and correction platform

Language: Python - Size: 7.55 MB - Last synced: 17 days ago - Pushed: about 2 months ago - Stars: 0 - Forks: 0

qalita-io/data-quality-platform

Data quality made simple

Size: 1000 Bytes - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 1 - Forks: 0

EvgenyPetrovsky/deeque

Data Quality control framework for dataframes in R

Language: R - Size: 182 KB - Last synced: 5 months ago - Pushed: over 4 years ago - Stars: 1 - Forks: 0

jorge-martinez-gil/dataq

Framework to Automatically Determine the Quality of Open Data Catalogs

Language: Python - Size: 105 KB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 1 - Forks: 0

Hyhyhyhyhyhyh/Django-Data-quality-system

Data governance and data quality inspection/monitoring platform (Django + jQuery + MySQL)

Language: Python - Size: 19 MB - Last synced: 6 months ago - Pushed: over 1 year ago - Stars: 175 - Forks: 73

PEDSnet/Data-Quality-Analysis

The PEDSnet Data Quality Assessment Toolkit (OMOP CDM)

Language: R - Size: 1.11 MB - Last synced: 23 days ago - Pushed: about 3 years ago - Stars: 24 - Forks: 6

DQCollaborative/MIAD

Minimum Information About Dataset

Size: 19.5 KB - Last synced: 7 months ago - Pushed: almost 7 years ago - Stars: 1 - Forks: 0

niyotham/Data-engineer-end-to-end-project-airflow-dbt-soda-bigquery

An end-to-end data engineering project that loads data into BigQuery with Airflow, performs transformations using dbt, and runs data quality checks with Soda

Language: Python - Size: 7.03 MB - Last synced: 7 months ago - Pushed: 7 months ago - Stars: 0 - Forks: 0

Edouard-Legoupil/HighFrequencyChecks Fork of PYannick/HighFrequencyChecks

Perform high-frequency checks (HFC) on data collected through KoBo

Language: R - Size: 13.3 MB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 1 - Forks: 3

chiru30/KPMG-Data-Analytics-internship

Language: Jupyter Notebook - Size: 1.33 MB - Last synced: 8 months ago - Pushed: almost 3 years ago - Stars: 0 - Forks: 0

Ezzaldin97/dprofiler

Profile tabular datasets, manage automatic validation of new datasets, and automatically handle quality issues.

Language: Python - Size: 459 KB - Last synced: about 1 month ago - Pushed: 8 months ago - Stars: 0 - Forks: 0

manesioz/bq_dq_plugin

Airflow plug-in that allows you to automate robust Data Quality checks for BigQuery

Language: Python - Size: 13.7 KB - Last synced: 9 months ago - Pushed: over 4 years ago - Stars: 1 - Forks: 0

Garett601/data-quality-reports

A function that automatically generates a Data Quality Report for your data

Language: Python - Size: 57.6 KB - Last synced: 10 months ago - Pushed: 10 months ago - Stars: 3 - Forks: 1

LouisdeBruijn/waterfall-logging

A Python package to log (distinct) column counts in a DataFrame, export them as a Markdown table, and plot a Waterfall statistics figure.

Language: Python - Size: 577 KB - Last synced: 26 days ago - Pushed: about 1 year ago - Stars: 2 - Forks: 0

prneidhardt/Apache-Data-Pipelines

Sparkify project

Language: Jupyter Notebook - Size: 238 KB - Last synced: 10 months ago - Pushed: 10 months ago - Stars: 0 - Forks: 0

Swiple/swiple-action

Automatically validate datasets, poll task status, and display validation results in a GitHub pull request using Swiple.

Language: Python - Size: 28.3 KB - Last synced: 20 days ago - Pushed: about 1 year ago - Stars: 1 - Forks: 0

christianbors/OpenRefineQualityMetrics

MetricDoc is an interactive visual exploration environment for assessing data quality

Language: JavaScript - Size: 5.83 MB - Last synced: 27 days ago - Pushed: about 4 years ago - Stars: 8 - Forks: 1

baligoyem/dataqtor

🔍Your Data Quality Detector / Gain insight into your data and get it ready for use before you start working with it 💡📊🛠💎

Language: Python - Size: 9.43 MB - Last synced: 12 months ago - Pushed: over 1 year ago - Stars: 14 - Forks: 6

tmilitino/Unicorninha

CESAR SCHOOL final course project focused on evaluating data quality tools.

Language: Python - Size: 1.03 MB - Last synced: about 1 year ago - Pushed: about 1 year ago - Stars: 2 - Forks: 0

casualcomputer/sql.mechanic

Functions that generate SQL queries that summarize high-dimensional tables stored in various databases (e.g. Microsoft SQL Servers, Netezza, DB2, Postgres, Oracle, MySQL, etc.).

Language: R - Size: 90.8 KB - Last synced: 3 months ago - Pushed: about 1 year ago - Stars: 2 - Forks: 0

Pawsanie/PySpark_universal_dq_report

The script reads the dataset at the given path and selects the columns passed as arguments for the specified dates. It then saves the report to the specified HDFS path.

Language: Python - Size: 25.4 KB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 0 - Forks: 2

grandelli/clouddq-samples

Repository containing sample data quality tasks for Google CloudDQ and Dataplex DQ

Size: 43.9 KB - Last synced: about 1 year ago - Pushed: almost 2 years ago - Stars: 2 - Forks: 0

JuliaTsymbal/Data_Management_Project

Language: Jupyter Notebook - Size: 69.3 KB - Last synced: 11 months ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0

tknishh/dqa-dbt-databricks

Data quality assurance using a combination of dbt and Databricks.

Size: 115 KB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 1 - Forks: 0

bballamudi/goodtables-py Fork of frictionlessdata/framework

Validate tabular data in Python

Size: 1.03 MB - Last synced: 11 months ago - Pushed: about 4 years ago - Stars: 2 - Forks: 0

sleepepi/slice

A clinical research interface geared at collecting robust and consistent data by providing a strong framework for designing data dictionaries and collection forms.

Language: Ruby - Size: 24.2 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 11 - Forks: 6

bballamudi/deequ Fork of awslabs/deequ

Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.

Size: 68.9 MB - Last synced: 11 months ago - Pushed: over 4 years ago - Stars: 0 - Forks: 0

jpilli/SQLDataValidationFramework

With the SQL Data Validation Framework, you can build a process that validates data against complex validation rules.

Language: TSQL - Size: 49.8 KB - Last synced: about 1 year ago - Pushed: over 2 years ago - Stars: 1 - Forks: 0

bballamudi/DataGristle Fork of kenfar/DataGristle

Tough and flexible tools for data analysis, transformation, validation and movement.

Size: 14.9 MB - Last synced: 11 months ago - Pushed: over 4 years ago - Stars: 0 - Forks: 0

DP6/penguin-datalayer-core

Validation core engine for the data layer of the Raft Suite ecosystem.

Language: JavaScript - Size: 391 KB - Last synced: 11 days ago - Pushed: about 1 year ago - Stars: 6 - Forks: 3

AbdullahMu/Data-Pipelines-with-Airflow

Schedule, automate, and monitor data pipelines using Apache Airflow. Run data quality checks, track data lineage, and work with data pipelines in production.

Language: Python - Size: 52.7 KB - Last synced: about 1 year ago - Pushed: over 3 years ago - Stars: 1 - Forks: 1

chetnachaudhari/PySpark_Helpers

A library of helpful PySpark functions

Language: Python - Size: 10.7 KB - Last synced: about 1 year ago - Pushed: over 2 years ago - Stars: 0 - Forks: 0

bballamudi/pydqc Fork of SauceCat/pydqc

Python automatic data quality check toolkit

Size: 9.04 MB - Last synced: 11 months ago - Pushed: over 4 years ago - Stars: 1 - Forks: 0

bballamudi/mobydq Fork of ubisoft/mobydq

🐳 Tool to automate data quality checks on data pipelines

Size: 188 MB - Last synced: 11 months ago - Pushed: almost 4 years ago - Stars: 0 - Forks: 0

garethcmurphy/scilearn

data quality

Language: Python - Size: 303 KB - Last synced: about 1 year ago - Pushed: over 4 years ago - Stars: 0 - Forks: 0

Related Keywords
data-quality-checks (64), data-quality (36), data-quality-monitoring (14), python (13), data-science (10), data-profiling (8), data-observability (7), data-quality-assessment (6), data-validation (6), data-quality-report (5), data-analysis (5), data-engineering (5), pyspark (5), pandas (5), dataquality (5), sql (4), data (4), airflow (4), dbt (4), data-quality-measurement (4), data-reliability (4), bigquery (4), data-testing (3), snowflake (3), data-monitoring (3), machine-learning (3), validation (3), data-quality-analysis (3), soda (3), metadata-management (2), etl (2), data-lineage (2), docker (2), docker-compose (2), data-catalog (2), swiple (2), research-tool (2), data-quality-framework (2), research-data-management (2), database (2), data-quality-monitor (2), sql-server (2), spark (2), r (2), statistics (2), data-visualization (2), dataframes (2), google-cloud-platform (2), mlops (2), gcp (2), apache-airflow (2), hacktoberfest (2), pipeline-testing (2), datatesting (2), python3 (2), data-pipeline (2), data-warehouse (2), gtm (2), data-unit-tests (2), data-quality-testing (2), data-governance (2), dp6 (2), data-contracts (2), tabular-data (2), unit-testing (2), dbt-packages (2), datalayer (2), json-schema (2), airflow-docker (1), healthcare (1), pedsnet (1), big (1), customer-segmentation (1), data-dashboards (1), automated-testing (1), airflow-plugin (1), raft-suite (1), data-integrity (1), data-engineering-workflows (1), etl-framework (1), soda-sql (1), soda-spark (1), great-expectations (1), databricks-notebooks (1), data-driven (1), data-management (1), mdm (1), big-data (1), qalita (1), analytics (1), data-catalog-backend (1), data-catalog-management (1), data-catalogs (1), data-catalogue (1), metadata-information (1), omop (1), python-script (1), clouddq (1), dataplex (1), scraping-websites (1)