Ecosyste.ms: Repos

An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: data-quality

sergiomoraes/sergiomoraesblog

On this site I share personal thoughts about data, data governance, data quality, metadata, and side projects.

Language: Jupyter Notebook - Size: 87.1 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 1 - Forks: 0

rstudio/pointblank

Data quality assessment and metadata reporting for data frames and database tables

Language: R - Size: 101 MB - Last synced: 8 days ago - Pushed: about 2 months ago - Stars: 826 - Forks: 51

ydataai/ydata-profiling

1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.

Language: Python - Size: 594 MB - Last synced: about 8 hours ago - Pushed: 3 days ago - Stars: 12,082 - Forks: 1,630

nationalparkservice/QCkit

QCkit provides useful functions for data quality control and manipulation including updating data to DarwinCore standards, unit conversions, and data flagging.

Language: R - Size: 1.16 MB - Last synced: about 12 hours ago - Pushed: about 13 hours ago - Stars: 5 - Forks: 6

DataKitchen/data-observability-installer

Installer for DataKitchen's Open Source Data Observability Products. Data breaks. Servers break. Your toolchain breaks. Ensure your team is the first to know and the first to solve with visibility across and down your data estate. Save time with simple, fast data quality test generation and execution. Trust your data, tools, and systems end to end.

Language: Python - Size: 206 KB - Last synced: about 4 hours ago - Pushed: about 5 hours ago - Stars: 12 - Forks: 0

dqops/dqo

Data Quality and Observability platform for the whole data lifecycle, from profiling new data sources to full automation with Data Observability. Configure data quality checks from the UI or in YAML files, let DQOps run the data quality checks daily to detect data quality issues.

Language: Java - Size: 71.8 MB - Last synced: about 10 hours ago - Pushed: about 14 hours ago - Stars: 55 - Forks: 12

kestra-io/kestra

Infinitely scalable, event-driven, language-agnostic orchestration and scheduling platform to manage millions of workflows declaratively in code.

Language: Java - Size: 34.4 MB - Last synced: 3 days ago - Pushed: 3 days ago - Stars: 6,499 - Forks: 361

JAdelhelm/Automated-Anomaly-Detection-Preprocessing-Pipeline

I created this automated anomaly detection preprocessing pipeline during my master's thesis. It can be used to automatically preprocess tabular data for anomaly detection methods.

Language: Python - Size: 19.8 MB - Last synced: about 14 hours ago - Pushed: about 22 hours ago - Stars: 0 - Forks: 0

GIScience/ohsome-dashboard

Web Client for easy access to OSM History and Quality Analyses

Language: JavaScript - Size: 3.79 MB - Last synced: about 23 hours ago - Pushed: 1 day ago - Stars: 4 - Forks: 1

whylabs/whylogs

An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collection, ensuring safety & robustness. 📈

Language: Jupyter Notebook - Size: 166 MB - Last synced: 1 day ago - Pushed: 1 day ago - Stars: 2,556 - Forks: 116

sodadata/soda-core

⚡ Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io

Language: Python - Size: 2.64 MB - Last synced: 1 day ago - Pushed: 2 days ago - Stars: 1,769 - Forks: 184

cleanlab/cleanlab

The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

Language: Python - Size: 11.1 MB - Last synced: 1 day ago - Pushed: 1 day ago - Stars: 8,710 - Forks: 670

opendatadiscovery/odd-great-expectations

Integration for collecting metadata from Great Expectations

Language: Python - Size: 689 KB - Last synced: 2 days ago - Pushed: 2 days ago - Stars: 2 - Forks: 1

re-data/dbt-re-data

re_data - fix data issues before your users & CEO would discover them 😊

Language: Python - Size: 4.12 MB - Last synced: 2 days ago - Pushed: 3 days ago - Stars: 95 - Forks: 40

opendatadiscovery/awesome-data-catalogs

📙 Awesome Data Catalogs and Observability Platforms.

Size: 73.2 KB - Last synced: 1 day ago - Pushed: 3 days ago - Stars: 591 - Forks: 46

FIIT-IAU/IAU-course

Intelligent Data Analysis (IAU_B) @ FIIT STU in Bratislava

Language: Jupyter Notebook - Size: 59.4 MB - Last synced: 2 days ago - Pushed: 2 days ago - Stars: 12 - Forks: 2

cleanlab/cleanlab-studio

Client interface for all things Cleanlab Studio

Language: Python - Size: 2.88 MB - Last synced: 1 day ago - Pushed: 2 days ago - Stars: 21 - Forks: 4

ucd-dnp/leila

Library for data quality assessment and for interacting with the datos.gov.co portal

Language: Jupyter Notebook - Size: 29.7 MB - Last synced: 2 days ago - Pushed: 2 days ago - Stars: 59 - Forks: 21

mfcabrera/hooqu

hooqu is a library built on top of Pandas-like DataFrames for defining "unit tests for data". It is a spiritual port of Apache Deequ to Python.

Language: Python - Size: 208 KB - Last synced: 3 days ago - Pushed: 3 days ago - Stars: 25 - Forks: 1

awesome-mlops/awesome-ml-monitoring

A curated list of awesome open source tools and commercial products for monitoring data quality, monitoring model performance, and profiling data 🚀

Size: 4.88 KB - Last synced: 3 days ago - Pushed: 5 months ago - Stars: 48 - Forks: 5

treeverse/lakeFS

lakeFS - Data version control for your data lake | Git for data

Language: Go - Size: 136 MB - Last synced: 25 days ago - Pushed: 25 days ago - Stars: 4,054 - Forks: 328

Fraunhofer-IESE/badgers

Badgers: Bad Data Generators

Language: Python - Size: 9.9 MB - Last synced: 4 days ago - Pushed: 4 days ago - Stars: 8 - Forks: 2

dcnieho/glassesValidator

Tool for automatic determination of data quality (accuracy and precision) of wearable eye tracker recordings

Language: Python - Size: 22.4 MB - Last synced: 4 days ago - Pushed: 4 days ago - Stars: 7 - Forks: 3

hadarsharon/compars

DataFrame comparison done right, powered by Rust with polars (AKA the bear-agnostic 🐻 🐼 🐨 🐻‍❄️ DataFrame comparison library)

Language: Python - Size: 36.1 KB - Last synced: 4 days ago - Pushed: 4 days ago - Stars: 0 - Forks: 0

i3mainz/3dcap-md-gen

Scripts for exporting scanning metadata as described in the publication "Metadata Schema and Ontology for Archaeological Object Documentation including 3D Imaging (AOD-3DI)"

Language: Python - Size: 181 MB - Last synced: 5 days ago - Pushed: 5 days ago - Stars: 2 - Forks: 0

Data-Centric-AI-Community/awesome-data-centric-ai

Open-Source Software, Tutorials, and Research on Data-Centric AI 🤖

Language: Jupyter Notebook - Size: 6.73 MB - Last synced: 2 days ago - Pushed: 5 months ago - Stars: 303 - Forks: 44

canimus/cuallee

Possibly the fastest DataFrame-agnostic quality check library in town.

Language: Python - Size: 1.71 MB - Last synced: 3 days ago - Pushed: 4 days ago - Stars: 109 - Forks: 12

adidas/lakehouse-engine-docs

The goal of this project is to provide documentation for the Lakehouse Engine framework.

Language: HTML - Size: 4 MB - Last synced: 5 days ago - Pushed: about 1 month ago - Stars: 6 - Forks: 3

sourceduty/Information_Data_Quality

📄 Assess information and data quality in various formats.

Size: 10.7 KB - Last synced: 5 days ago - Pushed: 6 days ago - Stars: 0 - Forks: 0

Swiple/swiple

Swiple enables you to easily observe, understand, validate and improve the quality of your data

Language: Python - Size: 122 MB - Last synced: 8 days ago - Pushed: 8 days ago - Stars: 78 - Forks: 10

isislab-unisa/KGHeartBeat-historical-analysis

History of quality analysis performed by KGHeartBeat

Size: 4.99 GB - Last synced: 6 days ago - Pushed: 6 days ago - Stars: 0 - Forks: 0

bitol-io/open-data-contract-standard (fork of jgperrin/data-contract-template)

Home of the Open Data Contract Standard (ODCS).

Size: 2.27 MB - Last synced: 4 days ago - Pushed: 4 days ago - Stars: 207 - Forks: 28

Data-Centric-AI-Community/awesome-python-for-data-science

A curated list of awesome resources such as books, tutorials, courses, open-source libraries, exercises, and other materials that support Pythonistas in the making, and Pythonistas migrating into Data Science! 📊

Language: Jupyter Notebook - Size: 53.5 MB - Last synced: 1 day ago - Pushed: 5 months ago - Stars: 68 - Forks: 14

hms-dbmi/EHRtemporalVariability

R package for delineating temporal dataset shifts in Electronic Health Records

Language: HTML - Size: 11.7 MB - Last synced: 7 days ago - Pushed: 7 days ago - Stars: 14 - Forks: 8

open-metadata/OpenMetadata

Open Standard for Metadata. A Single place to Discover, Collaborate and Get your data right.

Language: TypeScript - Size: 1.3 GB - Last synced: 8 days ago - Pushed: 8 days ago - Stars: 4,168 - Forks: 837

eugeneyan/applied-ml

📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.

Size: 388 KB - Last synced: 7 days ago - Pushed: 8 days ago - Stars: 25,991 - Forks: 3,538

datafold/data-diff

Compare tables within or across databases

Language: Python - Size: 3.97 MB - Last synced: 8 days ago - Pushed: 13 days ago - Stars: 2,846 - Forks: 206

DataKitchen/dataops-testgen

DataOps TestGen is part of DataKitchen's Open Source Data Observability. DataOps TestGen delivers simple, fast data quality test generation and execution through data profiling, new dataset screening and hygiene review, algorithmic generation of data quality validation tests, ongoing testing of new data refreshes, and continuous data anomaly monitoring.

Language: Python - Size: 4.1 MB - Last synced: 9 days ago - Pushed: 10 days ago - Stars: 9 - Forks: 0

Ashbyt/Python

Ashley Bythell - Python

Language: Jupyter Notebook - Size: 5.68 MB - Last synced: 10 days ago - Pushed: about 1 month ago - Stars: 1 - Forks: 0

astronomer/airflow-provider-great-expectations

Great Expectations Airflow operator

Language: Python - Size: 964 KB - Last synced: 8 days ago - Pushed: 10 days ago - Stars: 151 - Forks: 53

re-data/re-data

re_data - fix data issues before your users & CEO would discover them 😊

Language: HTML - Size: 76 MB - Last synced: 10 days ago - Pushed: 10 days ago - Stars: 1,523 - Forks: 119

GClunies/reflekt

Define, govern, and model event data for warehouse-first product analytics.

Language: Python - Size: 5.75 MB - Last synced: 4 days ago - Pushed: 2 months ago - Stars: 75 - Forks: 3

polyaxon/traceml

Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon.

Language: Python - Size: 118 MB - Last synced: 24 days ago - Pushed: 29 days ago - Stars: 492 - Forks: 43

Indexical-Metrics-Measure-Advisory/watchmen

Watchmen Platform is a low-code data platform for data pipelines, metadata management, analysis, indicator objective analysis, and quality management

Language: TypeScript - Size: 21.2 MB - Last synced: 12 days ago - Pushed: 12 days ago - Stars: 7 - Forks: 3

daochenzha/data-centric-AI

A curated, but incomplete, list of data-centric AI resources.

Size: 1.97 MB - Last synced: 13 days ago - Pushed: 13 days ago - Stars: 982 - Forks: 67

byteplant/phone-validator-net

NodeJS wrapper for the phone-validator.net API

Language: TypeScript - Size: 347 KB - Last synced: 14 days ago - Pushed: over 1 year ago - Stars: 2 - Forks: 1

data-drift/data-drift

Metrics Observability & Troubleshooting

Language: HTML - Size: 11.7 MB - Last synced: 11 days ago - Pushed: 2 months ago - Stars: 299 - Forks: 11

JoanyMarino/RPackages4DQA

Collection of R scripts to test packages in conducting data quality assessments

Language: HTML - Size: 62.8 MB - Last synced: 15 days ago - Pushed: 15 days ago - Stars: 5 - Forks: 2

alemela/TellMeQuality

TellMeQuality is a tool for measuring Data Quality according to ISO/IEC 25024.

Language: HTML - Size: 726 KB - Last synced: 16 days ago - Pushed: about 6 years ago - Stars: 2 - Forks: 0

matgonz/data_quality_analysis

[📚] Analysis developed for the Data Governance class during my MBA in Big Data and Data Science at FIAP. The main objective of this analysis was to identify and describe data quality problems.

Language: Jupyter Notebook - Size: 2.13 MB - Last synced: 16 days ago - Pushed: 16 days ago - Stars: 0 - Forks: 0

JoeRegnier/horkos

Data quality analysis and scoring system.

Language: Python - Size: 3.39 MB - Last synced: 17 days ago - Pushed: 10 months ago - Stars: 2 - Forks: 2

SJTU-Quant/awesome-ml-data-quality-papers

Papers about training data quality management for ML models.

Size: 854 KB - Last synced: 30 days ago - Pushed: 30 days ago - Stars: 7 - Forks: 0

scienxlab/redflag

Safety net for machine learning pipelines. Plays nice with sklearn and pandas.

Language: Python - Size: 10.7 MB - Last synced: 17 days ago - Pushed: 18 days ago - Stars: 19 - Forks: 6

slitayem/dbt-practice

Resources and scripts to get started with dbt

Language: Shell - Size: 74.2 KB - Last synced: 17 days ago - Pushed: 18 days ago - Stars: 0 - Forks: 0

feathr-ai/feathr

Feathr – A scalable, unified data and AI engineering platform for enterprise

Language: Scala - Size: 29.4 MB - Last synced: 17 days ago - Pushed: about 1 month ago - Stars: 1,928 - Forks: 256

WeBankFinTech/Qualitis

Qualitis is a one-stop data quality management platform that supports quality verification, notification, and management for various data sources. It is used to solve various data quality problems caused by data processing. https://github.com/WeBankFinTech/Qualitis

Language: Java - Size: 47.6 MB - Last synced: 16 days ago - Pushed: about 2 months ago - Stars: 658 - Forks: 292

EFS-OpenSource/Thetis

Service to examine data processing pipelines (e.g., machine learning or deep learning pipelines) for uncertainty consistency (calibration), fairness, and other safety-relevant aspects.

Language: Python - Size: 1.09 MB - Last synced: 18 days ago - Pushed: about 2 months ago - Stars: 1 - Forks: 1

isislab-unisa/KGHeartbeat

KGHeartBeat is a community-shared open-source knowledge graph quality assessment tool to perform quality analysis on a wide range of freely available knowledge graphs registered on the LOD cloud and DataHub. Web-App: http://www.isislab.it:12280/kgheartbeat/

Language: Python - Size: 170 MB - Last synced: 11 days ago - Pushed: 11 days ago - Stars: 1 - Forks: 0

great-expectations/great_expectations_action

A GitHub Action that makes it easy to use Great Expectations to validate your data pipelines in your CI workflows.

Language: Jupyter Notebook - Size: 25.5 MB - Last synced: 21 days ago - Pushed: over 1 year ago - Stars: 77 - Forks: 11

InfuseAI/piperider

Code review for data in dbt

Language: Python - Size: 32.6 MB - Last synced: 18 days ago - Pushed: about 2 months ago - Stars: 466 - Forks: 21

feast-dev/feast

Feature Store for Machine Learning

Language: Python - Size: 80.7 MB - Last synced: 24 days ago - Pushed: 24 days ago - Stars: 5,244 - Forks: 930

MaastrichtU-IDS/fairsharing-metrics

📊 Fairsharing metrics implementation

Language: Jupyter Notebook - Size: 75.2 KB - Last synced: 22 days ago - Pushed: over 3 years ago - Stars: 2 - Forks: 3

sodadata/soda-github-action

⚡ Prevent downstream data quality issues by integrating the Soda Library into your CI/CD pipeline.

Language: Python - Size: 39.1 KB - Last synced: 16 days ago - Pushed: 6 months ago - Stars: 11 - Forks: 0

aai-institute/pyDVL

pyDVL is a library of stable implementations of algorithms for data valuation and influence function computation

Language: Python - Size: 303 MB - Last synced: 24 days ago - Pushed: 24 days ago - Stars: 65 - Forks: 9

encord-team/encord-active

The toolkit to test, validate, and evaluate your models and surface, curate, and prioritize the most valuable data for labeling.

Language: Python - Size: 264 MB - Last synced: 20 days ago - Pushed: 21 days ago - Stars: 420 - Forks: 23

openfoodfacts/nutripatrol

A moderation tool for Open Food Facts

Language: Python - Size: 133 KB - Last synced: 26 days ago - Pushed: about 1 month ago - Stars: 2 - Forks: 1

sodadata/soda-spark

Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes

Language: Python - Size: 118 KB - Last synced: 7 days ago - Pushed: almost 2 years ago - Stars: 61 - Forks: 7

franperic/image_embeddings

Embeddings for Image Deduplication

Language: Python - Size: 1.46 MB - Last synced: 24 days ago - Pushed: 11 months ago - Stars: 0 - Forks: 0

opendatadiscovery/odd-platform

First open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.

Language: Java - Size: 28.1 MB - Last synced: 28 days ago - Pushed: 28 days ago - Stars: 1,104 - Forks: 91

great-expectations/great_expectations

Always know what to expect from your data.

Language: Python - Size: 189 MB - Last synced: 26 days ago - Pushed: 28 days ago - Stars: 9,420 - Forks: 1,464

ms32035/inspector

Source-available data quality tool

Language: Python - Size: 1.28 MB - Last synced: 29 days ago - Pushed: 29 days ago - Stars: 1 - Forks: 0

featureform/featureform

The Virtual Feature Store. Turn your existing data infrastructure into a feature store.

Language: Jupyter Notebook - Size: 215 MB - Last synced: 27 days ago - Pushed: 28 days ago - Stars: 1,672 - Forks: 88

voxel51/fiftyone

The open-source tool for building high-quality datasets and computer vision models

Language: Python - Size: 1.29 GB - Last synced: 27 days ago - Pushed: 27 days ago - Stars: 6,627 - Forks: 487

GokuMohandas/Made-With-ML

Learn how to design, develop, deploy and iterate on production-grade ML applications.

Language: Jupyter Notebook - Size: 3.82 MB - Last synced: 26 days ago - Pushed: 5 months ago - Stars: 35,580 - Forks: 5,720

jmakeig/data-profile

Sandbox to test out ideas for profiling document data

Language: JavaScript - Size: 122 KB - Last synced: 26 days ago - Pushed: about 1 year ago - Stars: 0 - Forks: 0

bairdj/DataDictionaryGenerator

Create data dictionary from EF Core context

Language: C# - Size: 16.6 KB - Last synced: 13 days ago - Pushed: over 2 years ago - Stars: 2 - Forks: 0

openfoodfacts/contributor-quality-issues

Report data quality issues due to contributing apps/users

Size: 1.95 KB - Last synced: 26 days ago - Pushed: 9 months ago - Stars: 1 - Forks: 0

evidentlyai/ml_observability_course

Free Open-source ML observability course for data scientists and ML engineers. Learn how to monitor and debug your ML models in production.

Language: Jupyter Notebook - Size: 25.2 MB - Last synced: 3 days ago - Pushed: 5 months ago - Stars: 56 - Forks: 18

Seddryck/NBi

NBi is a testing framework (add-on to NUnit) for Business Intelligence and Data Access. The main goal of this framework is to let users create tests with a declarative approach based on an XML syntax. With NBi, you don't need to write C# or Java code to specify your tests, nor do you need Visual Studio or Eclipse to compile your test suite. Just create an XML file and let the framework interpret it and run your tests. The framework is designed as an add-on to NUnit, but it can be ported easily to other testing frameworks.

Language: C# - Size: 15.8 MB - Last synced: 9 days ago - Pushed: 23 days ago - Stars: 106 - Forks: 37

OHDSI/DataQualityDashboard

A tool to help improve data quality standards in observational data science.

Language: JavaScript - Size: 13.4 MB - Last synced: 26 days ago - Pushed: about 1 month ago - Stars: 118 - Forks: 88

adidas/lakehouse-engine

The Lakehouse Engine is a configuration-driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows, and utilities for Data Products.

Language: Python - Size: 3.37 MB - Last synced: 28 days ago - Pushed: 29 days ago - Stars: 180 - Forks: 33

cleanlab/cleanvision

Automatically find issues in image datasets and practice data-centric computer vision.

Language: Python - Size: 2.11 MB - Last synced: 29 days ago - Pushed: 2 months ago - Stars: 917 - Forks: 70

alibaba/feathub

FeatHub - A stream-batch unified feature store for real-time machine learning

Language: Python - Size: 4.56 MB - Last synced: 23 days ago - Pushed: 7 months ago - Stars: 293 - Forks: 46

CDCgov/cdh-lava-react

CDC Data Hub Lifecycle, Analysis & Visualization Accelerator (LAVA) REACT Components based on machine readable requirements.

Language: CSS - Size: 14.2 MB - Last synced: 7 days ago - Pushed: 8 days ago - Stars: 9 - Forks: 2

lukasvermeer/srm

This Chrome Extension automatically performs SRM checks and flags potential data quality issues on supported experimentation platforms.

Language: JavaScript - Size: 1.64 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 18 - Forks: 10

aivanzhang/panda_patrol

Language: Python - Size: 33.2 MB - Last synced: 1 day ago - Pushed: 5 months ago - Stars: 21 - Forks: 0

Impetus/jumbune

Jumbune, an open source BigData APM & Data Quality Management Platform for Data Clouds. Enterprise feature offering is available at http://jumbune.com. More details of open source offering are at,

Language: Java - Size: 31.7 MB - Last synced: about 1 month ago - Pushed: over 1 year ago - Stars: 71 - Forks: 32

data-catering/data-caterer Fork of pflooky/data-caterer

Data generation and validation tool for any data source

Language: Scala - Size: 1.77 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 12 - Forks: 2

AKSW/RDFUnit

An RDF Unit Testing Suite

Language: Java - Size: 8.54 MB - Last synced: 2 days ago - Pushed: 8 months ago - Stars: 148 - Forks: 42

annamatias/dataengineer

Trending code, platforms, tools, and processes.

Language: Python - Size: 295 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 1 - Forks: 0

BdR76/CSVLint

CSV Lint plug-in for Notepad++ for syntax highlighting, csv validation, automatic column and datatype detecting, fixed width datasets, change datetime format, decimal separator, sort data, count unique values, convert to xml, json, sql etc. A plugin for data cleaning and working with messy data files.

Language: C# - Size: 11.6 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 131 - Forks: 7

SteveAnik/Kestra

Infinitely scalable, event-driven, language-agnostic orchestration and scheduling platform to manage millions of workflows declaratively in code.

Size: 7.81 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 0 - Forks: 0

DP6/templates-centro-de-inovacoes

Architecture, documentation, testing, and deployment templates for the innovation center's initiatives

Size: 9.42 MB - Last synced: about 2 months ago - Pushed: about 2 years ago - Stars: 1 - Forks: 0

DP6/penguin-datalayer-collect

A data layer quality monitoring and validation module. This solution is part of the Raft Suite ecosystem.

Language: HCL - Size: 2.07 MB - Last synced: about 2 months ago - Pushed: over 1 year ago - Stars: 19 - Forks: 4

DP6/raft-suite-hub

The Hub is the solution responsible for centralizing data consolidation in BigQuery, the tool chosen to serve as the data warehouse for raft-suite.

Language: JavaScript - Size: 1.62 MB - Last synced: about 2 months ago - Pushed: about 1 year ago - Stars: 7 - Forks: 0

DP6/penguin-document-formatter

A document reader that extracts planned Google Analytics events for use in Raft Suite Data Quality

Language: JavaScript - Size: 938 KB - Last synced: about 2 months ago - Pushed: over 1 year ago - Stars: 7 - Forks: 3

DP6/penguin-datalayer

Assisted crawler for validating objects sent to the data layer

Language: JavaScript - Size: 1.01 MB - Last synced: about 2 months ago - Pushed: about 1 year ago - Stars: 7 - Forks: 5

maximiliancw/completely

Measure your data completeness

Language: Python - Size: 12.7 KB - Last synced: 6 days ago - Pushed: about 2 months ago - Stars: 0 - Forks: 0

timgent/data-flare

Data quality control tool built on Spark and Deequ

Language: Scala - Size: 3.37 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 22 - Forks: 10

carson-evans/book-recommendation-knn

This project utilizes the k-nearest neighbors algorithm to power a book recommendation system, providing personalized suggestions based on user rating patterns.

Language: Jupyter Notebook - Size: 38.1 KB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 0 - Forks: 0

Related Keywords
data-quality (280), data-science (62), python (50), data-engineering (48), machine-learning (40), data-quality-checks (36), spark (26), data (25), data-profiling (25), data-quality-monitoring (25), data-analysis (20), data-validation (19), pyspark (19), data-cleaning (18), data-observability (16), sql (16), mlops (16), deep-learning (14), data-governance (13), data-centric-ai (13), dbt (13), dataquality (13), pandas (12), validation (12), data-visualization (12), data-testing (11), etl (11), data-transformation (10), big-data (10), exploratory-data-analysis (10), data-monitoring (9), dataops (9), data-pipeline (9), data-quality-measurement (8), data-reliability (8), metadata (8), data-quality-assessment (8), quality (8), data-management (8), natural-language-processing (8), great-expectations (8), hacktoberfest (8), analytics (8), dataset (7), statistics (7), snowflake (7), r (7), apache-spark (7), ai (7), computer-vision (7), database (6), visualization (6), datasets (6), monitoring (6), docker (6), artificial-intelligence (6), javascript (6), eda (6), etl-pipeline (6), data-centric (5), airflow (5), bigquery (5), pipeline (5), postgresql (5), data-exploration (5), data-curation (5), ml (5), observability (5), databricks (5), data-unit-tests (5), data-wrangling (5), openstreetmap (5), variability (4), data-labeling (4), python3 (4), llms (4), data-contracts (4), dbt-packages (4), data-lineage (4), scala (4), feature-engineering (4), data-centric-machine-learning (4), data-warehouse (4), pytorch (4), dp6 (4), gtm (4), jupyter-notebook (4), missing-data (4), cleaning (4), feature-store (4), redshift (4), data-discovery (4), data-catalog (4), model-performance (3), java (3), awesome-list (3), image-classification (3), data-quality-monitor (3), data-processing (3), noisy-labels (3)