An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: data-testing

sodadata/soda-core

⚡ Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io

Language: Python - Size: 3.78 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 2,064 - Forks: 231

andrjas/data_check

data and pipeline testing with and for SQL

Language: Python - Size: 4.33 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 5 - Forks: 0

akmalsoliev/Validoopsie

A simple, easy-to-use data validation library for Python.

Language: Python - Size: 1.77 MB - Last synced at: 1 day ago - Pushed at: 7 days ago - Stars: 60 - Forks: 0

posit-dev/pointblank

Find out if your data is what you think it is

Language: Python - Size: 46.7 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 109 - Forks: 10

DataKitchen/dataops-testgen

DataOps Data Quality TestGen is part of DataKitchen's Open Source Data Observability. DataOps TestGen delivers simple, fast data quality test generation and execution through data profiling, new dataset hygiene review, AI generation of data quality validation tests, ongoing testing of data refreshes, and continuous anomaly monitoring.

Language: Python - Size: 5.22 MB - Last synced at: 15 days ago - Pushed at: 18 days ago - Stars: 55 - Forks: 3

re-data/re-data

re_data - fix data issues before your users & CEO would discover them 😊

Language: HTML - Size: 76.5 MB - Last synced at: 2 days ago - Pushed at: 12 months ago - Stars: 1,563 - Forks: 125

astronomer/airflow-provider-great-expectations

Great Expectations Airflow operator

Language: Python - Size: 1.77 MB - Last synced at: 8 days ago - Pushed at: 13 days ago - Stars: 162 - Forks: 57

InfuseAI/piperider

Code review for data in dbt

Language: Python - Size: 32.6 MB - Last synced at: 3 days ago - Pushed at: 4 months ago - Stars: 487 - Forks: 23

re-data/dbt-re-data

re_data - fix data issues before your users & CEO would discover them 😊

Language: Python - Size: 4.12 MB - Last synced at: 2 days ago - Pushed at: 12 months ago - Stars: 98 - Forks: 42

data-catering/data-caterer-example (fork of pflooky/data-caterer-example)

Example API implementation for Data Caterer

Language: Scala - Size: 2.08 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 5 - Forks: 0

shridhar1504/Sales-Forecasting-Datascience-Project

Develop a data science project using historical sales data to build a regression model that accurately predicts future sales. Preprocess the dataset, conduct exploratory analysis, select relevant features, and employ regression algorithms for model development. Evaluate model performance, optimize hyperparameters, and provide actionable insights.

Language: Jupyter Notebook - Size: 1.48 MB - Last synced at: 15 days ago - Pushed at: almost 2 years ago - Stars: 8 - Forks: 5

RemoYukoff/aqueductus

A data testing framework that executes queries on configurable data providers and validates the results with customizable YAML-defined assertions. Ensure data integrity, consistency, and reliability effortlessly.

Language: Python - Size: 12.7 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

LukaszLapaj/software-testing-resource-pack

Various files useful for manual testing and test automation etc.

Size: 95.6 MB - Last synced at: about 2 months ago - Pushed at: over 2 years ago - Stars: 179 - Forks: 42

pflooky/data-caterer

Data generation and validation tool for any data source

Language: Scala - Size: 331 KB - Last synced at: 8 days ago - Pushed at: about 1 year ago - Stars: 5 - Forks: 6

neonexus/fixted (fork of bredikhin/barrels)

Simple DB Fixtures for Sails.js v1 (fake data for testing).

Language: JavaScript - Size: 480 KB - Last synced at: 1 day ago - Pushed at: 5 months ago - Stars: 5 - Forks: 1

blleshi/Credit_Risk_Classification

Credit Risk Classification

Language: Jupyter Notebook - Size: 901 KB - Last synced at: about 2 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

pflooky/data-caterer-example

Example API implementation for Data Caterer

Language: Scala - Size: 1.83 MB - Last synced at: 8 days ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 2

data-catering/data-caterer (fork of pflooky/data-caterer)

Data generation and validation tool for any data source

Language: Scala - Size: 1.77 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 12 - Forks: 2

sodadata/soda-spark

Soda Spark is a PySpark library that helps you test your data in Spark DataFrames.

Language: Python - Size: 118 KB - Last synced at: about 1 month ago - Pushed at: almost 3 years ago - Stars: 63 - Forks: 8

afairless/kalman_filter

Translating between two sets of notation for Kalman filters

Language: HTML - Size: 319 KB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

ericmjl/software-testing-open-source-and-data-science

Software Testing in Open Source and Data Science: A talk delivered at the Data Umbrella speaker series

Size: 1.73 MB - Last synced at: 27 days ago - Pushed at: almost 2 years ago - Stars: 4 - Forks: 0

sodadata/soda-github-action

⚡ Prevent downstream data quality issues by integrating the Soda Library into your CI/CD pipeline.

Language: Python - Size: 39.1 KB - Last synced at: 12 months ago - Pushed at: over 1 year ago - Stars: 11 - Forks: 0

serialbandicoot/great-assertions

This library is inspired by the Great Expectations library. It makes the various expectations found in Great Expectations available when using Python's built-in unittest assertions.

Language: Python - Size: 940 KB - Last synced at: about 1 year ago - Pushed at: about 3 years ago - Stars: 10 - Forks: 1

pflooky/data-caterer-docs

Documentation for Data Caterer

Language: HTML - Size: 13.7 MB - Last synced at: 8 days ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 3

ojasphansekar/Data-Management-Co-op

National Grid (Python, SQL Server, SSIS, SSRS, Tableau, Power BI, SQL Server Import Export Wizard, Data Validations, Data Integrations, Data Conversions)

Size: 3.21 MB - Last synced at: over 1 year ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 0

Balajimohan18/Sales-Forecasting-Datascience-Project

Develop a data science project using historical sales data to build a regression model that accurately predicts future sales. Preprocess the dataset, conduct exploratory analysis, select relevant features, and employ regression algorithms for model development. Evaluate model performance, optimize hyperparameters, and provide actionable insights.

Language: Jupyter Notebook - Size: 1.13 MB - Last synced at: about 2 months ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

krsiakdaniel/chrome-extension-show-data-attributes-for-testing

Chrome extension for developers and testers who want to easily see data attributes for testing directly on the page.

Language: HTML - Size: 197 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 1

manoj9788/spark-etl-tests

A sample repository showcasing the implementation of tests for an ETL pipeline developed with Apache Spark.

Size: 1000 Bytes - Last synced at: about 1 month ago - Pushed at: about 5 years ago - Stars: 1 - Forks: 0

siawayforward/dbt_about_it

I'm learning how to use dbt with BigQuery so I can apply that knowledge wherever we end up working. It seems like a good DWH interface tool to know for data transformation and testing, and allows me to solidify concepts of testing in data ops.

Language: Python - Size: 29.9 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 1

JayLohokare/pySpark-data-testing-framework

Dynamic data testing engine based on pySpark

Language: Jupyter Notebook - Size: 68.4 KB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 1

Related Keywords
data-testing (30), data-quality (14), data-validation (12), python (8), data-engineering (7), data-observability (7), data-science (7), dbt (5), dataquality (5), scala (5), data-reliability (4), data-generation (4), data-monitoring (4), testing (4), sql (3), data-test (3), schema-validation (3), java (3), snowflake (3), data-visualization (3), data-unit-tests (3), data-quality-monitoring (3), data-quality-checks (3), data-profiling (3), data-contracts (3), metadata (2), synthetic-data (2), pyspark (2), data-profiler (2), data-generator (2), databricks (2), supervised-learning (2), sklearn-library (2), testing-automation (2), yaml (2), data-analytics (2), salesforecast (2), forecasting-models (2), regression-algorithms (2), model-evaluation (2), python3 (2), predictive-modeling (2), data-quality-testing (2), datatesting (2), pipeline-testing (2), datagenerator (2), quality-assurance (2), dbt-packages (2), software-testing (2), logistic-regression-model (1), logistic-regression (1), loans (1), lending (1), imbalanced-learning (1), data-training (1), credit-risk-classification (1), credit-risk (1), confusion-matrix (1), classification-report (1), control (1), pandas (1), randomoversampler (1), spark (1), soda-sql (1), resampled-data (1), ui (1), target-classification (1), docker (1), docker-compose (1), kubernetes (1), helm (1), data-architecture (1), data-integration (1), data-mapping (1), data-modeling (1), process-flow-diagram (1), forecasting (1), machine-learning (1), scipy (1), chrome-extension (1), cypress (1), data-cy (1), data-qa (1), data-testid (1), etl (1), etl-automation (1), data-transformation (1), azure (1), testing-framework (1), control-systems (1), filter (1), filtering (1), filters (1), kalman-filter (1), math-equations (1), pytest (1), state-space-model (1), state-space-models (1), statistics (1), time-series (1)