Topic: "data-versioning"
dolthub/dolt
Dolt - Git for Data
Language: Go - Size: 149 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 18,568 - Forks: 552

wandb/wandb
The AI developer platform. Use Weights & Biases to train and fine-tune models, and manage models from experimentation to production.
Language: Python - Size: 169 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 9,769 - Forks: 731

treeverse/lakeFS
lakeFS - Data version control for your data lake | Git for data
Language: Go - Size: 149 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 4,633 - Forks: 371

quiltdata/quilt
Quilt is a data mesh for connecting people with actionable data
Language: TypeScript - Size: 164 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 1,334 - Forks: 91

iusztinpaul/energy-forecasting
The Full Stack 7-Steps MLOps Framework | Learn MLE & MLOps for free by designing, building and deploying an end-to-end ML batch system ~ source code + 2.5 hours of reading & video materials
Language: Python - Size: 4.1 MB - Last synced at: 2 days ago - Pushed at: about 1 year ago - Stars: 905 - Forks: 206

Renumics/awesome-open-data-centric-ai
Curated list of open source tooling for data-centric AI on unstructured data.
Size: 572 KB - Last synced at: 12 days ago - Pushed at: over 1 year ago - Stars: 716 - Forks: 35

koordinates/kart
Distributed version-control for geospatial and tabular data
Language: Python - Size: 107 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 503 - Forks: 39

BemiHQ/bemi
Automatic data change tracking for PostgreSQL
Language: TypeScript - Size: 3.64 MB - Last synced at: 16 days ago - Pushed at: about 2 months ago - Stars: 345 - Forks: 9

RecallGraph/RecallGraph
A versioning data store for time-variant graph data.
Language: JavaScript - Size: 4.32 MB - Last synced at: 3 days ago - Pushed at: 8 months ago - Stars: 340 - Forks: 25

daefresh/awesome-data-temporality
A curated list to help you manage temporal data across many modalities.
Size: 1.87 MB - Last synced at: 12 days ago - Pushed at: over 2 years ago - Stars: 111 - Forks: 2

leeper/data-versioning
Collecting thoughts about data versioning
Size: 16.6 KB - Last synced at: about 1 month ago - Pushed at: almost 6 years ago - Stars: 108 - Forks: 8

GitDataAI/jzfs
A Git-like Version Control File System for AI & Data Product Management.
Language: Rust - Size: 3.09 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 105 - Forks: 11

layerai-archive/sdk 📦
Metadata store for Production ML
Language: Python - Size: 2.22 MB - Last synced at: 6 days ago - Pushed at: over 2 years ago - Stars: 88 - Forks: 6

ropensci/gittargets
Data version control for reproducible analysis pipelines in R with {targets}.
Language: R - Size: 1.5 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 80 - Forks: 1

BemiHQ/bemi-prisma
Automatic data change tracking for Prisma
Language: TypeScript - Size: 307 KB - Last synced at: 6 months ago - Pushed at: 8 months ago - Stars: 79 - Forks: 3

wrgl/wrgl
Git-like data versioning.
Language: Go - Size: 3.49 MB - Last synced at: 10 months ago - Pushed at: over 1 year ago - Stars: 40 - Forks: 0

jomariya23156/full-stack-on-prem-cv-mlops
"1 config, 1 command from Jupyter Notebook to serve Millions of users", Full-stack On-Premises MLOps system for Computer Vision from Data versioning to Model monitoring and drift detection.
Language: Jupyter Notebook - Size: 15.2 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 34 - Forks: 2

BemiHQ/bemi-typeorm
Automatic data change tracking for TypeORM
Language: TypeScript - Size: 80.1 KB - Last synced at: 18 days ago - Pushed at: about 2 months ago - Stars: 23 - Forks: 0

aws/amazon-finspace-examples
This repo contains sample code and sample notebooks to illustrate how to work with Amazon FinSpace
Language: Jupyter Notebook - Size: 219 MB - Last synced at: 5 days ago - Pushed at: 2 months ago - Stars: 21 - Forks: 24

martysai/artificial-text-detection
Python framework for artificial text detection: NLP approaches to compare natural text against text generated by neural networks.
Language: Python - Size: 262 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 15 - Forks: 1

pier4all/mongoose-versioned
Document versioning library for MongoDB using the mongoose package.
Language: JavaScript - Size: 527 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 14 - Forks: 6

d-lowl/bunny-party
A demonstration of how DVC and MLFlow can be used in the task of data relabeling
Language: Python - Size: 25.1 MB - Last synced at: 5 months ago - Pushed at: over 1 year ago - Stars: 10 - Forks: 0

datopian/ckanext-versions
A CKAN extension for data versioning.
Language: Python - Size: 346 KB - Last synced at: about 13 hours ago - Pushed at: about 14 hours ago - Stars: 8 - Forks: 6

ropensci/butterfly
Verification of continually updating timeseries data where we expect new values, but want to ensure previous data remains unchanged. Maintained by @thomaszwagerman
Language: R - Size: 2.63 MB - Last synced at: 14 days ago - Pushed at: 21 days ago - Stars: 8 - Forks: 0

ksm26/LLMOps
This course navigates through the LLMOps pipeline, enabling you to preprocess training data for supervised fine-tuning and deploy custom Large Language Models (LLMs).
Language: Jupyter Notebook - Size: 1.98 MB - Last synced at: 27 days ago - Pushed at: about 1 year ago - Stars: 8 - Forks: 9

BemiHQ/bemi-supabase-js
Automatic data change tracking for Supabase JS
Language: JavaScript - Size: 9.77 KB - Last synced at: 20 days ago - Pushed at: 8 months ago - Stars: 7 - Forks: 0

zensors/droplet
A JSON-based format for working with machine learning data, with a focus on data interoperability.
Size: 1.69 MB - Last synced at: 4 months ago - Pushed at: almost 3 years ago - Stars: 7 - Forks: 0

datopian/ckanext-versioning
Deprecated. See https://github.com/datopian/ckanext-versions. CKAN extension providing data versioning (metadata and files) based on git and github.
Language: Python - Size: 385 KB - Last synced at: 12 months ago - Pushed at: about 4 years ago - Stars: 7 - Forks: 4

data-as-code/dac
Python Data as Code core implementation
Language: Python - Size: 814 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 6 - Forks: 0

fair-data-austria/dbrepo 📦
A Data Preservation Repository Supporting FAIR Principles, Data Versioning and Reproducible Queries
Language: Java - Size: 92.3 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 6 - Forks: 0

BemiHQ/bemi-sqlalchemy
Automatic data change tracking for SQLAlchemy
Language: Python - Size: 8.79 KB - Last synced at: 18 days ago - Pushed at: 8 months ago - Stars: 5 - Forks: 0

dolthub/kedro-dolt
Kedro-Dolt Hook Plugin
Language: Python - Size: 73.2 KB - Last synced at: 10 days ago - Pushed at: about 2 years ago - Stars: 4 - Forks: 2

abeltavares/versioned-data-lakehouse
Git-like Version Control for Data with Nessie, Iceberg, and Spark
Language: Jupyter Notebook - Size: 3.95 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 3 - Forks: 2

BemiHQ/bemi-mikro-orm
Automatic data change tracking for MikroORM
Language: TypeScript - Size: 13.7 KB - Last synced at: 22 days ago - Pushed at: 7 months ago - Stars: 3 - Forks: 0

KalyanM45/Data-Version-Control-Demo
The provided demo project demonstrates the practical implementation and advantages of using DVC. It showcases how DVC simplifies data versioning and model versioning while working in tandem with Git to create a cohesive version control system tailored for data science projects.
Language: Python - Size: 67.5 MB - Last synced at: 12 months ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 0

NewronAI/newron-sdk
Newron is a data-centric ML platform to easily build, manage, deploy and continuously improve models through data driven development.
Language: Python - Size: 1.11 MB - Last synced at: 7 days ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 4

mucozcan/awesome-ml-infra
Articles, tutorials, and tools about creating scalable and sustainable ML/DL systems.
Size: 5.86 KB - Last synced at: 8 days ago - Pushed at: about 2 years ago - Stars: 2 - Forks: 1

VineetKT/ML_fastapi_on_Heroku_CI-CD
Deploying a Machine Learning Model on Heroku with FastAPI using CI/CD tools as GitHub Actions and Heroku Automatic Deployment.
Language: Jupyter Notebook - Size: 4.91 MB - Last synced at: 9 months ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 2

kletobias/advanced-mlops-lifecycle-hydra-mlflow-optuna-dvc
End-to-end MLOps pipeline showcasing senior-level best practices with Hydra for configuration, MLflow for experiment tracking, Optuna for hyperparameter tuning, and DVC for data/version control. This repository focuses on reproducibility, modular design, and streamlined collaboration: an ideal demonstration of advanced MLOps capabilities.
Language: Python - Size: 680 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 1 - Forks: 0

imamaaa/mlops-air-quality-prediction-pipeline
MLOps pipeline for real-time air quality monitoring and pollution prediction. Uses ARIMA & LSTM models, DVC for data versioning, Flask API for deployment, and Prometheus & Grafana for monitoring.
Language: Jupyter Notebook - Size: 1.85 MB - Last synced at: 6 days ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

Gabigol123456/versioned-data-lakehouse
Git-like Version Control for Data with Nessie, Iceberg, and Spark
Size: 1.95 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

pier4all/data-versioning
Repository for evaluating the different approaches to data versioning
Language: JavaScript - Size: 23.2 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 1

OElesin/modeldb-aws
Verta ai ModelDB on AWS Cloud with integration into Amazon SageMaker for ML training data versioning and experiment tracking
Language: TypeScript - Size: 392 KB - Last synced at: 12 months ago - Pushed at: about 5 years ago - Stars: 1 - Forks: 0

BodieCoding/ml-project-template
A template for building governed and reproducible machine learning projects, enabling transparent tracking of data, models, and deployments across various platforms.
Language: Python - Size: 25.4 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

cs-uche/Car-Prices-Prediction
Advanced Machine Learning Regression: Predicting Car Prices
Language: Jupyter Notebook - Size: 10.1 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

albagc/auto-data-version
Obtain data versioning tag using ML models
Language: Jupyter Notebook - Size: 8.18 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

pytholic/ClearML
Testing and implementations with ClearML
Language: Python - Size: 5.78 MB - Last synced at: 12 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

lucapug/github_actions_CI_CD
Following best practices to productionize an ML project
Language: Jupyter Notebook - Size: 2.33 MB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

neptune-ai/project-tabular-data-version
Project with tabular data versioned with Artifacts.
Language: Python - Size: 10.7 KB - Last synced at: about 12 hours ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

prathameshThakur/dvc-mlflow-test
DVC + MLflow for data monitoring and ML lifecycle management
Language: Jupyter Notebook - Size: 12.7 KB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

lsjsj92/data_version_control
Practice with data_version_control (DVC)
Size: 1000 Bytes - Last synced at: 16 days ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 0

mirrors/dolt
Dolt - It's Git for Data
Language: Go - Size: 285 MB - Last synced at: over 1 year ago - Stars: 0 - Forks: 0