An open API service providing repository metadata for many open source software ecosystems.

Topic: "data-versioning"

dolthub/dolt

Dolt – Git for Data

Language: Go - Size: 149 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 18,568 - Forks: 552

wandb/wandb

The AI developer platform. Use Weights & Biases to train and fine-tune models, and manage models from experimentation to production.

Language: Python - Size: 169 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 9,769 - Forks: 731

treeverse/lakeFS

lakeFS - Data version control for your data lake | Git for data

Language: Go - Size: 149 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 4,633 - Forks: 371

quiltdata/quilt

Quilt is a data mesh for connecting people with actionable data

Language: TypeScript - Size: 164 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 1,334 - Forks: 91

iusztinpaul/energy-forecasting

🌀 The Full Stack 7-Steps MLOps Framework | Learn MLE & MLOps for free by designing, building and deploying an end-to-end ML batch system ~ source code + 2.5 hours of reading & video materials

Language: Python - Size: 4.1 MB - Last synced at: 2 days ago - Pushed at: about 1 year ago - Stars: 905 - Forks: 206

Renumics/awesome-open-data-centric-ai

Curated list of open source tooling for data-centric AI on unstructured data.

Size: 572 KB - Last synced at: 12 days ago - Pushed at: over 1 year ago - Stars: 716 - Forks: 35

koordinates/kart

Distributed version-control for geospatial and tabular data

Language: Python - Size: 107 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 503 - Forks: 39

BemiHQ/bemi

Automatic data change tracking for PostgreSQL

Language: TypeScript - Size: 3.64 MB - Last synced at: 16 days ago - Pushed at: about 2 months ago - Stars: 345 - Forks: 9

RecallGraph/RecallGraph

A versioning data store for time-variant graph data.

Language: JavaScript - Size: 4.32 MB - Last synced at: 3 days ago - Pushed at: 8 months ago - Stars: 340 - Forks: 25

daefresh/awesome-data-temporality

A curated list to help you manage temporal data across many modalities 🚀.

Size: 1.87 MB - Last synced at: 12 days ago - Pushed at: over 2 years ago - Stars: 111 - Forks: 2

leeper/data-versioning

Collecting thoughts about data versioning

Size: 16.6 KB - Last synced at: about 1 month ago - Pushed at: almost 6 years ago - Stars: 108 - Forks: 8

GitDataAI/jzfs

A Git-like Version Control File System for AI & Data Product Management.

Language: Rust - Size: 3.09 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 105 - Forks: 11

layerai-archive/sdk 📦

Metadata store for Production ML

Language: Python - Size: 2.22 MB - Last synced at: 6 days ago - Pushed at: over 2 years ago - Stars: 88 - Forks: 6

ropensci/gittargets

Data version control for reproducible analysis pipelines in R with {targets}.

Language: R - Size: 1.5 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 80 - Forks: 1

BemiHQ/bemi-prisma

Automatic data change tracking for Prisma

Language: TypeScript - Size: 307 KB - Last synced at: 6 months ago - Pushed at: 8 months ago - Stars: 79 - Forks: 3

wrgl/wrgl

Git-like data versioning.

Language: Go - Size: 3.49 MB - Last synced at: 10 months ago - Pushed at: over 1 year ago - Stars: 40 - Forks: 0

jomariya23156/full-stack-on-prem-cv-mlops

"1 config, 1 command from Jupyter Notebook to serve Millions of users", Full-stack On-Premises MLOps system for Computer Vision from Data versioning to Model monitoring and drift detection.

Language: Jupyter Notebook - Size: 15.2 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 34 - Forks: 2

BemiHQ/bemi-typeorm

Automatic data change tracking for TypeORM

Language: TypeScript - Size: 80.1 KB - Last synced at: 18 days ago - Pushed at: about 2 months ago - Stars: 23 - Forks: 0

aws/amazon-finspace-examples

This repo contains sample code and sample notebooks to illustrate how to work with Amazon FinSpace

Language: Jupyter Notebook - Size: 219 MB - Last synced at: 5 days ago - Pushed at: 2 months ago - Stars: 21 - Forks: 24

martysai/artificial-text-detection

Python framework for artificial text detection: NLP approaches to compare natural text against text generated by neural networks.

Language: Python - Size: 262 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 15 - Forks: 1

pier4all/mongoose-versioned

Document versioning library for MongoDB using the mongoose package.

Language: JavaScript - Size: 527 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 14 - Forks: 6

d-lowl/bunny-party

A demonstration of how DVC and MLFlow can be used in the task of data relabeling

Language: Python - Size: 25.1 MB - Last synced at: 5 months ago - Pushed at: over 1 year ago - Stars: 10 - Forks: 0

datopian/ckanext-versions

A CKAN extension for data versioning.

Language: Python - Size: 346 KB - Last synced at: about 13 hours ago - Pushed at: about 14 hours ago - Stars: 8 - Forks: 6

ropensci/butterfly

Verification of continually updating timeseries data where we expect new values, but want to ensure previous data remains unchanged. Maintained by @thomaszwagerman

Language: R - Size: 2.63 MB - Last synced at: 14 days ago - Pushed at: 21 days ago - Stars: 8 - Forks: 0

ksm26/LLMOps

This course navigates through the LLMOps pipeline, enabling you to preprocess training data for supervised fine-tuning and deploy custom Large Language Models (LLMs).

Language: Jupyter Notebook - Size: 1.98 MB - Last synced at: 27 days ago - Pushed at: about 1 year ago - Stars: 8 - Forks: 9

BemiHQ/bemi-supabase-js

Automatic data change tracking for Supabase JS

Language: JavaScript - Size: 9.77 KB - Last synced at: 20 days ago - Pushed at: 8 months ago - Stars: 7 - Forks: 0

zensors/droplet

A JSON-based format for working with machine learning data, with a focus on data interoperability.

Size: 1.69 MB - Last synced at: 4 months ago - Pushed at: almost 3 years ago - Stars: 7 - Forks: 0

datopian/ckanext-versioning

Deprecated. See https://github.com/datopian/ckanext-versions. ⏰ CKAN extension providing data versioning (metadata and files) based on git and github.

Language: Python - Size: 385 KB - Last synced at: 12 months ago - Pushed at: about 4 years ago - Stars: 7 - Forks: 4

data-as-code/dac

Python Data as Code core implementation

Language: Python - Size: 814 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 6 - Forks: 0

fair-data-austria/dbrepo 📦

A Data Preservation Repository Supporting FAIR Principles, Data Versioning and Reproducible Queries

Language: Java - Size: 92.3 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 6 - Forks: 0

BemiHQ/bemi-sqlalchemy

Automatic data change tracking for SQLAlchemy

Language: Python - Size: 8.79 KB - Last synced at: 18 days ago - Pushed at: 8 months ago - Stars: 5 - Forks: 0

dolthub/kedro-dolt

Kedro-Dolt Hook Plugin

Language: Python - Size: 73.2 KB - Last synced at: 10 days ago - Pushed at: about 2 years ago - Stars: 4 - Forks: 2

abeltavares/versioned-data-lakehouse

🌊 Git-like Version Control for Data with Nessie, Iceberg, and Spark

Language: Jupyter Notebook - Size: 3.95 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 3 - Forks: 2

BemiHQ/bemi-mikro-orm

Automatic data change tracking for MikroORM

Language: TypeScript - Size: 13.7 KB - Last synced at: 22 days ago - Pushed at: 7 months ago - Stars: 3 - Forks: 0

KalyanM45/Data-Version-Control-Demo

The provided demo project demonstrates the practical implementation and advantages of using DVC. It showcases how DVC simplifies data versioning and model versioning while working in tandem with Git to create a cohesive version control system tailored for data science projects.

Language: Python - Size: 67.5 MB - Last synced at: 12 months ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 0

NewronAI/newron-sdk

Newron is a data-centric ML platform to easily build, manage, deploy and continuously improve models through data driven development.

Language: Python - Size: 1.11 MB - Last synced at: 7 days ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 4

mucozcan/awesome-ml-infra

Articles, tutorials, and tools about creating scalable and sustainable ML/DL systems.

Size: 5.86 KB - Last synced at: 8 days ago - Pushed at: about 2 years ago - Stars: 2 - Forks: 1

VineetKT/ML_fastapi_on_Heroku_CI-CD

Deploying a Machine Learning Model on Heroku with FastAPI using CI/CD tools as GitHub Actions and Heroku Automatic Deployment.

Language: Jupyter Notebook - Size: 4.91 MB - Last synced at: 9 months ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 2

kletobias/advanced-mlops-lifecycle-hydra-mlflow-optuna-dvc

End-to-end MLOps pipeline showcasing senior-level best practices with Hydra for configuration, MLflow for experiment tracking, Optuna for hyperparameter tuning, and DVC for data/version control. This repository focuses on reproducibility, modular design, and streamlined collaboration, an ideal demonstration of advanced MLOps capabilities.

Language: Python - Size: 680 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 1 - Forks: 0

imamaaa/mlops-air-quality-prediction-pipeline

MLOps pipeline for real-time air quality monitoring and pollution prediction. Uses ARIMA & LSTM models, DVC for data versioning, Flask API for deployment, and Prometheus & Grafana for monitoring.

Language: Jupyter Notebook - Size: 1.85 MB - Last synced at: 6 days ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

Gabigol123456/versioned-data-lakehouse

🌊 Git-like Version Control for Data with Nessie, Iceberg, and Spark

Size: 1.95 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

pier4all/data-versioning

Repository for evaluating the different approaches to data versioning

Language: JavaScript - Size: 23.2 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 1

OElesin/modeldb-aws

Verta ai ModelDB on AWS Cloud with integration into Amazon SageMaker for ML training data versioning and experiment tracking

Language: TypeScript - Size: 392 KB - Last synced at: 12 months ago - Pushed at: about 5 years ago - Stars: 1 - Forks: 0

BodieCoding/ml-project-template

A template for building governed and reproducible machine learning projects, enabling transparent tracking of data, models, and deployments across various platforms.

Language: Python - Size: 25.4 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

cs-uche/Car-Prices-Prediction

Advanced Machine Learning Regression: Predicting Car Prices

Language: Jupyter Notebook - Size: 10.1 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

albagc/auto-data-version

Obtain data versioning tag using ML models

Language: Jupyter Notebook - Size: 8.18 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

pytholic/ClearML

Testing and implementations with ClearML

Language: Python - Size: 5.78 MB - Last synced at: 12 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

lucapug/github_actions_CI_CD

following best practices to productionize an ML project

Language: Jupyter Notebook - Size: 2.33 MB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

neptune-ai/project-tabular-data-version

Project with tabular data versioned with Artifacts.

Language: Python - Size: 10.7 KB - Last synced at: about 12 hours ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

prathameshThakur/dvc-mlflow-test

DVC + MLflow for data monitoring and ML lifecycle management

Language: Jupyter Notebook - Size: 12.7 KB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

lsjsj92/data_version_control

practice about data_version_control(DVC)

Size: 1000 Bytes - Last synced at: 16 days ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 0

mirrors/dolt

Dolt – It's Git for Data

Language: Go - Size: 285 MB - Last synced at: over 1 year ago - Stars: 0 - Forks: 0