Ecosyste.ms: Repos

An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: data-versioning

dolthub/dolt

Dolt – Git for Data

Language: Go - Size: 137 MB - Last synced: about 7 hours ago - Pushed: about 8 hours ago - Stars: 17,101 - Forks: 482

BemiHQ/bemi

Automatic data change tracking for PostgreSQL

Language: TypeScript - Size: 2.65 MB - Last synced: about 14 hours ago - Pushed: about 17 hours ago - Stars: 160 - Forks: 2

aws/amazon-finspace-examples

This repo contains sample code and sample notebooks to illustrate how to work with Amazon FinSpace

Language: Jupyter Notebook - Size: 127 MB - Last synced: 5 days ago - Pushed: 5 days ago - Stars: 21 - Forks: 23

daefresh/awesome-data-temporality

A curated list to help you manage temporal data across many modalities 🚀.

Size: 1.87 MB - Last synced: 1 day ago - Pushed: over 1 year ago - Stars: 99 - Forks: 2

koordinates/kart

Distributed version-control for geospatial and tabular data

Language: Python - Size: 107 MB - Last synced: 5 days ago - Pushed: 5 days ago - Stars: 503 - Forks: 39

Renumics/awesome-open-data-centric-ai

Curated list of open source tooling for data-centric AI on unstructured data.

Size: 572 KB - Last synced: about 21 hours ago - Pushed: 6 months ago - Stars: 679 - Forks: 36

treeverse/lakeFS

lakeFS - Data version control for your data lake | Git for data

Language: Go - Size: 136 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 4,054 - Forks: 328

datopian/ckanext-versions

A CKAN extension for data versioning.

Language: Python - Size: 313 KB - Last synced: 20 days ago - Pushed: 11 months ago - Stars: 8 - Forks: 6

wandb/wandb

🔥 A tool for visualizing and tracking your machine learning experiments. This repo contains the CLI and Python API.

Language: Python - Size: 89.8 MB - Last synced: 29 days ago - Pushed: 29 days ago - Stars: 8,194 - Forks: 604

BemiHQ/bemi-prisma

Automatic data change tracking for Prisma

Language: TypeScript - Size: 295 KB - Last synced: 28 days ago - Pushed: about 1 month ago - Stars: 64 - Forks: 0

GitDataAI/jiaozifs

A Git-like version control file system for data lineage & data collaboration.

Language: Go - Size: 1.66 MB - Last synced: about 1 month ago - Pushed: about 2 months ago - Stars: 41 - Forks: 2

quiltdata/quilt

Quilt is a data mesh for connecting people with actionable data

Language: Jupyter Notebook - Size: 228 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 1,311 - Forks: 92

iusztinpaul/energy-forecasting

🌀 The Full Stack 7-Steps MLOps Framework | Learn MLE & MLOps for free by designing, building and deploying an end-to-end ML batch system ~ source code + 2.5 hours of reading & video materials

Language: Python - Size: 4.1 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 775 - Forks: 174

jomariya23156/full-stack-on-prem-cv-mlops

"1 config, 1 command from Jupyter Notebook to serve Millions of users", Full-stack On-Premises MLOps system for Computer Vision from Data versioning to Model monitoring and drift detection.

Language: Jupyter Notebook - Size: 15.2 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 34 - Forks: 2

pier4all/mongoose-versioned

Document versioning library for MongoDB using the mongoose package.

Language: JavaScript - Size: 527 KB - Last synced: about 1 month ago - Pushed: 5 months ago - Stars: 14 - Forks: 7

BemiHQ/bemi-typeorm

Automatic data change tracking for TypeORM

Language: TypeScript - Size: 73.2 KB - Last synced: 28 days ago - Pushed: 2 months ago - Stars: 19 - Forks: 0

ropensci/gittargets

Data version control for reproducible analysis pipelines in R with {targets}.

Language: R - Size: 1.5 MB - Last synced: about 1 month ago - Pushed: 6 months ago - Stars: 80 - Forks: 1

albagc/auto-data-version

Obtain data versioning tag using ML models

Language: Jupyter Notebook - Size: 8.18 MB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 0 - Forks: 0

ksm26/LLMOps

This course navigates through the LLMOps pipeline, enabling you to preprocess training data for supervised fine-tuning and deploy custom Large Language Models (LLMs).

Language: Jupyter Notebook - Size: 1.98 MB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 0 - Forks: 0

leeper/data-versioning

Collecting thoughts about data versioning

Size: 16.6 KB - Last synced: 28 days ago - Pushed: almost 5 years ago - Stars: 107 - Forks: 8

RecallGraph/RecallGraph

A versioning data store for time-variant graph data.

Language: JavaScript - Size: 4.31 MB - Last synced: about 1 month ago - Pushed: over 1 year ago - Stars: 331 - Forks: 25

martysai/artificial-text-detection

Python framework for artificial text detection: NLP approaches to compare natural text against text generated by neural networks.

Language: Python - Size: 262 KB - Last synced: 1 day ago - Pushed: 9 months ago - Stars: 14 - Forks: 1

cs-uche/Car-Prices-Prediction

Advanced Machine Learning Regression: Predicting Car Prices

Language: Jupyter Notebook - Size: 10.1 MB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 0 - Forks: 0

KalyanM45/Data-Version-Control-Demo

The provided demo project demonstrates the practical implementation and advantages of using DVC. It showcases how DVC simplifies data versioning and model versioning while working in tandem with Git to create a cohesive version control system tailored for data science projects.

Language: Python - Size: 67.5 MB - Last synced: 19 days ago - Pushed: 10 months ago - Stars: 3 - Forks: 0

layerai-archive/sdk 📦

Metadata store for Production ML

Language: Python - Size: 2.22 MB - Last synced: about 17 hours ago - Pushed: over 1 year ago - Stars: 89 - Forks: 7

mucozcan/awesome-ml-infra

Articles, tutorials, and tools about creating scalable and sustainable ML/DL systems.

Size: 5.86 KB - Last synced: 5 months ago - Pushed: about 1 year ago - Stars: 0 - Forks: 1

d-lowl/bunny-party

A demonstration of how DVC and MLFlow can be used in the task of data relabeling

Language: Python - Size: 25.1 MB - Last synced: 5 months ago - Pushed: 8 months ago - Stars: 4 - Forks: 0

data-as-code/dac

Python Data as Code core implementation

Language: Python - Size: 814 KB - Last synced: 2 months ago - Pushed: 3 months ago - Stars: 6 - Forks: 0

wrgl/wrgl

Git-like data versioning.

Language: Go - Size: 3.47 MB - Last synced: 9 months ago - Pushed: 9 months ago - Stars: 38 - Forks: 0

VineetKT/ML_fastapi_on_Heroku_CI-CD

Deploying a Machine Learning Model on Heroku with FastAPI using CI/CD tools such as GitHub Actions and Heroku Automatic Deployment.

Language: Jupyter Notebook - Size: 4.91 MB - Last synced: 9 months ago - Pushed: almost 3 years ago - Stars: 2 - Forks: 2

dolthub/kedro-dolt

Kedro-Dolt Hook Plugin

Language: Python - Size: 73.2 KB - Last synced: 20 days ago - Pushed: over 1 year ago - Stars: 4 - Forks: 2

lucapug/github_actions_CI_CD

Following best practices to productionize an ML project

Language: Jupyter Notebook - Size: 2.33 MB - Last synced: about 1 month ago - Pushed: about 1 year ago - Stars: 0 - Forks: 0

OElesin/modeldb-aws

Verta ai ModelDB on AWS Cloud with integration into Amazon SageMaker for ML training data versioning and experiment tracking

Language: TypeScript - Size: 392 KB - Last synced: 19 days ago - Pushed: about 4 years ago - Stars: 1 - Forks: 0

pytholic/ClearML

Testing and implementations with ClearML

Language: Python - Size: 5.78 MB - Last synced: 23 days ago - Pushed: 8 months ago - Stars: 0 - Forks: 0

pier4all/data-versioning

Repository for evaluating the different approaches to data versioning

Language: JavaScript - Size: 23.2 MB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 1 - Forks: 1

lsjsj92/data_version_control

Practice with data version control (DVC)

Size: 1000 Bytes - Last synced: about 1 year ago - Pushed: over 4 years ago - Stars: 1 - Forks: 0

fair-data-austria/dbrepo 📦

A Data Preservation Repository Supporting FAIR Principles, Data Versioning and Reproducible Queries

Language: Java - Size: 92.3 MB - Last synced: about 1 year ago - Pushed: almost 2 years ago - Stars: 6 - Forks: 0

NewronAI/newron-sdk

Newron is a data-centric ML platform to easily build, manage, deploy and continuously improve models through data driven development.

Language: Python - Size: 1.11 MB - Last synced: 14 days ago - Pushed: over 1 year ago - Stars: 3 - Forks: 4

prathameshThakur/dvc-mlflow-test

DVC + MLflow for data monitoring and ML lifecycle management

Language: Jupyter Notebook - Size: 12.7 KB - Last synced: about 1 year ago - Pushed: almost 2 years ago - Stars: 0 - Forks: 0

neptune-ai/project-tabular-data-version

Project with tabular data versioned with Artifacts.

Language: Python - Size: 10.7 KB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0

zensors/droplet

A JSON-based format for working with machine learning data, with a focus on data interoperability.

Size: 1.69 MB - Last synced: about 1 year ago - Pushed: almost 2 years ago - Stars: 7 - Forks: 0

datopian/ckanext-versioning

Deprecated. See https://github.com/datopian/ckanext-versions. ⏰ CKAN extension providing data versioning (metadata and files) based on Git and GitHub.

Language: Python - Size: 385 KB - Last synced: 20 days ago - Pushed: about 3 years ago - Stars: 7 - Forks: 4