An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: data-centric-machine-learning

luo-junyu/Awesome-Data-Efficient-LLM

A list of data-efficient and data-centric LLM (Large Language Model) papers. Our Survey Paper: Towards Efficient LLM Post Training: A Data-centric Perspective

Size: 884 KB - Last synced at: 8 days ago - Pushed at: 2 months ago - Stars: 29 - Forks: 4

microsoft/data-centric-satellite-segmentation

Contains implementations of data-centric approaches for improving semantic segmentation on satellite imagery.

Language: Python - Size: 561 KB - Last synced at: 7 days ago - Pushed at: 17 days ago - Stars: 36 - Forks: 1

Docta-ai/docta

A Doctor for your data

Language: Python - Size: 27.8 MB - Last synced at: 17 days ago - Pushed at: 3 months ago - Stars: 3,098 - Forks: 231

Decentralized-AI-Reserach-Lab/FedNS

Collaboratively Learning Federated Models from Noisy Decentralized Data

Language: Python - Size: 1.14 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

daochenzha/data-centric-AI

A curated, but incomplete, list of data-centric AI resources.

Size: 1.99 MB - Last synced at: about 1 month ago - Pushed at: 10 months ago - Stars: 1,094 - Forks: 78

seedatnabeel/DIPS

You can’t handle the (dirty) truth: Data-centric insights improve pseudo-labeling

Language: Jupyter Notebook - Size: 40.8 MB - Last synced at: 2 months ago - Pushed at: 10 months ago - Stars: 7 - Forks: 1

mashijie1028/TrustDD

Code for our paper "Towards Trustworthy Dataset Distillation" (Pattern Recognition 2025)

Language: Python - Size: 639 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

Nokia-Bell-Labs/data-centric-federated-learning

Enhancing Efficiency in Multidevice Federated Learning through Data Selection

Language: Python - Size: 1.93 MB - Last synced at: 5 months ago - Pushed at: about 1 year ago - Stars: 12 - Forks: 3

miriamspsantos/dcai-ecai-tutorial-2024

A multi-view panorama of Data-Centric AI: Techniques, Tools, and Applications (ECAI Tutorial 2024)

Size: 1.8 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

sangmichaelxie/doremi

PyTorch implementation of DoReMi, a method for optimizing the data mixture weights in language modeling datasets

Language: HTML - Size: 24.4 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 226 - Forks: 30

seedatnabeel/TRIAGE

TRIAGE: Characterizing and auditing training data for improved regression (NeurIPS 2023)

Language: Jupyter Notebook - Size: 22.6 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 6 - Forks: 3

miriamspsantos/data-typology

Implementation of data typology for imbalanced datasets.

Language: MATLAB - Size: 1.29 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

seedatnabeel/Data-SUITE

Data-SUITE: Data-centric identification of in-distribution incongruous examples (ICML 2022)

Language: Jupyter Notebook - Size: 4.22 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 5 - Forks: 4

seedatnabeel/Data-IQ

Data-IQ: Characterizing subgroups with heterogeneous outcomes in tabular data (NeurIPS 2022)

Language: Jupyter Notebook - Size: 14.1 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 5 - Forks: 2

ElsevierSoftwareX/SOFTX-D-21-00177 (fork of parichit/DCEM)

Data clustering using the Expectation Maximization algorithm. To cite this Original Software Publication: https://www.sciencedirect.com/science/article/pii/S2352711021001771

Size: 8.6 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0