Topic: "data-centric"
ludwig-ai/ludwig
Low-code framework for building custom LLMs, neural networks, and other AI models
Language: Python - Size: 31.8 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 11,457 - Forks: 1,208

lancedb/lance
Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, PyArrow, and PyTorch with more integrations coming.
Language: Rust - Size: 22 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 4,664 - Forks: 301

daochenzha/data-centric-AI
A curated, but incomplete, list of data-centric AI resources.
Size: 1.99 MB - Last synced at: about 2 months ago - Pushed at: 11 months ago - Stars: 1,094 - Forks: 78

hkust-nlp/deita
Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024]
Language: Python - Size: 240 KB - Last synced at: about 12 hours ago - Pushed at: 6 months ago - Stars: 554 - Forks: 29

encord-team/encord-active
The toolkit to test, validate, and evaluate your models and surface, curate, and prioritize the most valuable data for labeling.
Language: Python - Size: 264 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 449 - Forks: 26

CLUEbenchmark/DataCLUE
DataCLUE: A data-centric NLP benchmark and toolkit
Language: Python - Size: 17.9 MB - Last synced at: 13 days ago - Pushed at: about 3 years ago - Stars: 142 - Forks: 17

s2e-systems/dust-dds
Rust implementation of the Data Distribution Service (DDS)
Language: Rust - Size: 12.9 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 116 - Forks: 19

PrincetonUniversity/muchiSim
Simulator framework for analysis of performance, energy consumption, area and cost of multi-node multi-chiplet tile-based manycore designs
Language: C++ - Size: 171 MB - Last synced at: 12 days ago - Pushed at: 11 months ago - Stars: 65 - Forks: 10

ChandlerBang/GTrans
[ICLR'23] Implementation of "Empowering Graph Representation Learning with Test-Time Graph Transformation"
Language: Python - Size: 230 KB - Last synced at: about 2 months ago - Pushed at: almost 2 years ago - Stars: 56 - Forks: 6

USTC-StarTeam/DR4SR
🔥🔥🔥 KDD2024 Best Student Paper
Language: Python - Size: 49.8 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 48 - Forks: 3

astutic/Acharya
A data-centric NER annotation tool for your Named Entity Recognition projects
Size: 11.3 MB - Last synced at: 5 months ago - Pushed at: about 1 year ago - Stars: 45 - Forks: 3

zhorton34/vuejs-form
Vue form with Laravel-inspired validation and a simply enjoyable error messages API. (Form API, Validator API, Rules API, Error Messages API)
Language: JavaScript - Size: 974 KB - Last synced at: about 2 months ago - Pushed at: about 3 years ago - Stars: 41 - Forks: 6

luo-junyu/Awesome-Data-Efficient-LLM
A list of data-efficient and data-centric LLM (Large Language Model) papers. Our Survey Paper: Towards Efficient LLM Post Training: A Data-centric Perspective
Size: 884 KB - Last synced at: 21 days ago - Pushed at: 3 months ago - Stars: 30 - Forks: 4

Maksims/mr-Observer
An observer is a wrapper over JSON data that provides an interface to know when data is changed, with a focus on performance and memory efficiency.
Language: JavaScript - Size: 156 KB - Last synced at: 9 days ago - Pushed at: over 3 years ago - Stars: 24 - Forks: 1

kennethleungty/Data-Centric-AI-Competition
Code for a top 5% finish in the Data-Centric AI Competition organized by Andrew Ng and DeepLearning.AI
Language: Jupyter Notebook - Size: 11.4 MB - Last synced at: about 2 months ago - Pushed at: over 3 years ago - Stars: 22 - Forks: 3

minnesotanlp/infoVerse
Jaehyung Kim et al.'s ACL 2023 paper on "infoVerse: A Universal Framework for Dataset Characterization with Multidimensional Meta-information"
Language: Python - Size: 8.9 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 16 - Forks: 1

stoney95/pypely
From local functions to cloud deployed pipelines
Language: Python - Size: 20.5 MB - Last synced at: 16 days ago - Pushed at: about 2 years ago - Stars: 16 - Forks: 0

openlayer-ai/openlayer-python
The official Python library for Openlayer, the Continuous Model Improvement Platform for AI. 📈
Language: Python - Size: 21.5 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 10 - Forks: 1

mdbloice/Labeller
Quickly set up an image labelling web application for manually tagging images for machine learning tasks.
Language: Python - Size: 91.8 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 9 - Forks: 2

rajive/doma
Data-Oriented Microservices Architecture Framework using DDS
Language: Shell - Size: 171 KB - Last synced at: almost 2 years ago - Pushed at: almost 5 years ago - Stars: 6 - Forks: 0

seedatnabeel/Data-SUITE
Data-SUITE: Data-centric identification of in-distribution incongruous examples (ICML 2022)
Language: Jupyter Notebook - Size: 4.22 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 5 - Forks: 4

seedatnabeel/Data-IQ
Data-IQ: Characterizing subgroups with heterogeneous outcomes in tabular data (NeurIPS 2022)
Language: Jupyter Notebook - Size: 14.1 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 5 - Forks: 2

3lc-ai/ultralytics (fork of ultralytics/ultralytics)
Ultralytics YOLO11 with a 3LC integration
Language: Python - Size: 23.3 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 4 - Forks: 0

justincpresley/ndn-hydra
ndn-hydra: A Python-coded NDN distributed repository with five focused attributes: resiliency, scalability, usability, efficiency, and security.
Language: Python - Size: 518 KB - Last synced at: about 1 month ago - Pushed at: 10 months ago - Stars: 4 - Forks: 7

sen-laboratories/sen-core
Haiku server providing the semantic core infrastructure integrated with the Haiku filesystem and file browser Tracker.
Language: C++ - Size: 3.21 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 3 - Forks: 0

rticommunity/connextauto-bus
Common Data Architecture : Data Model + Component Interfaces using DDS
Language: CMake - Size: 195 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 2 - Forks: 1

hexuandeng/DRPruning
Language: Python - Size: 10.3 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0

datacentricorg/datacentric-cpp
Data-centric core services library in C++. For the version supporting multiple languages, see datacentric repo.
Language: C++ - Size: 22.8 MB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 2 - Forks: 0

datacentricorg/datacentric
Data-centric, cross-platform, multi-language core services library for C++, C#, Python, and Java. This repository includes all languages. Each language also has its own repository, e.g. datacentric-cpp.
Language: C# - Size: 22.5 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

datacentricorg/datacentric-cs
Data-centric core services library in C#. For the version supporting multiple languages, see datacentric repo.
Language: C# - Size: 1.31 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

nikimacm/trailmixers-project3
Python and Data Centric Development: A full-stack site that allows users to add, edit, delete, and search hiking trails in the province of Andalucia, Spain. They can also upload photos and maps showing their trails. Each route provides a title, the address of the trail, a difficulty level, a description, directions, photos, and maps.
Language: HTML - Size: 15.5 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 1

rajive/doma-skel
DOMA Skeleton - Document and Setup a DOMA Repository - Clone Me!
Language: Lua - Size: 38.1 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

cadmiumkitty/dcaf-2020-provo
Demo code for my talk at Data-Centric Architecture Forum 2020 about data provenance and PROV ontology.
Language: Java - Size: 48.8 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

datacentricorg/datacentric-py
Data-centric core services library in Python. For the version supporting multiple languages, see datacentric repo.
Language: Python - Size: 568 KB - Last synced at: 2 days ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 0

bryce-bowles/opioid-prescribing-rates
Semester-long project with the Virginia Department of Social Services to assist in the data-centric reengineering of useful data into VA's major FAACT database. Tableau dashboard analysis and presentation created using 2016 to 2019 data on Medicare prescribing rates.
Size: 3.18 MB - Last synced at: 11 months ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

zenetio/Traffic-Car-Classifier
Use CNN to classify traffic signs
Language: Jupyter Notebook - Size: 123 MB - Last synced at: about 1 year ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

datacentricorg/datacentric-java
Data-centric core services library in Java. For the version supporting multiple languages, see datacentric repo.
Size: 3.91 KB - Last synced at: almost 2 years ago - Pushed at: almost 6 years ago - Stars: 0 - Forks: 0
