An open API service providing repository metadata for many open source software ecosystems.

Topic: "data-centric"

ludwig-ai/ludwig

Low-code framework for building custom LLMs, neural networks, and other AI models

Language: Python - Size: 31.8 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 11,457 - Forks: 1,208

lancedb/lance

Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, PyArrow, and PyTorch, with more integrations coming.

Language: Rust - Size: 22 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 4,664 - Forks: 301

daochenzha/data-centric-AI

A curated, but incomplete, list of data-centric AI resources.

Size: 1.99 MB - Last synced at: about 2 months ago - Pushed at: 11 months ago - Stars: 1,094 - Forks: 78

hkust-nlp/deita

Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024]

Language: Python - Size: 240 KB - Last synced at: about 12 hours ago - Pushed at: 6 months ago - Stars: 554 - Forks: 29

encord-team/encord-active

The toolkit to test, validate, and evaluate your models and surface, curate, and prioritize the most valuable data for labeling.

Language: Python - Size: 264 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 449 - Forks: 26

CLUEbenchmark/DataCLUE

DataCLUE: A data-centric NLP benchmark and toolkit

Language: Python - Size: 17.9 MB - Last synced at: 13 days ago - Pushed at: about 3 years ago - Stars: 142 - Forks: 17

s2e-systems/dust-dds

Rust implementation of the Data Distribution Service (DDS)

Language: Rust - Size: 12.9 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 116 - Forks: 19

PrincetonUniversity/muchiSim

Simulator framework for analysis of performance, energy consumption, area and cost of multi-node multi-chiplet tile-based manycore designs

Language: C++ - Size: 171 MB - Last synced at: 12 days ago - Pushed at: 11 months ago - Stars: 65 - Forks: 10

ChandlerBang/GTrans

[ICLR'23] Implementation of "Empowering Graph Representation Learning with Test-Time Graph Transformation"

Language: Python - Size: 230 KB - Last synced at: about 2 months ago - Pushed at: almost 2 years ago - Stars: 56 - Forks: 6

USTC-StarTeam/DR4SR

🔥🔥🔥 KDD2024 Best Student Paper

Language: Python - Size: 49.8 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 48 - Forks: 3

astutic/Acharya

A Data Centric NER annotation tool for your Named Entity Recognition projects

Size: 11.3 MB - Last synced at: 5 months ago - Pushed at: about 1 year ago - Stars: 45 - Forks: 3

zhorton34/vuejs-form

Vue Form with Laravel Inspired Validation and Simply Enjoyable Error Messages Api. (Form Api, Validator Api, Rules Api, Error Messages Api)

Language: JavaScript - Size: 974 KB - Last synced at: about 2 months ago - Pushed at: about 3 years ago - Stars: 41 - Forks: 6

luo-junyu/Awesome-Data-Efficient-LLM

A list of data-efficient and data-centric LLM (Large Language Model) papers. Our Survey Paper: Towards Efficient LLM Post Training: A Data-centric Perspective

Size: 884 KB - Last synced at: 21 days ago - Pushed at: 3 months ago - Stars: 30 - Forks: 4

Maksims/mr-Observer

An observer is a wrapper over JSON data, that provides an interface to know when data is changed, with a focus on performance and memory efficiency.

Language: JavaScript - Size: 156 KB - Last synced at: 9 days ago - Pushed at: over 3 years ago - Stars: 24 - Forks: 1

kennethleungty/Data-Centric-AI-Competition

Code for a top 5% finish in the Data-Centric AI Competition organized by Andrew Ng and DeepLearning.AI

Language: Jupyter Notebook - Size: 11.4 MB - Last synced at: about 2 months ago - Pushed at: over 3 years ago - Stars: 22 - Forks: 3

minnesotanlp/infoVerse

Jaehyung Kim et al.'s ACL 2023 paper on "infoVerse: A Universal Framework for Dataset Characterization with Multidimensional Meta-information"

Language: Python - Size: 8.9 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 16 - Forks: 1

stoney95/pypely

From local functions to cloud deployed pipelines

Language: Python - Size: 20.5 MB - Last synced at: 16 days ago - Pushed at: about 2 years ago - Stars: 16 - Forks: 0

openlayer-ai/openlayer-python

The official Python library for Openlayer, the Continuous Model Improvement Platform for AI. 📈

Language: Python - Size: 21.5 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 10 - Forks: 1

mdbloice/Labeller

Quickly set up an image labelling web application for manually tagging images for machine learning tasks.

Language: Python - Size: 91.8 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 9 - Forks: 2

rajive/doma

Data-Oriented Microservices Architecture Framework using DDS

Language: Shell - Size: 171 KB - Last synced at: almost 2 years ago - Pushed at: almost 5 years ago - Stars: 6 - Forks: 0

seedatnabeel/Data-SUITE

Data-SUITE: Data-centric identification of in-distribution incongruous examples (ICML 2022)

Language: Jupyter Notebook - Size: 4.22 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 5 - Forks: 4

seedatnabeel/Data-IQ

Data-IQ: Characterizing subgroups with heterogeneous outcomes in tabular data (NeurIPS 2022)

Language: Jupyter Notebook - Size: 14.1 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 5 - Forks: 2

3lc-ai/ultralytics (fork of ultralytics/ultralytics)

Ultralytics YOLO11 with a 3LC integration

Language: Python - Size: 23.3 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 4 - Forks: 0

justincpresley/ndn-hydra

ndn-hydra: A Python-coded NDN distributed repository with five focused attributes: resiliency, scalability, usability, efficiency, and security.

Language: Python - Size: 518 KB - Last synced at: about 1 month ago - Pushed at: 10 months ago - Stars: 4 - Forks: 7

sen-laboratories/sen-core

Haiku server providing the semantic core infrastructure integrated with the Haiku filesystem and file browser Tracker.

Language: C++ - Size: 3.21 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 3 - Forks: 0

rticommunity/connextauto-bus

Common Data Architecture: Data Model + Component Interfaces using DDS

Language: CMake - Size: 195 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 2 - Forks: 1

hexuandeng/DRPruning

Language: Python - Size: 10.3 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0

datacentricorg/datacentric-cpp

Data-centric core services library in C++. For the version supporting multiple languages, see datacentric repo.

Language: C++ - Size: 22.8 MB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 2 - Forks: 0

datacentricorg/datacentric

Data-centric, cross-platform, multi-language core services library for C++, C#, Python, and Java. This repository includes all languages. Each language also has its own repository, e.g. datacentric-cpp.

Language: C# - Size: 22.5 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

datacentricorg/datacentric-cs

Data-centric core services library in C#. For the version supporting multiple languages, see datacentric repo.

Language: C# - Size: 1.31 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

nikimacm/trailmixers-project3

Python and Data Centric Development: A full-stack site that allows users to add, edit, delete, and search hiking trails in the Province of Andalucia, Spain. They can also upload photos and maps showing their trails. Each route provides a title, address of the trail, difficulty level, description, directions, photos, and maps.

Language: HTML - Size: 15.5 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 1

rajive/doma-skel

DOMA Skeleton - Document and Setup a DOMA Repository - Clone Me!

Language: Lua - Size: 38.1 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

cadmiumkitty/dcaf-2020-provo

Demo code for my talk at Data-Centric Architecture Forum 2020 about data provenance and PROV ontology.

Language: Java - Size: 48.8 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

datacentricorg/datacentric-py

Data-centric core services library in Python. For the version supporting multiple languages, see datacentric repo.

Language: Python - Size: 568 KB - Last synced at: 2 days ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 0

bryce-bowles/opioid-prescribing-rates

Semester-long project working with the Virginia Department of Social Services to assist in the data-centric reengineering of useful data into VA’s major FAACT database. Tableau dashboard analysis and presentation created using data from 2016 to 2019 on Medicare prescribing rates.

Size: 3.18 MB - Last synced at: 11 months ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

zenetio/Traffic-Car-Classifier

Use CNN to classify traffic signs

Language: Jupyter Notebook - Size: 123 MB - Last synced at: about 1 year ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

datacentricorg/datacentric-java

Data-centric core services library in Java. For the version supporting multiple languages, see datacentric repo.

Size: 3.91 KB - Last synced at: almost 2 years ago - Pushed at: almost 6 years ago - Stars: 0 - Forks: 0