GitHub topics: datacuration
chapmanjacobd/library
99+ CLI tools to build, browse, and blend your media library
Language: Python - Size: 184 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 434 - Forks: 13

NVIDIA-NeMo/Curator
Scalable data preprocessing and curation toolkit for LLMs
Language: Python - Size: 10.9 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1,031 - Forks: 151

GaloRomero/pepadbPosgreScript
PostgreSQL code for archaeological data management
Language: SQL - Size: 14.4 MB - Last synced at: 11 days ago - Pushed at: 12 days ago - Stars: 0 - Forks: 0

WDscholia/scholia
Wikidata-based scholarly profiles
Language: JavaScript - Size: 5.28 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 235 - Forks: 83

data-prep-kit/data-prep-kit
Open source project for data preparation for GenAI applications
Language: HTML - Size: 223 MB - Last synced at: 14 days ago - Pushed at: 15 days ago - Stars: 731 - Forks: 204

kosson/sva21
This repo contains materials, datasets, and solutions that were used during the Astra Summer School (Școala de vară Astra), first edition, 2021
Size: 3.57 MB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

purvasingh96/Data-Collection-for-CarZam
An image + data web scraper built to crawl the CarMax website and store relevant information for vehicle identification projects.
Language: Python - Size: 70.4 MB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 1

benjaminocampo/DataCuration
Exploration and data curation of a dataset from a Kaggle competition (https://www.kaggle.com/dansbecker/melbourne-housing-snapshot) covering properties sold in Melbourne in 2016 and 2017. The goal of this project is to prepare a well-structured matrix that can be fed to a model to estimate the properties' prices.
Language: Jupyter Notebook - Size: 14.2 MB - Last synced at: over 2 years ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 2
