Ecosyste.ms: Repos
An open API service providing repository metadata for many open source software ecosystems.
GitHub topics: datasets
nishitpatel01/Data-Science-and-Machine-Learning-Resources
List of Data Science and Machine Learning Resources that I frequently use
Size: 307 KB - Last synced: 3 days ago - Pushed: 8 months ago - Stars: 50 - Forks: 19
mathiasmantelli/awesome-mobile-robotics
Useful links to a variety of content related to AI, Computer Vision, and Robotics.
Size: 961 KB - Last synced: 7 days ago - Pushed: about 1 month ago - Stars: 434 - Forks: 85
anton-bushuiev/PPIRef
Dataset and utilities for working with protein-protein interactions in 3D
Language: Jupyter Notebook - Size: 13.4 MB - Last synced: 11 days ago - Pushed: 12 days ago - Stars: 38 - Forks: 4
r-wenger/land-use-land-cover-datasets
List of datasets and code for remote sensing LULC applications.
Size: 53.7 KB - Last synced: 11 days ago - Pushed: almost 2 years ago - Stars: 25 - Forks: 3
CentralMolecularZone/DataSets
Central Molecular Zone Data Sets: A list of data sets available on the CMZ
Language: Python - Size: 88.9 KB - Last synced: 12 days ago - Pushed: 12 days ago - Stars: 8 - Forks: 8
ipeaGIT/geobr
Easy access to official spatial data sets of Brazil in R and Python
Language: R - Size: 46.4 MB - Last synced: 8 days ago - Pushed: 12 days ago - Stars: 768 - Forks: 117
stdlib-js/datasets-suthaharan-multi-hop-sensor-network
Labeled wireless sensor network data set collected from a multi-hop wireless sensor network deployment using TelosB motes.
Language: JavaScript - Size: 2.06 MB - Last synced: 12 days ago - Pushed: about 1 month ago - Stars: 1 - Forks: 2
kakumarabhishek/Corrected-Skin-Image-Datasets
Data and code for our analysis of DermaMNIST (MedMNIST), HAM10000, and Fitzpatrick17k datasets
Language: Jupyter Notebook - Size: 815 MB - Last synced: 12 days ago - Pushed: 13 days ago - Stars: 1 - Forks: 0
IQTLabs/VOiCES_Toolkit 📦
Scripts and utilities for working with the VOiCES dataset.
Language: Python - Size: 48.8 KB - Last synced: 12 days ago - Pushed: over 2 years ago - Stars: 1 - Forks: 1
thavlik/quake-gameplay-dataset
A dataset of Quake 1 gameplay videos preprocessed for deep learning
Language: Python - Size: 7.11 MB - Last synced: 12 days ago - Pushed: 13 days ago - Stars: 0 - Forks: 0
cqcore/Data-OSINT
You can find links to data acquisition websites.
Size: 52.7 KB - Last synced: 13 days ago - Pushed: 13 days ago - Stars: 117 - Forks: 12
eosphoros-ai/DB-GPT-Hub
A repository that contains models, datasets, and fine-tuning techniques for DB-GPT, with the purpose of enhancing model performance in Text-to-SQL
Language: Python - Size: 27 MB - Last synced: 13 days ago - Pushed: 13 days ago - Stars: 1,063 - Forks: 147
Daisy-Zhang/Awesome-Deepfakes
A list of datasets, tools, papers and code related to Deepfakes.
Size: 18.6 KB - Last synced: 6 days ago - Pushed: over 2 years ago - Stars: 64 - Forks: 2
praju-1/Data_science_projects
It contains the necessary code, datasets, and documentation to understand, replicate, and build upon the project's findings and methodologies.
Language: Jupyter Notebook - Size: 8.9 MB - Last synced: 13 days ago - Pushed: 13 days ago - Stars: 0 - Forks: 0
satellite-image-deep-learning/datasets
Datasets for deep learning with satellite & aerial imagery
Size: 316 KB - Last synced: 13 days ago - Pushed: 13 days ago - Stars: 503 - Forks: 57
WildlifeDatasets/wildlife-datasets
WildlifeDatasets: An open-source toolkit for animal re-identification
Language: Jupyter Notebook - Size: 230 MB - Last synced: 5 days ago - Pushed: 5 days ago - Stars: 42 - Forks: 3
WalkJim197/BrainDT
Incorporates the latest brain science databases, toolkits, and atlases
Size: 2.29 MB - Last synced: 13 days ago - Pushed: 14 days ago - Stars: 0 - Forks: 0
gulabpatel/Python_Tutorials
Language: Jupyter Notebook - Size: 16 MB - Last synced: 14 days ago - Pushed: 14 days ago - Stars: 6 - Forks: 2
PolyAI-LDN/conversational-datasets
Large datasets for conversational AI
Language: Python - Size: 178 KB - Last synced: 12 days ago - Pushed: over 4 years ago - Stars: 1,246 - Forks: 163
yemrekarakas/Datasets
Example Datasets
Size: 96.4 MB - Last synced: 14 days ago - Pushed: 14 days ago - Stars: 0 - Forks: 0
lawlesst/baseballdb-datasette
Configuration for publishing the Lahman Baseball Database with datasette
Language: HTML - Size: 10.7 KB - Last synced: 15 days ago - Pushed: almost 3 years ago - Stars: 2 - Forks: 0
JuliaData/DataFramesMeta.jl
Metaprogramming tools for DataFrames
Language: Julia - Size: 1.34 MB - Last synced: about 1 month ago - Pushed: about 2 months ago - Stars: 470 - Forks: 55
jumpingrivers/datasauRus
R Package 📦 Containing the Datasaurus Dozen datasets 📊
Language: R - Size: 19.2 MB - Last synced: 6 days ago - Pushed: 3 months ago - Stars: 309 - Forks: 46
OpenCSGs/csghub-server
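CSGHub Server is the backend server for CSGHub, which helps users manage datasets, model files, code, and more. It is the open-source server-side component of CSGHub, an open-source platform for managing large-model assets, and provides REST API-based management of models, datasets, and other large-model assets. Follows, feedback, and stars ⭐️ are welcome.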
Language: Go - Size: 889 KB - Last synced: 11 days ago - Pushed: 11 days ago - Stars: 25 - Forks: 5
tushar2704/common_datasets
Common-datasets is a GitHub repository dedicated to providing a wide collection of common datasets for practicing and learning data science and machine learning.
Language: Python - Size: 6.41 MB - Last synced: 15 days ago - Pushed: 11 months ago - Stars: 7 - Forks: 0
ruipreis/afp-algorithms
JSON collection comprising 1,020 diagnostic and treatment algorithms from American Family Physician, equipped with a vector search mechanism to identify similar patient profiles.
Language: Python - Size: 5.23 MB - Last synced: 15 days ago - Pushed: 16 days ago - Stars: 0 - Forks: 0
jim-schwoebel/voice_datasets
🔊 A comprehensive list of open-source datasets for voice and sound computing (95+ datasets).
Size: 136 KB - Last synced: 15 days ago - Pushed: 2 months ago - Stars: 1,555 - Forks: 218
resource-watch/resource-watch
Resource Watch features hundreds of data sets all in one place on the state of the planet’s resources and citizens. Users can visualize challenges facing people and the planet, from climate change to poverty, water risk to state instability, air pollution to human migration, and more.
Language: JavaScript - Size: 98.9 MB - Last synced: 4 days ago - Pushed: 19 days ago - Stars: 65 - Forks: 26
Farama-Foundation/Minari
A standard format for offline reinforcement learning datasets, with popular reference datasets and related utilities
Language: Python - Size: 7.15 MB - Last synced: 16 days ago - Pushed: 16 days ago - Stars: 219 - Forks: 36
tbcgit/omdctk
OMD Curation Toolkit is a python package designed for the download and curation of metadata and fastq files of public omics datasets.
Language: Python - Size: 7.73 MB - Last synced: 16 days ago - Pushed: 16 days ago - Stars: 2 - Forks: 0
jp1924/HF_builders
A repo collecting dataset builder files for huggingface datasets
Language: Python - Size: 79.1 KB - Last synced: 16 days ago - Pushed: 16 days ago - Stars: 0 - Forks: 0
huggingface/data-is-better-together
Let's build better datasets, together!
Language: Jupyter Notebook - Size: 2.46 MB - Last synced: 14 days ago - Pushed: 20 days ago - Stars: 138 - Forks: 26
adiag321/Data-Science-datasets
This repository contains the datasets that can be used for practice.
Size: 48.8 MB - Last synced: 17 days ago - Pushed: 17 days ago - Stars: 0 - Forks: 0
wongnai/wongnai-corpus
Collection of Wongnai's datasets
Size: 38.7 MB - Last synced: 6 days ago - Pushed: over 4 years ago - Stars: 74 - Forks: 22
Knuckles-Team/report-manager
Manage your reports and datasets
Language: Python - Size: 44.9 KB - Last synced: 16 days ago - Pushed: 17 days ago - Stars: 2 - Forks: 0
explosion/projects
🪐 End-to-end NLP workflows from prototype to production
Language: Python - Size: 18.5 MB - Last synced: 17 days ago - Pushed: about 2 months ago - Stars: 1,249 - Forks: 470
codingonion/awesome-object-detection-and-recognition-datasets
A collection of some awesome public object detection and recognition datasets.
Size: 24.4 KB - Last synced: 3 days ago - Pushed: about 2 months ago - Stars: 40 - Forks: 6
asigalov61/Tegridy-MIDI-Dataset
Tegridy MIDI Dataset for creating precise and effective Music AI models.
Language: Jupyter Notebook - Size: 490 MB - Last synced: 5 days ago - Pushed: about 2 months ago - Stars: 127 - Forks: 11
JuliaData/RData.jl
Read R data files from Julia
Language: Julia - Size: 360 KB - Last synced: 18 days ago - Pushed: 6 months ago - Stars: 61 - Forks: 16
Daniil200707/SynapseCraft
A program that turns pictures into weights and biases for neural networks
Language: Python - Size: 520 KB - Last synced: 17 days ago - Pushed: 18 days ago - Stars: 0 - Forks: 0
waico/SKAB
SKAB - Skoltech Anomaly Benchmark. Time-series data for evaluating Anomaly Detection algorithms.
Language: Jupyter Notebook - Size: 30.8 MB - Last synced: 16 days ago - Pushed: 8 months ago - Stars: 295 - Forks: 52
higgi13425/medicaldata
Data Package for Medical Datasets
Language: R - Size: 22.6 MB - Last synced: 5 days ago - Pushed: 9 months ago - Stars: 40 - Forks: 11
JovianHQ/opendatasets
A Python library for downloading datasets from Kaggle, Google Drive, and other online sources.
Language: Python - Size: 25.9 MB - Last synced: 15 days ago - Pushed: 7 months ago - Stars: 308 - Forks: 141
mickeysjm/awesome-taxonomy
A curated resource for taxonomy research
Size: 83 KB - Last synced: 6 days ago - Pushed: 30 days ago - Stars: 198 - Forks: 30
knmlprz/corona-analysis-1 📦
Language: HTML - Size: 14.5 MB - Last synced: 18 days ago - Pushed: over 2 years ago - Stars: 2 - Forks: 0
toUpperCase78/real-racing-3-vehicles
Datasets and Analyses for All Vehicles in Real Racing 3
Language: Jupyter Notebook - Size: 23.9 MB - Last synced: 19 days ago - Pushed: 19 days ago - Stars: 3 - Forks: 1
sign-language-processing/datasets
TFDS data loaders for sign language datasets.
Language: Python - Size: 5.85 MB - Last synced: 16 days ago - Pushed: about 2 months ago - Stars: 76 - Forks: 20
AlekseyKorshuk/huggingartists
Lyrics generation with GPT2-based Transformer
Language: Jupyter Notebook - Size: 1020 KB - Last synced: 14 days ago - Pushed: almost 2 years ago - Stars: 96 - Forks: 10
globalgov/manyenviron
Data on environmental agreements
Language: R - Size: 472 MB - Last synced: 19 days ago - Pushed: 19 days ago - Stars: 5 - Forks: 1
apicrafter/apicrafter
REST API wrapper for MongoDB databases
Language: Python - Size: 148 KB - Last synced: 19 days ago - Pushed: 19 days ago - Stars: 0 - Forks: 0
Safe-DS/Datasets
Ready-to-use datasets for the Safe-DS Python library.
Language: Python - Size: 2.22 MB - Last synced: 10 days ago - Pushed: 10 days ago - Stars: 2 - Forks: 0
mgbilby/SRC-OA
Scripture Restoration Collective (Open)
Size: 4.97 MB - Last synced: 19 days ago - Pushed: 20 days ago - Stars: 3 - Forks: 1
jim-schwoebel/download_audioset
📁 This repo makes it easy to download the raw audio files from AudioSet (32.45 GB, 632 classes).
Language: Python - Size: 154 MB - Last synced: 15 days ago - Pushed: 10 months ago - Stars: 95 - Forks: 22
ksopyla/awesome-nlp-polish
A curated list of resources dedicated to Natural Language Processing (NLP) in Polish. Models, tools, datasets.
Size: 186 KB - Last synced: 4 days ago - Pushed: almost 3 years ago - Stars: 279 - Forks: 34
mims-harvard/TDC
Therapeutics Commons: Artificial Intelligence Foundation for Therapeutic Science
Language: Jupyter Notebook - Size: 67.6 MB - Last synced: 25 days ago - Pushed: 25 days ago - Stars: 930 - Forks: 167
cihai/cihai
Python library for CJK (Chinese, Japanese, and Korean) language dictionary
Language: Python - Size: 2.16 MB - Last synced: 20 days ago - Pushed: 22 days ago - Stars: 78 - Forks: 14
vtuber-plan/olah
Self-hosted huggingface mirror service.
Language: Python - Size: 34.2 KB - Last synced: 12 days ago - Pushed: 5 months ago - Stars: 23 - Forks: 0
CESNET/cesnet-datazoo
CESNET DataZoo: A toolset for large network traffic datasets
Language: Python - Size: 1.2 MB - Last synced: 20 days ago - Pushed: 20 days ago - Stars: 13 - Forks: 1
justinzm/gopup
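Data interfaces: Baidu, Google, Toutiao, and Weibo indices, macroeconomic data, interest rate data, currency exchange rates, "qianlima" (high-potential) and unicorn companies, Xinwen Lianbo transcripts, film and TV box office data, university lists, epidemic data…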
Language: Python - Size: 689 KB - Last synced: 19 days ago - Pushed: 8 months ago - Stars: 2,531 - Forks: 383
github/CodeSearchNet 📦
Datasets, tools, and benchmarks for representation learning of code.
Language: Jupyter Notebook - Size: 28.6 MB - Last synced: 18 days ago - Pushed: over 2 years ago - Stars: 2,117 - Forks: 377
multimodal/multimodal
A collection of multimodal datasets and visual features for VQA and captioning in PyTorch. Just run "pip install multimodal"
Language: Python - Size: 2.21 MB - Last synced: 5 days ago - Pushed: about 2 years ago - Stars: 71 - Forks: 7
prabhuomkar/pytorch-cpp
C++ Implementation of PyTorch Tutorials for Everyone
Language: C++ - Size: 482 KB - Last synced: 20 days ago - Pushed: 20 days ago - Stars: 1,837 - Forks: 249
gmberton/deep-visual-geo-localization-benchmark
Official code for CVPR 2022 (Oral) paper "Deep Visual Geo-localization Benchmark"
Language: Python - Size: 49.8 KB - Last synced: 12 days ago - Pushed: 3 months ago - Stars: 156 - Forks: 27
enrique-lozano/F1-World-API
One of the largest open databases on Formula 1. A SQLite database and a Node.js API ready to be used with race results, teams, times per lap, pit stops, free practices and much more!
Language: TypeScript - Size: 15.9 MB - Last synced: 20 days ago - Pushed: 20 days ago - Stars: 1 - Forks: 0
ruanchaves/napolab
A Natural Portuguese Language Benchmark (Napolab) for the evaluation of language models.
Language: Python - Size: 170 KB - Last synced: 4 days ago - Pushed: 3 months ago - Stars: 51 - Forks: 1
joedockrill/jmd_imagescraper
Image scraping library for creating deep learning datasets
Language: Jupyter Notebook - Size: 1.15 MB - Last synced: 21 days ago - Pushed: over 1 year ago - Stars: 31 - Forks: 13
domargan/awesome-dynamic-graphs
A collection of resources on dynamic/streaming/temporal/evolving graph processing systems, databases, data structures, datasets, and related academic and industrial work
Size: 64.5 KB - Last synced: 3 days ago - Pushed: about 1 year ago - Stars: 119 - Forks: 16
ARPSyndicate/bug-bounty-recon-dataset 📦
Recon data for public bug bounty programs. Due to extreme abuse via automated tools and requests from multiple threat intelligence teams, this project has been archived and moved.
Size: 2.94 GB - Last synced: 5 days ago - Pushed: over 1 year ago - Stars: 201 - Forks: 48
DmitryRyumin/CVPR-2023-24-Papers
CVPR 2023-2024 Papers: Dive into advanced research presented at the leading computer vision conference. Keep up to date with the latest developments in computer vision and deep learning. Code included. ⭐ support visual intelligence development!
Language: Python - Size: 6.03 MB - Last synced: 21 days ago - Pushed: 21 days ago - Stars: 249 - Forks: 18
TotemSmartBus/spadas (fork of lyy1240056777/spadas)
This is a spatial dataset discovery system for real-world datasets. We are trying to support multi-model datasets on our platform.
Language: Java - Size: 720 MB - Last synced: 21 days ago - Pushed: 21 days ago - Stars: 0 - Forks: 1
IsmaelMousa/playing-with-finetuning
Practice fine-tuning a pre-trained Transformers model from Hugging Face
Language: Jupyter Notebook - Size: 19.5 KB - Last synced: 21 days ago - Pushed: 21 days ago - Stars: 0 - Forks: 0
Nixtla/datasetsforecast
Datasets for time series forecasting
Language: Jupyter Notebook - Size: 1.16 MB - Last synced: 21 days ago - Pushed: about 1 month ago - Stars: 53 - Forks: 7
NanoCommons/datasets
Overview of archived datasets with an open license
Language: Groovy - Size: 255 KB - Last synced: 21 days ago - Pushed: 21 days ago - Stars: 2 - Forks: 1
zjunlp/Mol-Instructions
[ICLR 2024] Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset for Large Language Models
Language: Python - Size: 16.6 MB - Last synced: 21 days ago - Pushed: 21 days ago - Stars: 186 - Forks: 12
apacha/MusicObjectDetection
Accompanying source code for the journal paper "A Baseline for General Music Object Detection with Deep Learning"
Language: Python - Size: 530 KB - Last synced: 22 days ago - Pushed: 23 days ago - Stars: 10 - Forks: 8
jmsallan/BAdatasets
This package contains datasets to illustrate machine learning algorithms in a Business Analytics (BA) course
Language: R - Size: 20.2 MB - Last synced: 22 days ago - Pushed: 22 days ago - Stars: 0 - Forks: 1
thecml/survival-datasets
Data loader for most common datasets in survival analysis.
Language: Python - Size: 298 KB - Last synced: 17 days ago - Pushed: 11 months ago - Stars: 1 - Forks: 0
VakavicAI/President_Question_Parliament
Tweets about the questioning of the president in parliament, 1397 (Solar Hijri)
Size: 2.2 MB - Last synced: 23 days ago - Pushed: over 5 years ago - Stars: 0 - Forks: 0
VakavicAI/dataset_UN_Speach_18
Tweets about the president's speech at the United Nations General Assembly
Size: 425 KB - Last synced: 23 days ago - Pushed: over 5 years ago - Stars: 0 - Forks: 0
VakavicAI/dataset_tweet_derby971
Tweets about the Persepolis and Esteghlal derby
Size: 961 KB - Last synced: 23 days ago - Pushed: over 5 years ago - Stars: 1 - Forks: 0
VakavicAI/freeland_1
Tweets about the first Freeland conference in the Anzali Free Zone
Size: 131 KB - Last synced: 23 days ago - Pushed: over 5 years ago - Stars: 2 - Forks: 0
microsoft/torchgeo
TorchGeo: datasets, samplers, transforms, and pre-trained models for geospatial data
Language: Python - Size: 129 MB - Last synced: 24 days ago - Pushed: 25 days ago - Stars: 2,232 - Forks: 287
crn565/DSUAL
Set of datasets suitable for NILM from the University of Almería
Language: Python - Size: 416 KB - Last synced: 23 days ago - Pushed: 23 days ago - Stars: 0 - Forks: 0
satellite-image-deep-learning/techniques
Techniques for deep learning with satellite & aerial imagery
Size: 27.7 MB - Last synced: 25 days ago - Pushed: about 1 month ago - Stars: 7,780 - Forks: 1,347
huggingface/dataset-viewer
Lightweight web API for visualizing and exploring any dataset - computer vision, speech, text, and tabular - stored on the Hugging Face Hub
Language: Python - Size: 21.2 MB - Last synced: 23 days ago - Pushed: 24 days ago - Stars: 619 - Forks: 59
machinecurve/extra_keras_datasets
📃🎉 Additional datasets for tensorflow.keras
Language: Python - Size: 2.41 MB - Last synced: 24 days ago - Pushed: over 3 years ago - Stars: 31 - Forks: 3
stdlib-js/datasets-liu-positive-opinion-words-en
A list of positive opinion words.
Language: JavaScript - Size: 403 KB - Last synced: 24 days ago - Pushed: 25 days ago - Stars: 4 - Forks: 0
stdlib-js/datasets-savoy-stopwords-it
A list of Italian stop words.
Language: JavaScript - Size: 319 KB - Last synced: 24 days ago - Pushed: 25 days ago - Stars: 3 - Forks: 0
stdlib-js/datasets-savoy-stopwords-por
A list of Portuguese stop words.
Language: JavaScript - Size: 313 KB - Last synced: 24 days ago - Pushed: 25 days ago - Stars: 3 - Forks: 0
stdlib-js/datasets-stopwords-en
A list of English stop words.
Language: JavaScript - Size: 325 KB - Last synced: 24 days ago - Pushed: 25 days ago - Stars: 4 - Forks: 0
BaranDev/Media-Queries
A collection of CSS media queries implemented for 236 different devices including mobiles, tablets, watches, and laptops. Perfect for developers seeking to create responsive designs that cater to a wide array of screen sizes and resolutions.
Language: CSS - Size: 6.84 KB - Last synced: 23 days ago - Pushed: 24 days ago - Stars: 0 - Forks: 0
vega/vega-datasets
Common repository for example datasets used by Vega-related projects
Language: TypeScript - Size: 9.72 MB - Last synced: 24 days ago - Pushed: 24 days ago - Stars: 243 - Forks: 205
colour-science/colour-datasets
Colour science datasets for use with Colour
Language: Python - Size: 1.07 MB - Last synced: 7 days ago - Pushed: 13 days ago - Stars: 53 - Forks: 11
gbenson/huggingface-datasets (fork of huggingface/datasets)
Library for accessing and sharing datasets for audio, computer vision, and natural language processing (NLP) tasks
Size: 81.7 MB - Last synced: 24 days ago - Pushed: 24 days ago - Stars: 0 - Forks: 0
OYE93/Chinese-NLP-Corpus
Collections of Chinese NLP corpus
Language: Python - Size: 7.14 MB - Last synced: 22 days ago - Pushed: over 3 years ago - Stars: 848 - Forks: 207
dkalpakchi/awesome-swedish-nlp
A curated list of resources for natural language processing (NLP) in Swedish
Size: 25.4 KB - Last synced: 7 days ago - Pushed: over 1 year ago - Stars: 19 - Forks: 2
rediscovery-io/remo-python
🐰 Python lib for remo, the app for annotation and image management in Computer Vision
Language: Python - Size: 90.6 MB - Last synced: 14 days ago - Pushed: over 3 years ago - Stars: 184 - Forks: 25
Karlheinzniebuhr/the-weather-scraper
A Lightweight Weather Scraper
Language: Python - Size: 502 KB - Last synced: 4 days ago - Pushed: about 2 years ago - Stars: 102 - Forks: 33
ocramz/nlp-data-superglue
Dataset parsers from the SuperGLUE benchmark https://super.gluebenchmark.com/tasks/
Language: Haskell - Size: 3.91 KB - Last synced: 25 days ago - Pushed: over 1 year ago - Stars: 2 - Forks: 0
vsoch/datasets
open source datasets for machine learning, the dinosaur datasets
Language: HTML - Size: 5.1 MB - Last synced: 25 days ago - Pushed: about 3 years ago - Stars: 4 - Forks: 0