Ecosyste.ms: Repos

An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: datasets

nishitpatel01/Data-Science-and-Machine-Learning-Resources

List of Data Science and Machine Learning resources that I frequently use

Size: 307 KB - Last synced: 3 days ago - Pushed: 8 months ago - Stars: 50 - Forks: 19

mathiasmantelli/awesome-mobile-robotics

Useful links of different content related to AI, Computer Vision, and Robotics.

Size: 961 KB - Last synced: 7 days ago - Pushed: about 1 month ago - Stars: 434 - Forks: 85

anton-bushuiev/PPIRef

Dataset and utilities for working with protein-protein interactions in 3D

Language: Jupyter Notebook - Size: 13.4 MB - Last synced: 11 days ago - Pushed: 12 days ago - Stars: 38 - Forks: 4

r-wenger/land-use-land-cover-datasets

List of datasets and code for remote sensing LULC applications.

Size: 53.7 KB - Last synced: 11 days ago - Pushed: almost 2 years ago - Stars: 25 - Forks: 3

CentralMolecularZone/DataSets

Central Molecular Zone Data Sets: A list of data sets available on the CMZ

Language: Python - Size: 88.9 KB - Last synced: 12 days ago - Pushed: 12 days ago - Stars: 8 - Forks: 8

ipeaGIT/geobr

Easy access to official spatial data sets of Brazil in R and Python

Language: R - Size: 46.4 MB - Last synced: 8 days ago - Pushed: 12 days ago - Stars: 768 - Forks: 117

stdlib-js/datasets-suthaharan-multi-hop-sensor-network

Labeled wireless sensor network data set collected from a multi-hop wireless sensor network deployment using TelosB motes.

Language: JavaScript - Size: 2.06 MB - Last synced: 12 days ago - Pushed: about 1 month ago - Stars: 1 - Forks: 2

kakumarabhishek/Corrected-Skin-Image-Datasets

Data and code for our analysis of DermaMNIST (MedMNIST), HAM10000, and Fitzpatrick17k datasets

Language: Jupyter Notebook - Size: 815 MB - Last synced: 12 days ago - Pushed: 13 days ago - Stars: 1 - Forks: 0

IQTLabs/VOiCES_Toolkit 📦

Scripts and utilities for working with the VOiCES dataset.

Language: Python - Size: 48.8 KB - Last synced: 12 days ago - Pushed: over 2 years ago - Stars: 1 - Forks: 1

thavlik/quake-gameplay-dataset

A dataset of Quake 1 gameplay videos preprocessed for deep learning

Language: Python - Size: 7.11 MB - Last synced: 12 days ago - Pushed: 13 days ago - Stars: 0 - Forks: 0

cqcore/Data-OSINT

You can find links to data acquisition websites.

Size: 52.7 KB - Last synced: 13 days ago - Pushed: 13 days ago - Stars: 117 - Forks: 12

eosphoros-ai/DB-GPT-Hub

A repository that contains models, datasets, and fine-tuning techniques for DB-GPT, with the purpose of enhancing model performance in Text-to-SQL

Language: Python - Size: 27 MB - Last synced: 13 days ago - Pushed: 13 days ago - Stars: 1,063 - Forks: 147

Daisy-Zhang/Awesome-Deepfakes

A list of datasets, tools, papers and code related to Deepfakes.

Size: 18.6 KB - Last synced: 6 days ago - Pushed: over 2 years ago - Stars: 64 - Forks: 2

praju-1/Data_science_projects

It contains the necessary code, datasets, and documentation to understand, replicate, and build upon the project's findings and methodologies.

Language: Jupyter Notebook - Size: 8.9 MB - Last synced: 13 days ago - Pushed: 13 days ago - Stars: 0 - Forks: 0

satellite-image-deep-learning/datasets

Datasets for deep learning with satellite & aerial imagery

Size: 316 KB - Last synced: 13 days ago - Pushed: 13 days ago - Stars: 503 - Forks: 57

WildlifeDatasets/wildlife-datasets

WildlifeDatasets: An open-source toolkit for animal re-identification

Language: Jupyter Notebook - Size: 230 MB - Last synced: 5 days ago - Pushed: 5 days ago - Stars: 42 - Forks: 3

WalkJim197/BrainDT

Incorporates the latest brain science databases, toolkits, and atlases

Size: 2.29 MB - Last synced: 13 days ago - Pushed: 14 days ago - Stars: 0 - Forks: 0

gulabpatel/Python_Tutorials

Language: Jupyter Notebook - Size: 16 MB - Last synced: 14 days ago - Pushed: 14 days ago - Stars: 6 - Forks: 2

PolyAI-LDN/conversational-datasets

Large datasets for conversational AI

Language: Python - Size: 178 KB - Last synced: 12 days ago - Pushed: over 4 years ago - Stars: 1,246 - Forks: 163

yemrekarakas/Datasets

Example Datasets

Size: 96.4 MB - Last synced: 14 days ago - Pushed: 14 days ago - Stars: 0 - Forks: 0

lawlesst/baseballdb-datasette

Configuration for publishing the Lahman Baseball Database with datasette

Language: HTML - Size: 10.7 KB - Last synced: 15 days ago - Pushed: almost 3 years ago - Stars: 2 - Forks: 0

JuliaData/DataFramesMeta.jl

Metaprogramming tools for DataFrames

Language: Julia - Size: 1.34 MB - Last synced: about 1 month ago - Pushed: about 2 months ago - Stars: 470 - Forks: 55

jumpingrivers/datasauRus

R package 📦 containing the Datasaurus Dozen datasets 📊

Language: R - Size: 19.2 MB - Last synced: 6 days ago - Pushed: 3 months ago - Stars: 309 - Forks: 46

OpenCSGs/csghub-server

CSGHub Server is the open-source backend server for CSGHub, an open-source platform for managing large-model assets. It helps users manage datasets, model files, code, and more through a REST API. Follows, feedback, and stars ⭐️ are welcome.

Language: Go - Size: 889 KB - Last synced: 11 days ago - Pushed: 11 days ago - Stars: 25 - Forks: 5

tushar2704/common_datasets

Common-datasets is a GitHub repository dedicated to providing a wide collection of common datasets for practicing and learning data science and machine learning.

Language: Python - Size: 6.41 MB - Last synced: 15 days ago - Pushed: 11 months ago - Stars: 7 - Forks: 0

ruipreis/afp-algorithms

JSON collection comprising 1,020 diagnostic and treatment algorithms from American Family Physician, equipped with a vector search mechanism to identify similar patient profiles.

Language: Python - Size: 5.23 MB - Last synced: 15 days ago - Pushed: 16 days ago - Stars: 0 - Forks: 0

jim-schwoebel/voice_datasets

🔊 A comprehensive list of open-source datasets for voice and sound computing (95+ datasets).

Size: 136 KB - Last synced: 15 days ago - Pushed: 2 months ago - Stars: 1,555 - Forks: 218

resource-watch/resource-watch

Resource Watch features hundreds of data sets all in one place on the state of the planet’s resources and citizens. Users can visualize challenges facing people and the planet, from climate change to poverty, water risk to state instability, air pollution to human migration, and more.

Language: JavaScript - Size: 98.9 MB - Last synced: 4 days ago - Pushed: 19 days ago - Stars: 65 - Forks: 26

Farama-Foundation/Minari

A standard format for offline reinforcement learning datasets, with popular reference datasets and related utilities

Language: Python - Size: 7.15 MB - Last synced: 16 days ago - Pushed: 16 days ago - Stars: 219 - Forks: 36

tbcgit/omdctk

OMD Curation Toolkit is a Python package designed for downloading and curating metadata and FASTQ files of public omics datasets.

Language: Python - Size: 7.73 MB - Last synced: 16 days ago - Pushed: 16 days ago - Stars: 2 - Forks: 0

jp1924/HF_builders

A repo collecting dataset builder files for Hugging Face datasets

Language: Python - Size: 79.1 KB - Last synced: 16 days ago - Pushed: 16 days ago - Stars: 0 - Forks: 0

huggingface/data-is-better-together

Let's build better datasets, together!

Language: Jupyter Notebook - Size: 2.46 MB - Last synced: 14 days ago - Pushed: 20 days ago - Stars: 138 - Forks: 26

adiag321/Data-Science-datasets

This repository contains the datasets that can be used for practice.

Size: 48.8 MB - Last synced: 17 days ago - Pushed: 17 days ago - Stars: 0 - Forks: 0

wongnai/wongnai-corpus

Collection of Wongnai's datasets

Size: 38.7 MB - Last synced: 6 days ago - Pushed: over 4 years ago - Stars: 74 - Forks: 22

Knuckles-Team/report-manager

Manage your reports and datasets

Language: Python - Size: 44.9 KB - Last synced: 16 days ago - Pushed: 17 days ago - Stars: 2 - Forks: 0

explosion/projects

🪐 End-to-end NLP workflows from prototype to production

Language: Python - Size: 18.5 MB - Last synced: 17 days ago - Pushed: about 2 months ago - Stars: 1,249 - Forks: 470

codingonion/awesome-object-detection-and-recognition-datasets

A collection of some awesome public object detection and recognition datasets.

Size: 24.4 KB - Last synced: 3 days ago - Pushed: about 2 months ago - Stars: 40 - Forks: 6

asigalov61/Tegridy-MIDI-Dataset

Tegridy MIDI Dataset for creating precise and effective music AI models.

Language: Jupyter Notebook - Size: 490 MB - Last synced: 5 days ago - Pushed: about 2 months ago - Stars: 127 - Forks: 11

JuliaData/RData.jl

Read R data files from Julia

Language: Julia - Size: 360 KB - Last synced: 18 days ago - Pushed: 6 months ago - Stars: 61 - Forks: 16

Daniil200707/SynapseCraft

A program that turns pictures into weights and biases for neural networks

Language: Python - Size: 520 KB - Last synced: 17 days ago - Pushed: 18 days ago - Stars: 0 - Forks: 0

waico/SKAB

SKAB - Skoltech Anomaly Benchmark. Time-series data for evaluating Anomaly Detection algorithms.

Language: Jupyter Notebook - Size: 30.8 MB - Last synced: 16 days ago - Pushed: 8 months ago - Stars: 295 - Forks: 52

higgi13425/medicaldata

Data Package for Medical Datasets

Language: R - Size: 22.6 MB - Last synced: 5 days ago - Pushed: 9 months ago - Stars: 40 - Forks: 11

JovianHQ/opendatasets

A Python library for downloading datasets from Kaggle, Google Drive, and other online sources.

Language: Python - Size: 25.9 MB - Last synced: 15 days ago - Pushed: 7 months ago - Stars: 308 - Forks: 141

mickeysjm/awesome-taxonomy

A curated resource for taxonomy research

Size: 83 KB - Last synced: 6 days ago - Pushed: 30 days ago - Stars: 198 - Forks: 30

knmlprz/corona-analysis-1 📦

Language: HTML - Size: 14.5 MB - Last synced: 18 days ago - Pushed: over 2 years ago - Stars: 2 - Forks: 0

toUpperCase78/real-racing-3-vehicles

Datasets and Analyses for All Vehicles in Real Racing 3

Language: Jupyter Notebook - Size: 23.9 MB - Last synced: 19 days ago - Pushed: 19 days ago - Stars: 3 - Forks: 1

sign-language-processing/datasets

TFDS data loaders for sign language datasets.

Language: Python - Size: 5.85 MB - Last synced: 16 days ago - Pushed: about 2 months ago - Stars: 76 - Forks: 20

AlekseyKorshuk/huggingartists

Lyrics generation with a GPT-2-based Transformer

Language: Jupyter Notebook - Size: 1020 KB - Last synced: 14 days ago - Pushed: almost 2 years ago - Stars: 96 - Forks: 10

globalgov/manyenviron

Data on environmental agreements

Language: R - Size: 472 MB - Last synced: 19 days ago - Pushed: 19 days ago - Stars: 5 - Forks: 1

apicrafter/apicrafter

REST API wrapper for MongoDB databases

Language: Python - Size: 148 KB - Last synced: 19 days ago - Pushed: 19 days ago - Stars: 0 - Forks: 0

Safe-DS/Datasets

Ready-to-use datasets for the Safe-DS Python library.

Language: Python - Size: 2.22 MB - Last synced: 10 days ago - Pushed: 10 days ago - Stars: 2 - Forks: 0

mgbilby/SRC-OA

Scripture Restoration Collective (Open)

Size: 4.97 MB - Last synced: 19 days ago - Pushed: 20 days ago - Stars: 3 - Forks: 1

jim-schwoebel/download_audioset

📁 This repo makes it easy to download the raw audio files from AudioSet (32.45 GB, 632 classes).

Language: Python - Size: 154 MB - Last synced: 15 days ago - Pushed: 10 months ago - Stars: 95 - Forks: 22

ksopyla/awesome-nlp-polish

A curated list of resources dedicated to Natural Language Processing (NLP) in Polish: models, tools, and datasets.

Size: 186 KB - Last synced: 4 days ago - Pushed: almost 3 years ago - Stars: 279 - Forks: 34

mims-harvard/TDC

Therapeutics Commons: Artificial Intelligence Foundation for Therapeutic Science

Language: Jupyter Notebook - Size: 67.6 MB - Last synced: 25 days ago - Pushed: 25 days ago - Stars: 930 - Forks: 167

cihai/cihai

Python library for CJK (Chinese, Japanese, and Korean) language dictionaries

Language: Python - Size: 2.16 MB - Last synced: 20 days ago - Pushed: 22 days ago - Stars: 78 - Forks: 14

vtuber-plan/olah

Self-hosted huggingface mirror service.

Language: Python - Size: 34.2 KB - Last synced: 12 days ago - Pushed: 5 months ago - Stars: 23 - Forks: 0

CESNET/cesnet-datazoo

CESNET DataZoo: A toolset for large network traffic datasets

Language: Python - Size: 1.2 MB - Last synced: 20 days ago - Pushed: 20 days ago - Stars: 13 - Forks: 1

justinzm/gopup

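Data APIs: Baidu, Google, Toutiao, and Weibo indices; macroeconomic data; interest rate data; currency exchange rates; high-growth ("qianlima") and unicorn companies; Xinwen Lianbo (CCTV evening news) transcripts; film and TV box office data; university lists; epidemic data…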

Language: Python - Size: 689 KB - Last synced: 19 days ago - Pushed: 8 months ago - Stars: 2,531 - Forks: 383

github/CodeSearchNet 📦

Datasets, tools, and benchmarks for representation learning of code.

Language: Jupyter Notebook - Size: 28.6 MB - Last synced: 18 days ago - Pushed: over 2 years ago - Stars: 2,117 - Forks: 377

multimodal/multimodal

A collection of multimodal datasets and visual features for VQA and captioning in PyTorch. Just run "pip install multimodal"

Language: Python - Size: 2.21 MB - Last synced: 5 days ago - Pushed: about 2 years ago - Stars: 71 - Forks: 7

prabhuomkar/pytorch-cpp

C++ Implementation of PyTorch Tutorials for Everyone

Language: C++ - Size: 482 KB - Last synced: 20 days ago - Pushed: 20 days ago - Stars: 1,837 - Forks: 249

gmberton/deep-visual-geo-localization-benchmark

Official code for CVPR 2022 (Oral) paper "Deep Visual Geo-localization Benchmark"

Language: Python - Size: 49.8 KB - Last synced: 12 days ago - Pushed: 3 months ago - Stars: 156 - Forks: 27

enrique-lozano/F1-World-API

One of the largest open databases on Formula 1. A SQLite database and a Node.js API ready to be used, with race results, teams, lap times, pit stops, free practice sessions, and much more!

Language: TypeScript - Size: 15.9 MB - Last synced: 20 days ago - Pushed: 20 days ago - Stars: 1 - Forks: 0

ruanchaves/napolab

A Natural Portuguese Language Benchmark (Napolab) for the evaluation of language models.

Language: Python - Size: 170 KB - Last synced: 4 days ago - Pushed: 3 months ago - Stars: 51 - Forks: 1

joedockrill/jmd_imagescraper

Image scraping library for creating deep learning datasets

Language: Jupyter Notebook - Size: 1.15 MB - Last synced: 21 days ago - Pushed: over 1 year ago - Stars: 31 - Forks: 13

domargan/awesome-dynamic-graphs

A collection of resources on dynamic/streaming/temporal/evolving graph processing systems, databases, data structures, datasets, and related academic and industrial work

Size: 64.5 KB - Last synced: 3 days ago - Pushed: about 1 year ago - Stars: 119 - Forks: 16

ARPSyndicate/bug-bounty-recon-dataset 📦

Recon data for public bug bounty programs. Due to extreme abuse via automated tools and requests from multiple threat intelligence teams, this project has been archived and moved.

Size: 2.94 GB - Last synced: 5 days ago - Pushed: over 1 year ago - Stars: 201 - Forks: 48

DmitryRyumin/CVPR-2023-24-Papers

CVPR 2023-2024 Papers: Dive into advanced research presented at the leading computer vision conference. Keep up to date with the latest developments in computer vision and deep learning. Code included. ⭐ support visual intelligence development!

Language: Python - Size: 6.03 MB - Last synced: 21 days ago - Pushed: 21 days ago - Stars: 249 - Forks: 18

TotemSmartBus/spadas Fork of lyy1240056777/spadas

This is a spatial dataset discovery system for real-world datasets. We are trying to support multi-model datasets on our platform.

Language: Java - Size: 720 MB - Last synced: 21 days ago - Pushed: 21 days ago - Stars: 0 - Forks: 1

IsmaelMousa/playing-with-finetuning

Practice fine-tuning a pre-trained Transformers model from Hugging Face

Language: Jupyter Notebook - Size: 19.5 KB - Last synced: 21 days ago - Pushed: 21 days ago - Stars: 0 - Forks: 0

Nixtla/datasetsforecast

Datasets for time series forecasting

Language: Jupyter Notebook - Size: 1.16 MB - Last synced: 21 days ago - Pushed: about 1 month ago - Stars: 53 - Forks: 7

NanoCommons/datasets

Overview of archived datasets with an open license

Language: Groovy - Size: 255 KB - Last synced: 21 days ago - Pushed: 21 days ago - Stars: 2 - Forks: 1

zjunlp/Mol-Instructions

[ICLR 2024] Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset for Large Language Models

Language: Python - Size: 16.6 MB - Last synced: 21 days ago - Pushed: 21 days ago - Stars: 186 - Forks: 12

apacha/MusicObjectDetection

Accompanying source code for the journal paper "A Baseline for General Music Object Detection with Deep Learning"

Language: Python - Size: 530 KB - Last synced: 22 days ago - Pushed: 23 days ago - Stars: 10 - Forks: 8

jmsallan/BAdatasets

This package contains datasets to illustrate machine learning algorithms in a Business Analytics (BA) course

Language: R - Size: 20.2 MB - Last synced: 22 days ago - Pushed: 22 days ago - Stars: 0 - Forks: 1

thecml/survival-datasets

Data loader for the most common datasets in survival analysis.

Language: Python - Size: 298 KB - Last synced: 17 days ago - Pushed: 11 months ago - Stars: 1 - Forks: 0

VakavicAI/President_Question_Parliament

Tweets related to the parliamentary questioning of the President, 1397 (Iranian calendar)

Size: 2.2 MB - Last synced: 23 days ago - Pushed: over 5 years ago - Stars: 0 - Forks: 0

VakavicAI/dataset_UN_Speach_18

Tweets related to the President's speech at the United Nations General Assembly

Size: 425 KB - Last synced: 23 days ago - Pushed: over 5 years ago - Stars: 0 - Forks: 0

VakavicAI/dataset_tweet_derby971

Tweets related to the Persepolis vs. Esteghlal derby

Size: 961 KB - Last synced: 23 days ago - Pushed: over 5 years ago - Stars: 1 - Forks: 0

VakavicAI/freeland_1

Tweets related to the first Freeland conference in the Anzali Free Zone

Size: 131 KB - Last synced: 23 days ago - Pushed: over 5 years ago - Stars: 2 - Forks: 0

microsoft/torchgeo

TorchGeo: datasets, samplers, transforms, and pre-trained models for geospatial data

Language: Python - Size: 129 MB - Last synced: 24 days ago - Pushed: 25 days ago - Stars: 2,232 - Forks: 287

crn565/DSUAL

A collection of datasets suitable for NILM (non-intrusive load monitoring) from the University of Almería

Language: Python - Size: 416 KB - Last synced: 23 days ago - Pushed: 23 days ago - Stars: 0 - Forks: 0

satellite-image-deep-learning/techniques

Techniques for deep learning with satellite & aerial imagery

Size: 27.7 MB - Last synced: 25 days ago - Pushed: about 1 month ago - Stars: 7,780 - Forks: 1,347

huggingface/dataset-viewer

Lightweight web API for visualizing and exploring any dataset - computer vision, speech, text, and tabular - stored on the Hugging Face Hub

Language: Python - Size: 21.2 MB - Last synced: 23 days ago - Pushed: 24 days ago - Stars: 619 - Forks: 59

machinecurve/extra_keras_datasets

📃🎉 Additional datasets for tensorflow.keras

Language: Python - Size: 2.41 MB - Last synced: 24 days ago - Pushed: over 3 years ago - Stars: 31 - Forks: 3

stdlib-js/datasets-liu-positive-opinion-words-en

A list of positive opinion words.

Language: JavaScript - Size: 403 KB - Last synced: 24 days ago - Pushed: 25 days ago - Stars: 4 - Forks: 0

stdlib-js/datasets-savoy-stopwords-it

A list of Italian stop words.

Language: JavaScript - Size: 319 KB - Last synced: 24 days ago - Pushed: 25 days ago - Stars: 3 - Forks: 0

stdlib-js/datasets-savoy-stopwords-por

A list of Portuguese stop words.

Language: JavaScript - Size: 313 KB - Last synced: 24 days ago - Pushed: 25 days ago - Stars: 3 - Forks: 0

stdlib-js/datasets-stopwords-en

A list of English stop words.

Language: JavaScript - Size: 325 KB - Last synced: 24 days ago - Pushed: 25 days ago - Stars: 4 - Forks: 0

BaranDev/Media-Queries

A collection of CSS media queries implemented for 236 different devices including mobiles, tablets, watches, and laptops. Perfect for developers seeking to create responsive designs that cater to a wide array of screen sizes and resolutions.

Language: CSS - Size: 6.84 KB - Last synced: 23 days ago - Pushed: 24 days ago - Stars: 0 - Forks: 0

vega/vega-datasets

Common repository for example datasets used by Vega-related projects

Language: TypeScript - Size: 9.72 MB - Last synced: 24 days ago - Pushed: 24 days ago - Stars: 243 - Forks: 205

colour-science/colour-datasets

Colour science datasets for use with Colour

Language: Python - Size: 1.07 MB - Last synced: 7 days ago - Pushed: 13 days ago - Stars: 53 - Forks: 11

gbenson/huggingface-datasets Fork of huggingface/datasets

Library for accessing and sharing datasets for audio, computer vision, and natural language processing (NLP) tasks

Size: 81.7 MB - Last synced: 24 days ago - Pushed: 24 days ago - Stars: 0 - Forks: 0

OYE93/Chinese-NLP-Corpus

Collection of Chinese NLP corpora

Language: Python - Size: 7.14 MB - Last synced: 22 days ago - Pushed: over 3 years ago - Stars: 848 - Forks: 207

dkalpakchi/awesome-swedish-nlp

A curated list of resources for natural language processing (NLP) in Swedish

Size: 25.4 KB - Last synced: 7 days ago - Pushed: over 1 year ago - Stars: 19 - Forks: 2

rediscovery-io/remo-python

🐇 Python library for remo, the app for annotation and image management in computer vision

Language: Python - Size: 90.6 MB - Last synced: 14 days ago - Pushed: over 3 years ago - Stars: 184 - Forks: 25

Karlheinzniebuhr/the-weather-scraper

A Lightweight Weather Scraper

Language: Python - Size: 502 KB - Last synced: 4 days ago - Pushed: about 2 years ago - Stars: 102 - Forks: 33

ocramz/nlp-data-superglue

Dataset parsers for the SuperGLUE benchmark https://super.gluebenchmark.com/tasks/

Language: Haskell - Size: 3.91 KB - Last synced: 25 days ago - Pushed: over 1 year ago - Stars: 2 - Forks: 0

vsoch/datasets

open source datasets for machine learning, the dinosaur datasets

Language: HTML - Size: 5.1 MB - Last synced: 25 days ago - Pushed: about 3 years ago - Stars: 4 - Forks: 0