An open API service providing repository metadata for many open source software ecosystems.

Topic: "machine-learning-dataset"

quincyliang/nlp-public-dataset

Chinese and English NER datasets, English-Chinese machine translation datasets, and Chinese word segmentation datasets.

Language: Python - Size: 12.9 MB - Last synced at: over 1 year ago - Pushed at: about 4 years ago - Stars: 320 - Forks: 75

urwithajit9/ClaMP

A malware classifier dataset built from the header field values of Portable Executable files

Language: YARA - Size: 1.75 MB - Last synced at: 12 months ago - Pushed at: over 2 years ago - Stars: 86 - Forks: 31

JohannesBuchner/spoken-command-recognition

A large, free audio sample database (10M words pronounced), a test bed for voice activity detection algorithms and for single-syllable word recognition

Language: Python - Size: 63.5 KB - Last synced at: 15 days ago - Pushed at: over 7 years ago - Stars: 69 - Forks: 31

tosiron/jazznet

jazznet dataset of piano patterns for music audio machine learning research

Language: Python - Size: 4.24 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 43 - Forks: 0

reddyprasade/Machine-Learning-Problems-DataSets

We currently maintain 488 data sets as a service to the machine learning community. You may view all data sets through our searchable interface. For a general overview of the Repository, please visit our About page. For information about citing data sets in publications, please read our citation policy. If you wish to donate a data set, please consult our donation policy. For any other questions, feel free to contact the Repository librarians.

Language: Python - Size: 276 MB - Last synced at: 15 days ago - Pushed at: about 4 years ago - Stars: 34 - Forks: 22

FrankFeng-23/SPREAD

SPREAD is a large-scale synthetic dataset for image- and point-cloud-based tasks in forestry.

Language: Python - Size: 8.19 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 32 - Forks: 2

elkorchi/2DGeometricShapesGenerator

2D Geometric shapes generator

Language: Python - Size: 25.4 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 30 - Forks: 10

cvjena/cifair

A duplicate-free variant of the CIFAR test set.

Language: Python - Size: 57.2 MB - Last synced at: about 1 year ago - Pushed at: about 4 years ago - Stars: 14 - Forks: 0

lqwk/ucla-dining-dataset

UCLA Dining Hall Menus Dataset

Size: 1.44 MB - Last synced at: about 2 years ago - Pushed at: about 8 years ago - Stars: 11 - Forks: 0

EngineeringSoftware/math-comp-corpus

Corpus of Coq code related to MathComp including several machine-readable representations

Language: Common Lisp - Size: 117 MB - Last synced at: 23 days ago - Pushed at: over 2 years ago - Stars: 10 - Forks: 1

ichisadashioko/etlcdb

Extract Japanese characters database.

Language: Jupyter Notebook - Size: 558 KB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 7 - Forks: 1

ml4py/dataset-iiit-pet

Classification dataset for comparing cat and dog images

Size: 754 MB - Last synced at: about 1 year ago - Pushed at: about 4 years ago - Stars: 6 - Forks: 2

chadsr/marktplaats-scraper

Marktplaats.nl (Dutch Classifieds) Listing Scraper

Language: Python - Size: 494 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 4 - Forks: 1

latentcollection/macOS-fontface-scraper

OpenFrameworks program that generates training data from font-faces installed on your Mac.

Language: Makefile - Size: 267 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 4 - Forks: 0

krzjoa/Komentarze

Corpus of manually classified comments for machine learning (filtering offensive comments)

Language: Python - Size: 1.08 MB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 0

screddy1313/amazon-product-images-downloader

Given a product name, this Python program downloads all the images. Pagination is also handled.

Language: Jupyter Notebook - Size: 778 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 3

vtalpaert/pytorch-geometric-visual-task

Simple task for mixed image-graph data

Language: Python - Size: 884 KB - Last synced at: about 1 month ago - Pushed at: about 5 years ago - Stars: 2 - Forks: 0

jay-johnson/network-pipeline-datasets

CSV datasets for ML/AI models from captured network traffic during ZAP scanning with web applications like Django, Flask, React, Vue and Spring - Anti-Nex training datasets

Size: 3.39 MB - Last synced at: 18 days ago - Pushed at: about 7 years ago - Stars: 2 - Forks: 0

aalekhpatel07/captcha-generator

Generate captchas for ML tasks in parallel.

Language: Rust - Size: 24.4 KB - Last synced at: 7 days ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

Elijas/sentence-polarity-dataset-v1.0

sentence polarity dataset v1.0 (includes sentence polarity dataset README v1.0): 5331 positive and 5331 negative processed sentences / snippets. Introduced in Pang/Lee ACL 2005. Released July 2005.

Size: 556 KB - Last synced at: about 1 month ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 1

DavidWalz/dlipr

tools for a deep learning in physics research course

Language: Python - Size: 76.2 KB - Last synced at: 12 months ago - Pushed at: almost 7 years ago - Stars: 1 - Forks: 4

sferez/BybitMarketData

This repository serves as a collection point for market data from Bybit. It aims to facilitate machine learning model creation and fine-tuning.

Size: 11 GB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

aitor-alvarez/MIR-song-dataset-collection

Scripts to create Music Information Retrieval datasets from streaming services for singer identification tasks

Language: Python - Size: 16.6 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

tdude92/reddit-short-stories

4,308 short stories (4 million words) scraped from https://reddit.com/r/WritingPrompts

Size: 15 MB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

ahundt/costar_dataset

Installable python package for the costar dataset.

Language: Python - Size: 106 KB - Last synced at: about 1 month ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 4

irthomasthomas/Training-Data-For-Instagram-Machine-Learning-Random-Forest-Classifier

Training data used for Instagram-Machine-Learning-Random-Forest-Classifier project

Size: 32.9 MB - Last synced at: 12 months ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

ichisadashioko/generate_kanji_datasets 📦

Language: Jupyter Notebook - Size: 161 KB - Last synced at: about 2 years ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 0

mysliwietzflorian/font-awesome-32px-jpg-dataset

Dataset of Font Awesome 5.1 icons converted to 32px JPG files

Size: 1.23 MB - Last synced at: 10 days ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

motokimura/spacenet_playground

Experiments to preprocess SpaceNet satellite imagery data corpus to a format that is consumable by machine learning algorithms

Language: Jupyter Notebook - Size: 7.03 MB - Last synced at: about 1 year ago - Pushed at: almost 7 years ago - Stars: 0 - Forks: 1

ramn51/ML-Practice

Practice on ML algorithms on different problems and data sets.

Language: Python - Size: 59.6 KB - Last synced at: 8 months ago - Pushed at: almost 7 years ago - Stars: 0 - Forks: 0

americast/actions-dataset

Dataset containing videos of a few actions

Size: 98.1 MB - Last synced at: 29 days ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 1

Related Topics
dataset (10), machine-learning (9), python (4), machine-learning-datasets (3), python3 (3), ai (2), music-information-retrieval (2), deep-learning (2), dataset-generation (2), nlp-datasets (2), pytorch (2), ai-data (1), ai-data-collection (1), bybit (1), bybit-websocket (1), crypto (1), crypto-data (1), cryptocurrency (1), cryptocurrency-datasets (1), market (1), market-data (1), trading (1), web-scraping (1), nlp (1), vue2 (1), vue (1), spring-boot (1), spring (1), react-redux (1), react (1), owasp (1), network-security (1), network-analysis (1), machine-learning-defense (1), free-datasets (1), flask-restful (1), flask (1), django (1), web-scraper (1), selenium (1), scraper (1), marktplaats (1), dutch-language (1), dataset-creation (1), chromedriver (1), pytorch-geometric (1), graph-neural-networks (1), geometric-deep-learning (1), deep-learning-datasets (1), singer-identification-tasks (1), deep-learning-dataset (1), audio-signal-processing (1), spoken-english (1), audio-classification (1), audio (1), tensorflow (1), robotics (1), costar-dataset (1), uci-machine-learning (1), uci (1), spyder (1), anaconda3 (1), javascript (1), trading-strategies (1), trading-bot (1), music-dataset (1), data-generation (1), audio-synthesis (1), text-classification (1), social-network (1), social-media (1), random-forest-classifier (1), random-forest (1), machine-classification (1), instagram (1), json-data (1), corpus-data (1), corpus (1), datasets (1), computer-vision-datasets (1), computer-vision (1), kanji (1), japanese-character-database (1), hiragana (1), handwriting-dataset (1), handwriting (1), polarity-dataset (1), unreal-engine-5 (1), synthetic-data (1), 3d-point-clouds (1), selenium-python (1), image-downloader-python (1), amazon-automation (1), csv-datasets (1), csv-data (1), font-awesome (1), rust (1), captcha-generator (1), captcha (1), physics (1)