Ecosyste.ms: Repos
An open API service providing repository metadata for many open source software ecosystems.
GitHub topics: text-datasets
EmilHvitfeldt/textdata
Download, parse, store, and load text datasets instead of storing them in packages
Language: R - Size: 14.3 MB - Last synced: about 1 month ago - Pushed: 4 months ago - Stars: 73 - Forks: 12
geo-tp/Alpha-Project-Text-Archive (fork of The-Alpha-Project/Alpha-Project-Text-Archive)
Compilation of texts from WoW alphas and betas. Used by https://github.com/The-Alpha-Project/Text-Crawler-Website
Language: HTML - Size: 139 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 0 - Forks: 0
nevmenandr/nazirov-texts-dataset
Dataset of texts by R. G. Nazirov
Size: 6.64 MB - Last synced: about 2 months ago - Pushed: about 5 years ago - Stars: 0 - Forks: 0
nuhmanpk/Webtrench
A powerful and easy-to-use web scraper for collecting data from the web. Supports scraping of images, text, videos, metadata, and more. Ideal for machine learning and deep learning engineers. Download and extract data with just one line of code.
Language: Python - Size: 51.8 KB - Last synced: 4 days ago - Pushed: 7 months ago - Stars: 20 - Forks: 5
Infinitode/duplipy
DupliPy is a quick and easy-to-use package that can handle text formatting and data augmentation tasks for NLP in Python. It now offers support for image augmentation tasks as well.
Language: Python - Size: 38.1 KB - Last synced: about 1 month ago - Pushed: 5 months ago - Stars: 0 - Forks: 0
ibraaaa/news-credibility
Size: 142 MB - Last synced: 6 months ago - Pushed: over 5 years ago - Stars: 0 - Forks: 0
tblock/10kGNAD
Ten Thousand German News Articles Dataset for Topic Classification
Language: Python - Size: 10.8 MB - Last synced: 8 months ago - Pushed: over 1 year ago - Stars: 75 - Forks: 13
noisemix/noisemix
NoiseMix - data generation for natural language
Language: Python - Size: 2.18 MB - Last synced: 24 days ago - Pushed: about 6 years ago - Stars: 41 - Forks: 7
YujiSODE/txtStat
An interface for text character analysis.
Language: JavaScript - Size: 260 KB - Last synced: 9 months ago - Pushed: about 7 years ago - Stars: 1 - Forks: 0
ravexina/shakespeare-plays-dataset-scraper
A Bash script to scrape Shakespeare's works from shakespeare.mit.edu, plus already-scraped plays in txt format
Language: Shell - Size: 1.82 MB - Last synced: about 1 year ago - Pushed: over 3 years ago - Stars: 3 - Forks: 3
Hsankesara/The-Tweets-of-Wisdom
A dataset that contains 30k+ so-called "self-help" tweets from 100+ authors.
Language: Jupyter Notebook - Size: 5.35 MB - Last synced: over 1 year ago - Pushed: over 4 years ago - Stars: 9 - Forks: 2
Pogayo/Luo-News-Dataset
This repo contains a Luo corpus for Named Entity Recognition. The text comes from the news domain and was scraped from Radio Ramogi.
Size: 7.52 MB - Last synced: about 1 year ago - Pushed: almost 3 years ago - Stars: 2 - Forks: 2
SherinBK/Fake-Job-Posting
Data analysis project on a fake job posting dataset using machine learning and NLP basics
Language: Jupyter Notebook - Size: 14.1 MB - Last synced: about 1 year ago - Pushed: over 2 years ago - Stars: 0 - Forks: 0
robinreni96/WordDetection-Data-Generator
This Python script generates n pages of text with bounding boxes and their ground-truth labels. It also supports various background colors, fonts, etc., and can export the dataset as TFRecord.
Language: Python - Size: 3.71 MB - Last synced: about 1 year ago - Pushed: over 3 years ago - Stars: 0 - Forks: 0
Pogayo/ADH-EN_MT_Dataset
Contains Adhola-English parallel sentences that can be used for Machine Translation.
Language: Jupyter Notebook - Size: 4.61 MB - Last synced: about 1 year ago - Pushed: over 3 years ago - Stars: 0 - Forks: 0