Ecosyste.ms: Repos

An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: text-datasets

EmilHvitfeldt/textdata

Download, parse, store, and load text datasets instead of storing them in packages

Language: R - Size: 14.3 MB - Last synced: about 1 month ago - Pushed: 4 months ago - Stars: 73 - Forks: 12

geo-tp/Alpha-Project-Text-Archive (fork of The-Alpha-Project/Alpha-Project-Text-Archive)

Compilation of texts from WoW alphas and betas. Used by https://github.com/The-Alpha-Project/Text-Crawler-Website

Language: HTML - Size: 139 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 0 - Forks: 0

nevmenandr/nazirov-texts-dataset

Dataset of texts by R. G. Nazirov

Size: 6.64 MB - Last synced: about 2 months ago - Pushed: about 5 years ago - Stars: 0 - Forks: 0

nuhmanpk/Webtrench

A powerful and easy-to-use web scraper for collecting data from the web. Supports scraping of images, text, videos, metadata, and more. Ideal for machine learning and deep learning engineers. Download and extract data with just one line of code.

Language: Python - Size: 51.8 KB - Last synced: 4 days ago - Pushed: 7 months ago - Stars: 20 - Forks: 5

Infinitode/duplipy

DupliPy is a quick and easy-to-use package that can handle text formatting and data augmentation tasks for NLP in Python. It now offers support for image augmentation tasks as well.

Language: Python - Size: 38.1 KB - Last synced: about 1 month ago - Pushed: 5 months ago - Stars: 0 - Forks: 0

ibraaaa/news-credibility

Size: 142 MB - Last synced: 6 months ago - Pushed: over 5 years ago - Stars: 0 - Forks: 0

tblock/10kGNAD

Ten Thousand German News Articles Dataset for Topic Classification

Language: Python - Size: 10.8 MB - Last synced: 8 months ago - Pushed: over 1 year ago - Stars: 75 - Forks: 13

noisemix/noisemix

NoiseMix - data generation for natural language

Language: Python - Size: 2.18 MB - Last synced: 24 days ago - Pushed: about 6 years ago - Stars: 41 - Forks: 7

YujiSODE/txtStat

An interface for text character analysis.

Language: JavaScript - Size: 260 KB - Last synced: 9 months ago - Pushed: about 7 years ago - Stars: 1 - Forks: 0

ravexina/shakespeare-plays-dataset-scraper

A Bash script to scrape Shakespeare's works from shakespeare.mit.edu, plus already-scraped plays in txt format

Language: Shell - Size: 1.82 MB - Last synced: about 1 year ago - Pushed: over 3 years ago - Stars: 3 - Forks: 3

Hsankesara/The-Tweets-of-Wisdom

A dataset that contains 30k+ so-called "self-help" tweets from 100+ authors.

Language: Jupyter Notebook - Size: 5.35 MB - Last synced: over 1 year ago - Pushed: over 4 years ago - Stars: 9 - Forks: 2

Pogayo/Luo-News-Dataset

This repo contains a Luo corpus for Named Entity Recognition. The text comes from the news domain and was scraped from Radio Ramogi.

Size: 7.52 MB - Last synced: about 1 year ago - Pushed: almost 3 years ago - Stars: 2 - Forks: 2

SherinBK/Fake-Job-Posting

Data analysis project on a fake job posting dataset using machine learning and NLP basics

Language: Jupyter Notebook - Size: 14.1 MB - Last synced: about 1 year ago - Pushed: over 2 years ago - Stars: 0 - Forks: 0

robinreni96/WordDetection-Data-Generator

This Python script generates n pages of text with bounding boxes and their ground-truth labels. It also supports various background colors, fonts, etc., and can export the dataset as TFRecord.

Language: Python - Size: 3.71 MB - Last synced: about 1 year ago - Pushed: over 3 years ago - Stars: 0 - Forks: 0

Pogayo/ADH-EN_MT_Dataset

Contains Adhola-English parallel sentences that can be used for machine translation.

Language: Jupyter Notebook - Size: 4.61 MB - Last synced: about 1 year ago - Pushed: over 3 years ago - Stars: 0 - Forks: 0