An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: nlp-dataset

amirshnll/Persian-Swear-Words

Persian Swear Dataset - you can use it in production to filter unwanted content. (Persian: a dataset of inappropriate and offensive Persian words for filtering texts.)

Language: C# - Size: 1.44 MB - Last synced at: 5 days ago - Pushed at: 8 months ago - Stars: 300 - Forks: 34

Koziev/Rifma

Dataset of annotated Russian-language poems

Size: 1.7 MB - Last synced at: 22 days ago - Pushed at: 22 days ago - Stars: 0 - Forks: 0

Koziev/Translations

Parallel Literary Corpora: Fiction and Poetry Translations

Size: 17.7 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

machinelearningZH/zix_understandability-index

Get a pragmatic assessment of how understandable a German text is.

Language: Jupyter Notebook - Size: 11 MB - Last synced at: 24 days ago - Pushed at: 3 months ago - Stars: 8 - Forks: 0

AndyTheFactory/romanian-nlp-datasets

A list of Romanian NLP Datasets

Size: 190 KB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 39 - Forks: 7

DwendwenHappy/Chumor

Chumor 1.0: A Truly Funny and Challenging Chinese Humor Understanding Dataset from Ruo Zhi Ba

Language: JavaScript - Size: 9.41 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

semnan-university-ai/persian-news-dataset 📦

Persian News Dataset

Size: 366 KB - Last synced at: 12 months ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

semnan-university-ai/persian-slang 📦

Persian Slang Words (dataset)

Size: 9.77 KB - Last synced at: 12 months ago - Pushed at: over 2 years ago - Stars: 4 - Forks: 0

Dia-Bete/PersonaBasedCorpus

Repository for the LREC-COLING 2024 Paper: Persona-Based Corpus in the Diabetes Mellitus Domain – Applying a Human-Centered Approach to a Low-Resource Context

Size: 445 KB - Last synced at: 12 months ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

afrisenti-semeval/afrisent-semeval-2023

AfriSenti-SemEval Shared Task 12: Sentiment Analysis for African Languages: https://afrisenti-semeval.github.io/

Language: Jupyter Notebook - Size: 33 MB - Last synced at: 12 months ago - Pushed at: over 1 year ago - Stars: 38 - Forks: 38

amazon-science/webie

Dataset for web-scale information extraction.

Language: Python - Size: 216 MB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 8 - Forks: 1

K-RLange/SpeakGer

A metadata-enriched dataset of German parliamentary debates covering 74 years of plenary protocols.

Language: Python - Size: 1.67 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

semnan-university-ai/persian-sms-dataset 📦

Persian SMS Dataset

Size: 103 KB - Last synced at: 12 months ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0