GitHub topics: nlp-dataset
amirshnll/Persian-Swear-Words
Persian Swear Dataset - you can use it in production to filter unwanted content. A dataset of inappropriate and offensive Persian words for filtering text.
Language: C# - Size: 1.44 MB - Last synced at: 5 days ago - Pushed at: 8 months ago - Stars: 300 - Forks: 34

Koziev/Rifma
Dataset with annotation of Russian-language poems
Size: 1.7 MB - Last synced at: 22 days ago - Pushed at: 22 days ago - Stars: 0 - Forks: 0

Koziev/Translations
Parallel Literary Corpora: Fiction and Poetry Translations
Size: 17.7 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

machinelearningZH/zix_understandability-index
Get a pragmatic assessment of how understandable a German text is.
Language: Jupyter Notebook - Size: 11 MB - Last synced at: 24 days ago - Pushed at: 3 months ago - Stars: 8 - Forks: 0

AndyTheFactory/romanian-nlp-datasets
A list of Romanian NLP Datasets
Size: 190 KB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 39 - Forks: 7

DwendwenHappy/Chumor
Chumor 1.0: A Truly Funny and Challenging Chinese Humor Understanding Dataset from Ruo Zhi Ba
Language: JavaScript - Size: 9.41 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

semnan-university-ai/persian-news-dataset 📦
Persian News Dataset
Size: 366 KB - Last synced at: 12 months ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

semnan-university-ai/persian-slang 📦
Persian Slang Words (dataset)
Size: 9.77 KB - Last synced at: 12 months ago - Pushed at: over 2 years ago - Stars: 4 - Forks: 0

Dia-Bete/PersonaBasedCorpus
Repository for the LREC-COLING 2024 Paper: Persona-Based Corpus in the Diabetes Mellitus Domain – Applying a Human-Centered Approach to a Low-Resource Context
Size: 445 KB - Last synced at: 12 months ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

afrisenti-semeval/afrisent-semeval-2023
AfriSenti-SemEval Shared Task 12: Sentiment Analysis for African languages: https://afrisenti-semeval.github.io/
Language: Jupyter Notebook - Size: 33 MB - Last synced at: 12 months ago - Pushed at: over 1 year ago - Stars: 38 - Forks: 38

amazon-science/webie
Dataset for web-scaled information extraction.
Language: Python - Size: 216 MB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 8 - Forks: 1

K-RLange/SpeakGer
A metadata-enriched dataset of German parliamentary debates covering 74 years of plenary protocols.
Language: Python - Size: 1.67 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

semnan-university-ai/persian-sms-dataset 📦
Persian SMS dataset
Size: 103 KB - Last synced at: 12 months ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0
