GitHub topics: nlp-dataset

Repositories

amirshnll/Persian-Swear-Words

Persian Swear Dataset - a dataset of inappropriate and offensive Persian words that you can use in production to filter unwanted content from text.

Language: C# - Size: 1.44 MB - Last synced at: 5 days ago - Pushed at: 8 months ago - Stars: 300 - Forks: 34

Koziev/Rifma

Dataset with annotation of Russian-language poems

Size: 1.7 MB - Last synced at: 22 days ago - Pushed at: 22 days ago - Stars: 0 - Forks: 0

Koziev/Translations

Parallel Literary Corpora: Fiction and Poetry Translations

Size: 17.7 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

machinelearningZH/zix_understandability-index

Get a pragmatic assessment of how understandable a German text is.

Language: Jupyter Notebook - Size: 11 MB - Last synced at: 24 days ago - Pushed at: 3 months ago - Stars: 8 - Forks: 0

AndyTheFactory/romanian-nlp-datasets

A list of Romanian NLP Datasets

Size: 190 KB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 39 - Forks: 7

DwendwenHappy/Chumor

Chumor 1.0: A Truly Funny and Challenging Chinese Humor Understanding Dataset from Ruo Zhi Ba

Language: JavaScript - Size: 9.41 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

semnan-university-ai/persian-news-dataset 📦

Persian News Dataset

Size: 366 KB - Last synced at: 12 months ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

semnan-university-ai/persian-slang 📦

Persian Slang Words (dataset)

Size: 9.77 KB - Last synced at: 12 months ago - Pushed at: over 2 years ago - Stars: 4 - Forks: 0

Dia-Bete/PersonaBasedCorpus

Repository for the LREC-COLING 2024 Paper: Persona-Based Corpus in the Diabetes Mellitus Domain – Applying a Human-Centered Approach to a Low-Resource Context

Size: 445 KB - Last synced at: 12 months ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

afrisenti-semeval/afrisent-semeval-2023

AfriSenti-SemEval Shared Task 12: Sentiment Analysis for African Languages: https://afrisenti-semeval.github.io/

Language: Jupyter Notebook - Size: 33 MB - Last synced at: 12 months ago - Pushed at: over 1 year ago - Stars: 38 - Forks: 38

amazon-science/webie

Dataset for web-scaled information extraction.

Language: Python - Size: 216 MB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 8 - Forks: 1

K-RLange/SpeakGer

A metadata-enriched dataset of German parliamentary debates covering 74 years of plenary protocols.

Language: Python - Size: 1.67 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

semnan-university-ai/persian-sms-dataset 📦

Persian SMS dataset

Size: 103 KB - Last synced at: 12 months ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

Related Keywords

nlp-dataset (13), nlp (8), dataset (4), persian (3), datasets (2), persian-dataset (2), nlp-datasets (2), nlp-resources (2), russian-language (2), sms-dataset (1), opinion-mining (1), low-resource-nlp (1), low-resouce-language (1), low-resolution-data (1), africanlp (1), african-languages (1), text-simpli (1), personas (1), nlproc (1), nlp-text-simplification (1), nlp-machine-learning (1), nlg-dataset (1), lrec-coling-2024 (1), farsi (1), sms (1), persian-sms (1), parliamentary-data (1), german (1), relation-extraction (1), information-extraction (1), entity-extraction (1), twitter-sentiment-analysis (1), twitter (1), twitt (1), shared-tasks (1), shared-task (1), sentiment-classification (1), sentiment-analysis (1), sentiment (1), semeval2023 (1), semeval-sentiment (1), farsiswear (1), farsiswearword (1), persiandataset (1), persianswearword (1), swear (1), sweardataset (1), swearword (1), evaluation (1), poetry (1), russian-language-nlp (1), machine-translation (1), multilingual-dataset (1), cefr-prediction (1), llms (1), machine-learning (1), natural-language-processing (1), nlp-library (1), python (1), spacy (1), textdescriptives (1), understandability (1), nlp-data (1), romanian (1), romanian-language (1), chinese-dataset (1), humor (1), text-dataset (1), persian-slang (1), persian-slang-dataset (1), slang-word (1), word-dataset (1)
