An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: nlp-datasets

yarakyrychenko/tg-misinfo-data

Telegram posts from Russian news, misinformation, and propaganda channels made during the first weeks of the 2022 Russian invasion of Ukraine.

Size: 37.1 MB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 1 - Forks: 0

d0rj/RusLit

📚 A small collection of Russian literature 📚

Size: 20.7 MB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 2

bavard-ai/nlu-meta-dataset

A large dataset for learning to perform few-shot intent classification.

Size: 1.4 MB - Last synced at: over 2 years ago - Pushed at: about 4 years ago - Stars: 4 - Forks: 1

MiniXC/opensubtitles-dataloader

Loads the OpenSubtitles v2018 dataset without having to load everything into memory at once. Works well with PyTorch.

Language: Python - Size: 26.4 KB - Last synced at: over 2 years ago - Pushed at: almost 5 years ago - Stars: 13 - Forks: 2

rareloto/beginnerwebscraping-naverdictionary

Scraping Korean-English parallel conversation text pairs from Naver Conversation of the Day

Language: Jupyter Notebook - Size: 682 KB - Last synced at: over 2 years ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 0

praveentn/nlpaeg

Natural Language Processing for Artificial Error Generation

Language: Python - Size: 6.35 MB - Last synced at: 6 days ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

kushalchauhan98/ticket-segmentation

Data for the ACL 2020 paper - Improving Segmentation for Technical Support Problems

Size: 1.22 MB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 5 - Forks: 2

language-resources-nepal/language-resources-nepal.github.io

A curated collection of language resources for Nepal

Language: SCSS - Size: 87.9 KB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

UlugbekSalaev/nlpproject

Uzbek NLP Project

Language: JavaScript - Size: 5.44 MB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

navanith007/ULMFit-using-pytorch

This repository contains solutions to NLP problems using transfer learning

Language: Jupyter Notebook - Size: 11.7 KB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 0

SemiringInc/Mueller-Report-Corpus

The Mueller Report Corpus V 0.1

Size: 3.51 MB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 11 - Forks: 0

Utkichaps/AMICorpus-Meeting-Transcript-Extraction

This can be used to convert the AMI corpus meeting transcripts to a speaker-by-speaker dialogue discourse conversation for each meeting.

Language: Python - Size: 243 KB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 1

shikhirsingh/PersonNameRecognizer4j

Identifies whether a string contains the name of a person

Language: Java - Size: 22.1 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 0 - Forks: 0

marco-roberti/pytorch-e2e-dataset

The E2E Dataset, packed as a PyTorch DataSet subclass

Language: Python - Size: 97.7 KB - Last synced at: over 2 years ago - Pushed at: almost 7 years ago - Stars: 7 - Forks: 0

mylee04/analysis-youtube-comment-krisandme

I tried to identify positive and negative comments on my YouTube videos, so I used NLP to analyze the comments. I set the main language to Korean, but you can try setting English as the main language.

Language: Jupyter Notebook - Size: 154 KB - Last synced at: about 1 month ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

DARK-art108/FinBox-NLP-Exercise

An NLP Exercise

Language: Jupyter Notebook - Size: 91.8 KB - Last synced at: 4 months ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 0

mnschmit/SherLIiC

A Typed Event-Focused Lexical Inference Benchmark for Evaluating Natural Language Inference

Language: Python - Size: 20.8 MB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 8 - Forks: 1

Elysian01/Data-Purifier-Dataset

Data repository for Data Purifier examples

Size: 5.92 MB - Last synced at: about 1 month ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

kvinne-anc/NLP-Meyers-Briggs-Project

Natural Language Processing: Myers-Briggs personality type prediction based on analysis of social media posts.

Language: Jupyter Notebook - Size: 34.7 MB - Last synced at: over 2 years ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

navneetkrc/Flair_SOTA_NLP

Using the state-of-the-art Flair library on NLP datasets

Language: Jupyter Notebook - Size: 1.12 MB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 4 - Forks: 0

letuananh/texttaglib

A Python library for managing and annotating text corpora in different formats.

Language: Python - Size: 278 KB - Last synced at: 5 days ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

tdude92/reddit-short-stories

4,308 short stories (4 million words) scraped from https://reddit.com/r/WritingPrompts

Size: 15 MB - Last synced at: over 2 years ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

callumskeet/abc-fetch

Download recent news articles from the ABC's API.

Language: JavaScript - Size: 49.8 KB - Last synced at: 25 days ago - Pushed at: 26 days ago - Stars: 0 - Forks: 0

arquicanedo/shaku

Document labeling from your terminal.

Language: HTML - Size: 754 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

murali1996/eacl2021-OffensEval-Dravidian

EACL 2021 paper (SJ_AJ@DravidianLangTech-EACL2021: Task-Adaptive Pre-Training of Multilingual BERT models for Offensive Language Identification)

Language: Python - Size: 237 KB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 3

theQuert/NLP-sentiment-analysis

Sentiment Analysis models with multiple algorithms

Language: Jupyter Notebook - Size: 5.63 MB - Last synced at: over 2 years ago - Pushed at: almost 5 years ago - Stars: 2 - Forks: 2

Selium98/Flix-Master

Advanced movie recommender with a Flask-based user interface, deployed on Heroku.

Language: HTML - Size: 1.11 MB - Last synced at: 5 months ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

Shayokh144/Bengali-Literature-Data-Collection

Size: 943 KB - Last synced at: over 2 years ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 0

roshangrewal/natural-language-processing

Natural language processing is a subfield of linguistics, computer science, information engineering, and artificial intelligence concerned with the interactions between computers and human languages, in particular how to program computers to process and analyze large amounts of natural language data.

Language: Jupyter Notebook - Size: 7.81 KB - Last synced at: over 2 years ago - Pushed at: about 5 years ago - Stars: 1 - Forks: 0

vgupta123/infotabs-code (fork of utahnlp/infotabs-code)

Implementation of the semi-structured inference model in our ACL 2020 paper. INFOTABS: Inference on Tables as Semi-structured Data

Language: Python - Size: 138 KB - Last synced at: over 1 year ago - Pushed at: about 5 years ago - Stars: 2 - Forks: 0

saxenaprerit/Text_mining_using_NLTK

This code uses NLTK for text mining in Python

Language: Jupyter Notebook - Size: 14.6 KB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 0

clulab/edin-data

Biomolecular events mined by Reach from PubMed Central

Size: 3.29 MB - Last synced at: about 1 year ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 1

Mlawrence95/moby-dick

[dataset] The full Moby Dick text, cleaned and formatted

Size: 916 KB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

yuboona/people_paper_2018-2019

A preprocessed dataset of people paper.

Language: Jupyter Notebook - Size: 29.3 KB - Last synced at: over 2 years ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

vishnuchilamakuru/coursera-reviews-analsis

Language: Jupyter Notebook - Size: 14.5 MB - Last synced at: over 2 years ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 1

jrgpulido/js19is2e

Language: TeX - Size: 27.5 MB - Last synced at: over 2 years ago - Pushed at: about 6 years ago - Stars: 3 - Forks: 10

prateeksawhney97/NLP-Pipeline-to-Clean-Movie-Reviews-Data

Creating an NLP pipeline to clean movie review data and write the cleaned data to an output file

Language: Jupyter Notebook - Size: 2.93 KB - Last synced at: over 2 years ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

nelson888/seq2seq-data-augmentation

Soft Contextual Data Augmentation, a Data Augmentation method for NLP translation datasets

Language: Java - Size: 41 KB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

jrgpulido/js19is2d

Language: JavaScript - Size: 94.2 MB - Last synced at: over 2 years ago - Pushed at: about 6 years ago - Stars: 1 - Forks: 17

vishnu9810/Sentiment-Analysis-on-Movie-Reviews

Sentiment analysis on the NLTK movie reviews dataset using a Naive Bayes classifier, achieving more than 93% accuracy

Language: Python - Size: 33.4 MB - Last synced at: about 2 years ago - Pushed at: almost 6 years ago - Stars: 0 - Forks: 1

jrgpulido/pd18is5e

Language: Roff - Size: 21.9 MB - Last synced at: about 1 year ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 11

jbdatascience/nlp-datasets (fork of niderhoff/nlp-datasets)

Alphabetical list of free/public domain datasets with text data for use in Natural Language Processing (NLP)

Size: 29.3 KB - Last synced at: about 14 hours ago - Pushed at: over 7 years ago - Stars: 1 - Forks: 0

nikitaeverywhere/news-articles-dataset

A dataset of 2095 plain text articles of 5 categories with over 805k words in total.

Size: 2.26 MB - Last synced at: 3 months ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 1

Related Keywords
nlp-datasets 143 nlp 83 nlp-machine-learning 37 natural-language-processing 26 dataset 24 nlp-resources 20 python 17 machine-learning 15 datasets 13 python3 12 deep-learning 11 corpus 9 nlp-library 9 data-science 8 sentiment-analysis 8 bert 7 pytorch 7 sentiment-classification 7 corpus-data 7 natural-language-understanding 7 question-answering 6 text-classification 6 wikipedia 5 corpus-linguistics 5 information-extraction 5 bert-model 5 llm 5 data 4 nlp-keywords-extraction 4 chinese-nlp 4 text-processing 4 sp-es 4 transformer 4 nli 4 nltk 4 machinelearning 3 turkish-nlp 3 roberta 3 news 3 flask 3 ai 3 natural-language-generation 3 keras 3 nltk-python 3 acl2020 3 turkce-veriseti 3 language 3 language-model 3 large-language-models 3 acl 3 text-mining 3 tables 3 acl-2024 3 corpus-tools 3 corpus-processing 3 deep-neural-networks 3 nlg-dataset 3 chatgpt 3 open-information-extraction 3 chatbot 3 pypi 2 bert-embeddings 2 gpt 2 linguistics 2 java 2 naive-bayes-classifier 2 semi-structured-data 2 rust 2 openai 2 sentiment-analysis-dataset 2 dataset-generation 2 medical-nlp 2 chatgpt-api 2 artificial-intelligence 2 nlp-parsing 2 gpt-3 2 linguistic-analysis 2 infotabs 2 challenge 2 inference 2 lstm-neural-networks 2 india 2 text-generation 2 twitter 2 database 2 sentiment 2 tensorflow 2 preprocessing 2 keras-tensorflow 2 jupyter-notebook 2 natural-language-inference 2 dravidian-languages 2 stopwords 2 portuguese-language 2 turkce-nlp 2 turkish-nlp-dataset 2 llms 2 intent-classification 2 language-learning 2 named-entity-recognition 2