GitHub topics: nlp-datasets
yarakyrychenko/tg-misinfo-data
Telegram posts from Russian news, misinformation, and propaganda channels made during the first weeks of the 2022 Russian invasion of Ukraine.
Size: 37.1 MB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 1 - Forks: 0

d0rj/RusLit
📚 A small collection of Russian literature 📚
Size: 20.7 MB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 2

bavard-ai/nlu-meta-dataset
A large dataset for learning to perform few-shot intent classification.
Size: 1.4 MB - Last synced at: over 2 years ago - Pushed at: about 4 years ago - Stars: 4 - Forks: 1

MiniXC/opensubtitles-dataloader
Loads the OpenSubtitles v2018 dataset without having to load everything into memory at once. Works well with PyTorch.
Language: Python - Size: 26.4 KB - Last synced at: over 2 years ago - Pushed at: almost 5 years ago - Stars: 13 - Forks: 2

rareloto/beginnerwebscraping-naverdictionary
Scraping Korean-English parallel conversation text pairs from Naver Conversation of the Day
Language: Jupyter Notebook - Size: 682 KB - Last synced at: over 2 years ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 0

praveentn/nlpaeg
Natural Language Processing for Artificial Error Generation
Language: Python - Size: 6.35 MB - Last synced at: 6 days ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

kushalchauhan98/ticket-segmentation
Data for the ACL 2020 paper - Improving Segmentation for Technical Support Problems
Size: 1.22 MB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 5 - Forks: 2

language-resources-nepal/language-resources-nepal.github.io
A curated collection of language resources for Nepal
Language: SCSS - Size: 87.9 KB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

UlugbekSalaev/nlpproject
Uzbek NLP Project
Language: JavaScript - Size: 5.44 MB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

navanith007/ULMFit-using-pytorch
This repository contains solutions to NLP problems using transfer learning
Language: Jupyter Notebook - Size: 11.7 KB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 0

SemiringInc/Mueller-Report-Corpus
The Mueller Report Corpus V 0.1
Size: 3.51 MB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 11 - Forks: 0

Utkichaps/AMICorpus-Meeting-Transcript-Extraction
This can be used to convert the AMI corpus meeting transcripts to a speaker-by-speaker dialogue discourse conversation for each meeting.
Language: Python - Size: 243 KB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 1

shikhirsingh/PersonNameRecognizer4j
Used to identify whether a string contains a person's name
Language: Java - Size: 22.1 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 0 - Forks: 0

marco-roberti/pytorch-e2e-dataset
The E2E Dataset, packed as a PyTorch Dataset subclass
Language: Python - Size: 97.7 KB - Last synced at: over 2 years ago - Pushed at: almost 7 years ago - Stars: 7 - Forks: 0

mylee04/analysis-youtube-comment-krisandme
I wanted to identify positive and negative comments on my YouTube videos, so I used NLP to analyze the comments. I set the main language as Korean, but you can try setting English as the main language.
Language: Jupyter Notebook - Size: 154 KB - Last synced at: about 1 month ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

DARK-art108/FinBox-NLP-Exercise
An NLP Exercise
Language: Jupyter Notebook - Size: 91.8 KB - Last synced at: 4 months ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 0

mnschmit/SherLIiC
A Typed Event-Focused Lexical Inference Benchmark for Evaluating Natural Language Inference
Language: Python - Size: 20.8 MB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 8 - Forks: 1

Elysian01/Data-Purifier-Dataset
Data repository for Data Purifier examples
Size: 5.92 MB - Last synced at: about 1 month ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

kvinne-anc/NLP-Meyers-Briggs-Project
Natural Language Processing: Myers-Briggs personality type prediction based on analysis of social media posts.
Language: Jupyter Notebook - Size: 34.7 MB - Last synced at: over 2 years ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

navneetkrc/Flair_SOTA_NLP
Using the state-of-the-art FLAIR library on NLP datasets
Language: Jupyter Notebook - Size: 1.12 MB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 4 - Forks: 0

letuananh/texttaglib
A Python library for managing and annotating text corpora in different formats.
Language: Python - Size: 278 KB - Last synced at: 5 days ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

tdude92/reddit-short-stories
4,308 short stories (4 million words) scraped from https://reddit.com/r/WritingPrompts
Size: 15 MB - Last synced at: over 2 years ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

callumskeet/abc-fetch
Download recent news articles from the ABC's API.
Language: JavaScript - Size: 49.8 KB - Last synced at: 25 days ago - Pushed at: 26 days ago - Stars: 0 - Forks: 0

arquicanedo/shaku
Document labeling from your terminal.
Language: HTML - Size: 754 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

murali1996/eacl2021-OffensEval-Dravidian
EACL 2021 paper (SJ_AJ@DravidianLangTech-EACL2021: Task-Adaptive Pre-Training of Multilingual BERT models for Offensive Language Identification)
Language: Python - Size: 237 KB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 3

theQuert/NLP-sentiment-analysis
Sentiment Analysis models with multiple algorithms
Language: Jupyter Notebook - Size: 5.63 MB - Last synced at: over 2 years ago - Pushed at: almost 5 years ago - Stars: 2 - Forks: 2

Selium98/Flix-Master
Advanced movie recommender, using Flask for the user interface and deployed on Heroku.
Language: HTML - Size: 1.11 MB - Last synced at: 5 months ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

Shayokh144/Bengali-Literature-Data-Collection
Size: 943 KB - Last synced at: over 2 years ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 0

roshangrewal/natural-language-processing
Natural language processing is a subfield of linguistics, computer science, information engineering, and artificial intelligence concerned with the interactions between computers and human languages, in particular how to program computers to process and analyze large amounts of natural language data.
Language: Jupyter Notebook - Size: 7.81 KB - Last synced at: over 2 years ago - Pushed at: about 5 years ago - Stars: 1 - Forks: 0

vgupta123/infotabs-code Fork of utahnlp/infotabs-code
Implementation of the semi-structured inference model in our ACL 2020 paper. INFOTABS: Inference on Tables as Semi-structured Data
Language: Python - Size: 138 KB - Last synced at: over 1 year ago - Pushed at: about 5 years ago - Stars: 2 - Forks: 0

saxenaprerit/Text_mining_using_NLTK
This code uses NLTK for text mining in Python
Language: Jupyter Notebook - Size: 14.6 KB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 0

clulab/edin-data
Biomolecular events mined by Reach from PubMed Central
Size: 3.29 MB - Last synced at: about 1 year ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 1

Mlawrence95/moby-dick
[dataset] The full Moby Dick text, cleaned and formatted
Size: 916 KB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

yuboona/people_paper_2018-2019
A preprocessed dataset of people paper.
Language: Jupyter Notebook - Size: 29.3 KB - Last synced at: over 2 years ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

vishnuchilamakuru/coursera-reviews-analsis
Language: Jupyter Notebook - Size: 14.5 MB - Last synced at: over 2 years ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 1

jrgpulido/js19is2e
Language: TeX - Size: 27.5 MB - Last synced at: over 2 years ago - Pushed at: about 6 years ago - Stars: 3 - Forks: 10

prateeksawhney97/NLP-Pipeline-to-Clean-Movie-Reviews-Data
Creating an NLP pipeline to 'clean' movie reviews data and write the cleaned data to an output file
Language: Jupyter Notebook - Size: 2.93 KB - Last synced at: over 2 years ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

nelson888/seq2seq-data-augmentation
Soft Contextual Data Augmentation, a Data Augmentation method for NLP translation datasets
Language: Java - Size: 41 KB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

jrgpulido/js19is2d
Language: JavaScript - Size: 94.2 MB - Last synced at: over 2 years ago - Pushed at: about 6 years ago - Stars: 1 - Forks: 17

vishnu9810/Sentiment-Analysis-on-Movie-Reviews
Sentiment analysis on the NLTK movie reviews data set using a Naive Bayes classifier, achieving more than 93% accuracy
Language: Python - Size: 33.4 MB - Last synced at: about 2 years ago - Pushed at: almost 6 years ago - Stars: 0 - Forks: 1

jrgpulido/pd18is5e
Language: Roff - Size: 21.9 MB - Last synced at: about 1 year ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 11

jbdatascience/nlp-datasets Fork of niderhoff/nlp-datasets
Alphabetical list of free/public domain datasets with text data for use in Natural Language Processing (NLP)
Size: 29.3 KB - Last synced at: about 14 hours ago - Pushed at: over 7 years ago - Stars: 1 - Forks: 0

nikitaeverywhere/news-articles-dataset
A dataset of 2095 plain text articles of 5 categories with over 805k words in total.
Size: 2.26 MB - Last synced at: 3 months ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 1
