An open API service providing repository metadata for many open source software ecosystems.

Topic: "nlp-datasets"

mihail911/nlp-library

curated collection of papers for the nlp practitioner 📖👩‍🔬

Size: 63.5 KB - Last synced at: 3 months ago - Pushed at: almost 5 years ago - Stars: 1,075 - Forks: 91

hellohaptik/multi-task-NLP

multi_task_NLP is a utility toolkit enabling NLP developers to easily train and infer a single model for multiple tasks.

Language: Python - Size: 7.46 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 358 - Forks: 54

dkulagin/kartaslov

Open linguistic datasets: the KartaSlovSent (КартаСловСент) sentiment lexicon for Russian, a semantics dataset, an association graph, and a dataset of spelling errors and typos.

Size: 20.1 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 346 - Forks: 50

quincyliang/nlp-public-dataset

Chinese and English NER datasets, English-Chinese machine translation datasets, and a Chinese word segmentation dataset.

Language: Python - Size: 12.9 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 320 - Forks: 75

guhhhhaa/4675-scifi

Chinese NLP corpus of Chinese science fiction: about 4,675 Chinese science fiction novels, usable as a natural language processing corpus, text corpus, or text database of Chinese science fiction.

Size: 113 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 277 - Forks: 50

irfnrdh/Awesome-Indonesia-NLP

Resource NLP & Bahasa

Size: 52.7 KB - Last synced at: 4 days ago - Pushed at: over 5 years ago - Stars: 269 - Forks: 67

grammarly/ua-gec

UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language

Language: Macaulay2 - Size: 18 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 261 - Forks: 22

StonyBrookNLP/appworld

🌍 Repository for "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents", ACL'24 Best Resource Paper.

Language: Python - Size: 5.16 MB - Last synced at: 22 days ago - Pushed at: about 1 month ago - Stars: 201 - Forks: 19

liutiedong/goat

a Fine-tuned LLaMA that is Good at Arithmetic Tasks

Language: Jupyter Notebook - Size: 863 KB - Last synced at: about 2 months ago - Pushed at: almost 2 years ago - Stars: 177 - Forks: 17

cjiang2/VDCNN

Implementation of Very Deep Convolutional Neural Network for Text Classification

Language: Python - Size: 42 KB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 171 - Forks: 40

INK-USC/TriggerNER

TriggerNER: Learning with Entity Triggers as Explanations for Named Entity Recognition (ACL 2020)

Language: Python - Size: 2.22 MB - Last synced at: over 1 year ago - Pushed at: about 3 years ago - Stars: 170 - Forks: 19

INK-USC/CommonGen

A Constrained Text Generation Challenge Towards Generative Commonsense Reasoning

Language: Python - Size: 107 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 136 - Forks: 23

xtea/chinese_medical_words

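Manually curated corpus of medical-industry vocabulary and terminology. Can be used to train various NLP models, such as speech recognition and dialogue systems.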

Size: 1.33 MB - Last synced at: almost 2 years ago - Pushed at: about 5 years ago - Stars: 85 - Forks: 31

Niger-Volta-LTI/yoruba-text

Yorùbá language training text for NLP, ASR and TTS tasks

Language: Python - Size: 76.2 MB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 76 - Forks: 26

Pzoom522/HistSumm

Code and data for "Summarising Historical Text in Modern Languages" (EACL 2021)

Language: Jupyter Notebook - Size: 237 KB - Last synced at: 3 months ago - Pushed at: about 4 years ago - Stars: 72 - Forks: 9

kelvin-jiang/FreebaseQA

The release of the FreebaseQA data set (NAACL 2019).

Size: 7.8 MB - Last synced at: over 2 years ago - Pushed at: almost 3 years ago - Stars: 59 - Forks: 1

fido-ai/ua-datasets

A collection of datasets for Ukrainian language

Language: Python - Size: 2.08 MB - Last synced at: 2 days ago - Pushed at: 11 months ago - Stars: 57 - Forks: 2

gcunhase/AMICorpusXML

Extracts Transcript and Summary (Abstractive and Extractive) from the AMI Meeting Corpus

Language: Python - Size: 9.48 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 52 - Forks: 29

selimfirat/bilkent-turkish-writings-dataset

Compilation of Turkish writings dataset that promotes creativity, content, composition, grammar, spelling and punctuation.

Language: Python - Size: 41.3 MB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 50 - Forks: 2

secsilm/zi-dataset

A dataset of Chinese characters with related information such as stroke count, radical, pinyin, and English definitions/synonyms.

Size: 1.57 MB - Last synced at: over 2 years ago - Pushed at: almost 5 years ago - Stars: 50 - Forks: 8

guhhhhaa/wula-scifi

Chinese NLP corpus of Chinese science fiction: an archive of the Ark Plan of the Ula (乌拉) science fiction website, usable as a natural language processing corpus, text corpus, or text database of Chinese science fiction.

Size: 199 MB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 49 - Forks: 9

AndyTheFactory/romanian-nlp-datasets

A list of Romanian NLP Datasets

Size: 215 KB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 48 - Forks: 8

matt-seb-ho/WikiWhy

WikiWhy is a new benchmark for evaluating LLMs' ability to explain cause-effect relationships. It is a QA dataset containing 9000+ "why" question-answer-rationale triplets.

Language: Python - Size: 28.2 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 41 - Forks: 1

afrisenti-semeval/afrisent-semeval-2023

AfriSenti-SemEval Shared Task 12: Sentiment Analysis for African languages : https://afrisenti-semeval.github.io/

Language: Jupyter Notebook - Size: 33 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 38 - Forks: 38

gkiril/benchie

Comprehensive evaluation framework for Open Information Extraction.

Language: Python - Size: 340 KB - Last synced at: 12 months ago - Pushed at: almost 3 years ago - Stars: 38 - Forks: 8

uma-pi1/OPIEC

Reading the data from OPIEC - an Open Information Extraction corpus

Language: Java - Size: 237 KB - Last synced at: 7 days ago - Pushed at: about 6 years ago - Stars: 37 - Forks: 6

bothub-it/bothub

Bothub is an open platform for predicting, training and sharing NLP datasets in multiple languages

Language: Makefile - Size: 1.17 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 35 - Forks: 5

gpt-tester/ChatGPT-test-dataset-01

a small test dataset for use with OpenAI's ChatGPT

Size: 47.9 KB - Last synced at: 11 months ago - Pushed at: over 2 years ago - Stars: 34 - Forks: 11

ElizaLo/Question-Answering-based-on-SQuAD Fork of gauthierdmn/question_answering

Question Answering System using BiDAF Model on SQuAD v2.0

Language: Python - Size: 7.27 MB - Last synced at: about 1 month ago - Pushed at: almost 5 years ago - Stars: 25 - Forks: 27

cybermatt/russian-names

Library for generation of russian names

Language: Python - Size: 628 KB - Last synced at: about 1 month ago - Pushed at: about 6 years ago - Stars: 24 - Forks: 2

INK-USC/XCSR

Code Repo for the ACL21 paper "Common Sense Beyond English: Evaluating and Improving Multilingual LMs for Commonsense Reasoning"

Language: Python - Size: 60.7 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 20 - Forks: 2

utahnlp/infotabs-code

Implementation of the semi-structured inference model in our ACL 2020 paper, INFOTABS: Inference on Tables as Semi-structured Data.

Language: Python - Size: 127 KB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 17 - Forks: 7

JadynHax/scpscraper

A Python library designed for scraping data from the SCP wiki.

Language: Python - Size: 216 KB - Last synced at: 26 days ago - Pushed at: over 4 years ago - Stars: 15 - Forks: 4

maxent-ai/Datasets 📦

datasets with text data for use in NLP, Text analysis, information extraction, ML research.

Language: Jupyter Notebook - Size: 45.7 MB - Last synced at: over 2 years ago - Pushed at: over 6 years ago - Stars: 15 - Forks: 3

aajanki/finnish-nlp-datasets

Open Finnish NLP datasets

Size: 30.3 KB - Last synced at: 4 months ago - Pushed at: 6 months ago - Stars: 14 - Forks: 1

jamesohortle/loanwords_gairaigo

English loanwords in Japanese

Language: Python - Size: 17 MB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 14 - Forks: 1

uma-pi1/OPIEC-pipeline

Language: Java - Size: 59.3 MB - Last synced at: 7 days ago - Pushed at: over 3 years ago - Stars: 14 - Forks: 2

MiniXC/opensubtitles-dataloader

Loads OpenSubtitles v2018 dataset without having to load everything into memory at once. Works well with pytorch.

Language: Python - Size: 26.4 KB - Last synced at: over 2 years ago - Pushed at: almost 5 years ago - Stars: 13 - Forks: 2

trisongz/pylines

Simplifying parsing of large jsonline files in NLP Workflows

Language: Python - Size: 244 KB - Last synced at: 2 days ago - Pushed at: over 3 years ago - Stars: 12 - Forks: 1

aryashah2k/SASBitathon-WinningSolution

1st Place solution for the SAS | GIM Bitathon, an annual Data Science Hackathon organized by SAS and Goa Institute of Management. The dataset worked on is the subset of the consumer complaints database provided by www.consumerfinance.gov

Language: Jupyter Notebook - Size: 39.6 MB - Last synced at: 5 days ago - Pushed at: over 3 years ago - Stars: 11 - Forks: 1

SemiringInc/Mueller-Report-Corpus

The Mueller Report Corpus V 0.1

Size: 3.51 MB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 11 - Forks: 0

divkakwani/webcorpus

Generate large textual corpora for almost any language by crawling the web

Language: Python - Size: 44.9 MB - Last synced at: 18 days ago - Pushed at: over 3 years ago - Stars: 8 - Forks: 11

mnschmit/SherLIiC

A Typed Event-Focused Lexical Inference Benchmark for Evaluating Natural Language Inference

Language: Python - Size: 20.8 MB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 8 - Forks: 1

INK-USC/RiddleSense

RiddleSense: Reasoning about Riddle Questions Featuring Linguistic Creativity and Commonsense Knowledge

Language: Python - Size: 16.3 MB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 7 - Forks: 1

mtala3t/Identify-the-Sentiments-AV-NLP-Contest

This project is a Python implementation submitted to the Analytics Vidhya contest "Identify the Sentiments". I enjoyed taking part in the competition and its whole process. The submitted solution ranked 118 on the public leaderboard.

Language: Python - Size: 7.61 MB - Last synced at: over 1 year ago - Pushed at: about 5 years ago - Stars: 7 - Forks: 2

marco-roberti/pytorch-e2e-dataset

The E2E Dataset, packed as a PyTorch DataSet subclass

Language: Python - Size: 97.7 KB - Last synced at: over 2 years ago - Pushed at: almost 7 years ago - Stars: 7 - Forks: 0

Dibyakanti/AutoTNLI-code

This repository contains the official code for the paper : Realistic Data Augmentation Framework for Enhancing Tabular Reasoning.

Language: HTML - Size: 3.99 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 6 - Forks: 1

LIAAD/PT-Pump-Up

Hub for the Portuguese language NLP Resources

Language: PHP - Size: 8.37 MB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 6 - Forks: 0

mehrdad-dev/Battle-of-the-Wordsmiths

Official github repository: Battle of the Wordsmiths: Comparing ChatGPT, GPT-4, Claude, and Bard (dataset)

Language: Python - Size: 614 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 6 - Forks: 1

JasonShao55/Chinese_Metaphor_Explanation

An annotated Chinese metaphor dataset

Language: Python - Size: 71.8 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 6 - Forks: 0

fzehracetin/turkish-question-answering

We extracted 5,000 question-answer pairs from Turkish Wikipedia and fine-tuned Turkish BERT, ALBERT, ELECTRA for the question-answering task.

Language: Jupyter Notebook - Size: 1.52 MB - Last synced at: 12 months ago - Pushed at: almost 4 years ago - Stars: 6 - Forks: 1

kushalchauhan98/ticket-segmentation

Data for the ACL 2020 paper - Improving Segmentation for Technical Support Problems

Size: 1.22 MB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 5 - Forks: 2

gcunhase/ArXivAbsTitleDataset

Extract Abstract and Title Dataset from arXiv articles

Language: Python - Size: 14 MB - Last synced at: almost 2 years ago - Pushed at: over 5 years ago - Stars: 5 - Forks: 0

griff4692/clin-sum

Analysis of Hospital-Course Summaries

Language: Python - Size: 336 KB - Last synced at: 4 months ago - Pushed at: about 4 years ago - Stars: 4 - Forks: 1

bavard-ai/nlu-meta-dataset

A large dataset for learning to perform few-shot intent classification.

Size: 1.4 MB - Last synced at: over 2 years ago - Pushed at: about 4 years ago - Stars: 4 - Forks: 1

Bohdan-Khomtchouk/NERO-nlp

NERO-nlp is a PyPI package for biomedical Named Entity (Recognition) Ontology

Language: Python - Size: 29.9 MB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 4 - Forks: 1

navneetkrc/Flair_SOTA_NLP

Use of State of the Art FLAIR library for the NLP datasets

Language: Jupyter Notebook - Size: 1.12 MB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 4 - Forks: 0

dellison/NLIDatasets.jl

Julia interface to datasets for natural language inference

Language: Julia - Size: 48.8 KB - Last synced at: 12 days ago - Pushed at: over 5 years ago - Stars: 4 - Forks: 0

Delta-Sigma/urdu-stopwords

A list containing Urdu stopwords.

Size: 25.4 KB - Last synced at: almost 2 years ago - Pushed at: over 5 years ago - Stars: 4 - Forks: 11

StonyBrookNLP/appworld-leaderboard

🌍 Leaderboard Repository for "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents", ACL 2024

Language: Python - Size: 127 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 3 - Forks: 1

Jpzinn654/qa-portuguese-v1

A 500-thousand-row split of a Portuguese dataset from Hugging Face, for training NLP models for question answering.

Language: Python - Size: 4.88 KB - Last synced at: about 2 months ago - Pushed at: 7 months ago - Stars: 3 - Forks: 0

poethan/AlphaMWE

AlphaMWE: Construction of Multilingual Parallel Corpora with MWE Annotations

Size: 265 KB - Last synced at: almost 2 years ago - Pushed at: about 2 years ago - Stars: 3 - Forks: 2

d0rj/RusLit

📚 A small collection of Russian literature 📚

Size: 20.7 MB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 2

ArmanBehnam/NLP

Natural language processing, including datasets, Farsi NLP, Automated Essay Scoring, Automatic Speech Recognition, etc.

Language: Jupyter Notebook - Size: 512 KB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 3 - Forks: 0

jrgpulido/js19is2e

Language: TeX - Size: 27.5 MB - Last synced at: over 2 years ago - Pushed at: about 6 years ago - Stars: 3 - Forks: 10

U-11-Agar/timeseries-analysis

time series data analysis on real time data and csv files

Language: Jupyter Notebook - Size: 73 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

turkish-nlp-suite/Vitamins-Supplements-NER-dataset

Repo for Turkish Vitamins and Supplements NER dataset.

Size: 558 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 2 - Forks: 1

MiSaengg/gunhee-RnD-space

R&D for datasets for book genres

Language: Jupyter Notebook - Size: 17.1 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 2 - Forks: 0

PranavNV/Nationality-Prejudice-in-Text-Generation

This project focuses on the analysis of text generation models such as GPT-2 to identify and understand populistic behaviors or biases against various nationalities.

Size: 20.1 MB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

mzhukovaucsb/emoji_gestures

Research project “Gesture Emoji Twitter Corpus”. Project description, data collection pipeline (tweepy), data preprocessing functions (regex, nltk), 2 datasets for Russian and English published in open access.

Language: Jupyter Notebook - Size: 125 MB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 2 - Forks: 0

DravidianNLP/Datasets

This repository hosts all the datasets published in Dravidian Languages.

Size: 11.7 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 0

theQuert/NLP-sentiment-analysis

Sentiment Analysis models with multiple algorithms

Language: Jupyter Notebook - Size: 5.63 MB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 2

vgupta123/infotabs-code Fork of utahnlp/infotabs-code

Implementation of the semi-structured inference model in our ACL 2020 paper. INFOTABS: Inference on Tables as Semi-structured Data

Language: Python - Size: 138 KB - Last synced at: over 1 year ago - Pushed at: about 5 years ago - Stars: 2 - Forks: 0

jrgpulido/pd18is5d

Language: Roff - Size: 11.1 MB - Last synced at: over 1 year ago - Pushed at: about 6 years ago - Stars: 2 - Forks: 13

BigToothDev/pet-project-nlp

Natural language processing pet project. It includes data web scraping, lemmatizing, stemming, and working with related words (hyponyms, hypernyms, meronyms, holonyms). This specific code gathers all data from chosen pages of the Suspilne (Суспільне) webpage. Next, the data is manipulated and processed for future analysis

Language: Python - Size: 48.5 MB - Last synced at: 2 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

Robert-Morabito/STOP

Repository for the paper STOP! Benchmarking Large Language Models with Sensitivity Testing on Offensive Progressions (EMNLP 2024)

Language: Python - Size: 375 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

christosojan/MSA_in_Indian_Languages

Implementation of Dense Fusion Network with Multimodal Residual (DFMR) for Multi-modal Sentiment Analysis (MSA) in native Indian languages such as Malayalam, integrating multi-modal information from multimedia. The model processes the textual, visual, and auditory modalities of the video to classify the sentiment into five categories.

Language: Jupyter Notebook - Size: 31.3 MB - Last synced at: 2 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

hicte/moin

A dataset of Moin Persian 🇮🇷 dictionary 📖 words.

Size: 265 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

Yu-billie/NLP-Project-CUAI-1H23

NLP Projects in CUAI 1H23

Language: Jupyter Notebook - Size: 3.09 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

HuynhXuanLam-IT44/BERT-Covid-Sentiment-Classification

Applying and Understanding an Advanced, Novel Deep Learning Approach

Language: Jupyter Notebook - Size: 2.55 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

aman-17/BERT-Semantic-Similarity-Flask-App

Flask app for Semantic Similarity of sentences using BERT model.

Language: CSS - Size: 6.06 MB - Last synced at: 12 days ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

yarakyrychenko/tg-misinfo-data

Telegram posts from Russian news, misinformation, and propaganda channels made during the first weeks of the 2022 Russian invasion of Ukraine.

Size: 37.1 MB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 1 - Forks: 0

sammitjain/loksabha-questions

Questions asked in the Lok Sabha - collection and analysis of trends. Creating the dataset from scratch.

Language: Jupyter Notebook - Size: 80.7 MB - Last synced at: 2 months ago - Pushed at: about 3 years ago - Stars: 1 - Forks: 1

Utkichaps/AMICorpus-Meeting-Transcript-Extraction

This can be used to convert the AMI corpus meeting transcripts to a speaker-by-speaker dialogue discourse conversation for each meeting.

Language: Python - Size: 243 KB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 1

cedspam/text_dataset_streaming

Language: Python - Size: 72.3 KB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

tagtog/BBC-News-Dataset

🍃BBC-News-Dataset in anndoc (tagtog) format

Language: HTML - Size: 2.81 MB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

DARK-art108/FinBox-NLP-Exercise

An NLP Exercise

Language: Jupyter Notebook - Size: 91.8 KB - Last synced at: 4 months ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 0

Text-Mining/Ferdowsi-Annotated-Academic-Linguistic-Corpus

Two linguistic corpora drawn from the collection of articles of Ferdowsi University of Mashhad.

Size: 57.6 MB - Last synced at: almost 2 years ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 1

praveentn/nlpaeg

Natural Language Processing for Artificial Error Generation

Language: Python - Size: 6.35 MB - Last synced at: about 1 month ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

vaibhav0000patel/Topical-Sentiment-Analysis

ML model that recognizes how closely a text relates to the topic data the model was trained on. The modular structure of the code makes it easier to understand and modify. Here, the model classifies whether a text is crime-related or not.

Language: Python - Size: 483 KB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

nelson888/seq2seq-data-augmentation

Soft Contextual Data Augmentation, a Data Augmentation method for NLP translation datasets

Language: Java - Size: 41 KB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

language-resources-nepal/language-resources-nepal.github.io

A curated collection of language resources for Nepal

Language: SCSS - Size: 87.9 KB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

Selium98/Flix-Master

Advanced movie recommender that uses Flask for the user interface, deployed on Heroku.

Language: HTML - Size: 1.11 MB - Last synced at: 5 months ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

Karan-Malik/WordEmbeddings

Creating Word Embeddings using Keras

Language: Jupyter Notebook - Size: 24.5 MB - Last synced at: 4 months ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 0

rareloto/beginnerwebscraping-naverdictionary

Scraping parallel Korean-English conversation text pairs from Naver Conversation of the Day

Language: Jupyter Notebook - Size: 682 KB - Last synced at: over 2 years ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 0

Shayokh144/Bengali-Literature-Data-Collection

Size: 943 KB - Last synced at: over 2 years ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 0

roshangrewal/natural-language-processing

Natural language processing is a subfield of linguistics, computer science, information engineering, and artificial intelligence concerned with the interactions between computers and human languages, in particular how to program computers to process and analyze large amounts of natural language data.

Language: Jupyter Notebook - Size: 7.81 KB - Last synced at: over 2 years ago - Pushed at: about 5 years ago - Stars: 1 - Forks: 0

navanith007/ULMFit-using-pytorch

This repository contains solutions to NLP problems using transfer learning

Language: Jupyter Notebook - Size: 11.7 KB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 0

saxenaprerit/Text_mining_using_NLTK

This code uses NLTK for text mining in Python

Language: Jupyter Notebook - Size: 14.6 KB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 0

clulab/edin-data

Biomolecular events mined by Reach from PubMed Central

Size: 3.29 MB - Last synced at: about 1 year ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 1

Related Topics
nlp 82 nlp-machine-learning 37 natural-language-processing 26 dataset 24 nlp-resources 20 python 17 machine-learning 15 python3 12 datasets 12 deep-learning 11 corpus 9 nlp-library 9 sentiment-analysis 8 data-science 8 pytorch 7 corpus-data 7 bert 7 natural-language-understanding 7 sentiment-classification 7 question-answering 6 text-classification 6 wikipedia 5 corpus-linguistics 5 bert-model 5 information-extraction 5 llm 5 nli 4 transformer 4 nlp-keywords-extraction 4 data 4 chinese-nlp 4 sp-es 4 text-processing 4 nltk 4 language 3 tables 3 keras 3 roberta 3 acl2020 3 ai 3 nlg-dataset 3 natural-language-generation 3 deep-neural-networks 3 open-information-extraction 3 turkish-nlp 3 chatbot 3 chatgpt 3 turkce-veriseti 3 acl-2024 3 language-model 3 large-language-models 3 machinelearning 3 news 3 text-mining 3 corpus-tools 3 acl 3 flask 3 corpus-processing 3 nltk-python 3 interactive-coding 2 llm-agents 2 java 2 twitter 2 keras-tensorflow 2 tool-usage 2 data-visualization 2 tensorflow 2 data-cleaning 2 sentiment 2 pypi 2 nlp-dataset 2 nlp-apis 2 african-languages 2 natural-language-inference 2 linguistic-analysis 2 sklearn-library 2 transformers 2 dialogue 2 dataset-generation 2 india 2 wikipedia-corpus 2 wiki 2 named-entity-recognition 2 computational-linguistics 2 ai-agents 2 ai-apis 2 ai-assistants 2 artificial-intelligence 2 linguistics 2 ai-environment 2 ai-planning 2 autonomous-agents 2 scifi 2 neural-network 2 romanian 2 science-fiction 2 sklearn 2 stemming 2 portuguese-language 2 lstm-neural-networks 2