An open API service providing repository metadata for many open source software ecosystems.

Topic: "low-resource-nlp"

adbar/simplemma

Simple multilingual lemmatizer for Python, especially useful for speed and efficiency

Language: Python - Size: 729 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 155 - Forks: 12

csebuetnlp/banglanmt

This repository contains the code and data of the paper titled "Not Low-Resource Anymore: Aligner Ensembling, Batch Filtering, and New Datasets for Bengali-English Machine Translation" published in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020), November 16 - November 20, 2020.

Language: Python - Size: 2.05 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 144 - Forks: 45

cisnlp/GlotLID

💬 Language Identification with Support for More Than 2000 Labels -- EMNLP 2023

Language: Python - Size: 409 KB - Last synced at: 10 days ago - Pushed at: 6 months ago - Stars: 128 - Forks: 8

ljvmiranda921/calamanCy

NLP pipelines for Tagalog using spaCy

Language: Python - Size: 978 KB - Last synced at: 9 days ago - Pushed at: 3 months ago - Stars: 54 - Forks: 2

231sm/Reasoning_In_EE

Code and datasets for the ACL 2021 paper "OntoED: Low-resource Event Detection with Ontology Embedding"

Language: Python - Size: 60.5 MB - Last synced at: 9 days ago - Pushed at: about 3 years ago - Stars: 45 - Forks: 16

zjunlp/RAP

[SIGIR 2023] Schema-aware Reference as Prompt Improves Data-Efficient Knowledge Graph Construction

Language: Python - Size: 17.1 MB - Last synced at: 10 months ago - Pushed at: about 2 years ago - Stars: 39 - Forks: 3

afrisenti-semeval/afrisent-semeval-2023

AfriSenti-SemEval Shared Task 12: Sentiment Analysis for African Languages: https://afrisenti-semeval.github.io/

Language: Jupyter Notebook - Size: 33 MB - Last synced at: 12 months ago - Pushed at: over 1 year ago - Stars: 38 - Forks: 38

KennethEnevoldsen/scandinavian-embedding-benchmark

A Scandinavian Benchmark for sentence embeddings

Language: Python - Size: 4.82 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 37 - Forks: 5

hausanlp/NaijaSenti Fork of shmuhammadd/NaijaSenti

This is a repository for NaijaSenti. A Lacuna Funded Project for the development of sentiment corpus for four Nigerian languages: Igbo, Hausa, Yoruba and Pidgin.

Size: 29.7 MB - Last synced at: 1 day ago - Pushed at: over 1 year ago - Stars: 32 - Forks: 24

NLP-Tutorials/AACL-IJCNLP2022-KGC-Tutorial

Materials for AACL-IJCNLP-2022 tutorial: Efficient and Robust Knowledge Graph Construction

Size: 31 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 25 - Forks: 7

luciusssss/ZhuangBench

[ACL'24 Findings] Teaching Large Language Models an Unseen Language on the Fly

Language: Python - Size: 3.24 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 21 - Forks: 0

luciusssss/mc2_corpus

[ACL'24] MC^2: A Multilingual Corpus of Minority Languages in China (Tibetan, Uyghur, Kazakh, and Mongolian)

Language: Python - Size: 602 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 20 - Forks: 2

nicolay-r/awesome-sentiment-attitude-extraction

A curated list of awesome sentiment analysis studies, in which attitude corresponds to the text position conveyed by Subject towards other Object mentioned in text such as: entities, events, etc.

Size: 1.46 MB - Last synced at: 7 days ago - Pushed at: 4 months ago - Stars: 19 - Forks: 1

StefanHeng/ProgGen

Code for paper "ProgGen: Generating Named Entity Recognition Datasets Step-by-step with Self-Reflexive Large Language Models"

Language: Python - Size: 62.3 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 17 - Forks: 2

ijazul-haq/nlpashto

Pashto Natural Language Processing Toolkit

Size: 62.8 MB - Last synced at: 7 months ago - Pushed at: about 1 year ago - Stars: 13 - Forks: 0

wannaphong/Awesome-Lao-NLP

Awesome Lao Natural Language Processing

Size: 11.7 KB - Last synced at: 11 days ago - Pushed at: 2 months ago - Stars: 12 - Forks: 0

csebuetnlp/banglaparaphrase

This repository contains the code, data, and associated models of the paper titled "BanglaParaphrase: A High-Quality Bangla Paraphrase Dataset", accepted in Proceedings of the Asia-Pacific Chapter of the Association for Computational Linguistics: AACL 2022.

Language: Python - Size: 101 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 10 - Forks: 1

AsifulNobel/Metsys

Chatbot Solution for Resource-Poor Languages. Contains code and data for Journal Article 'Focused domain contextual AI chatbot framework for resource poor languages'.

Language: Python - Size: 56.1 MB - Last synced at: 6 months ago - Pushed at: almost 4 years ago - Stars: 9 - Forks: 6

nicolay-r/RuSentRel-Leaderboard

This is an official Leaderboard for the RuSentRel-1.1 dataset originally described in paper (arxiv:1808.08932)

Language: Python - Size: 1.88 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 8 - Forks: 0

ruoyuxie/noisy_parallel_data_alignment

Enhanced awesome-align for low-resource languages and noise simulation: https://arxiv.org/abs/2301.09685

Language: Python - Size: 245 KB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 7 - Forks: 1

Lhtie/Bio-Domain-Transfer

Implementation of NAACL 2024 main conference paper: Named Entity Recognition Under Domain Shift via Metric Learning for Life Science

Language: Python - Size: 19.2 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 4 - Forks: 0

zjunlp/OntoED Fork of 231sm/Reasoning_In_EE

Code and datasets for the ACL 2021 paper "OntoED: Low-resource Event Detection with Ontology Embedding"

Size: 60.5 MB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 4 - Forks: 0

devrimcavusoglu/nonwestlit

NONWESTLIT Project Codebase

Language: Python - Size: 239 KB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 3 - Forks: 0

galax19ksh/Manipuri-NLP

A comprehensive overview of research on Natural Language Processing (NLP) for the Manipuri language.

Size: 214 KB - Last synced at: 14 days ago - Pushed at: 6 months ago - Stars: 3 - Forks: 0

EagleW/Chem-FINESE

Official implementation of the EACL Findings 2024 paper: Chem-FINESE: Validating Fine-Grained Few-shot Entity Extraction through Text Reconstruction

Language: Python - Size: 1.7 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 0

cisnlp/GlotStoryBook

Children's storybooks for 180 languages.

Language: Jupyter Notebook - Size: 16.6 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 0

vgupta123/contextualize_scdv

Unsupervised Contextualized Document Representation, to appear in SustaiNLP 2021 EMNLP 2021

Language: Python - Size: 5.42 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 3 - Forks: 0

Rui0828/Learning-From-Mistakes-Prompting

LoResMT@ACL 2024: Learning-From-Mistakes Prompting for Indigenous Language Translation – A feedback-driven approach to enhance low-resource translation.

Language: Python - Size: 4.92 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 2 - Forks: 0

pnborchert/MultiRep

Efficient Information Extraction in Few-Shot Relation Classification through Contrastive Representation Learning. NAACL 2024.

Language: Python - Size: 2.55 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 2 - Forks: 0

hausanlp/hausanlp

Hausa Natural Language Processing Repository

Size: 261 KB - Last synced at: 1 day ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

HenningBuhl/low-resource-machine-translation

This repository is an open-source collection of various low-resource machine translation experiments.

Language: Python - Size: 428 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 2 - Forks: 1

hanjiale/REPaperList

Must-read papers on relation extraction.

Size: 3.91 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 0

SaraikiNLP/SaraikiNLP

SaraikiNLP | Natural Language Processing for Saraiki Language | NLP Toolkit | Saraiki NLP

Language: Jupyter Notebook - Size: 304 KB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 1 - Forks: 0

kasunw22/sinhala-word-embedding-alignment

English-Sinhala multilingual word embedding alignment resources

Language: Jupyter Notebook - Size: 40.5 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

chschroeder/self-training-for-sample-efficient-active-learning

Self-Training for Sample-Efficient Active Learning for Text Classification with Pre-Trained Language Models (EMNLP 2024)

Language: Python - Size: 0 Bytes - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 1 - Forks: 0

GGLAB-KU/turkish-plu

Code for AACL23 paper "Benchmarking Procedural Language Understanding for Low-Resource Languages: A Case Study on Turkish"

Language: Python - Size: 146 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

mdm-code/manx

Fine-tune LLM for early Middle English lemmatization with data from LAEME.

Language: Python - Size: 157 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

kalindasiaminwe/ChitongaASR

A natural language processing and machine learning project for a low-resource language in Zambia.

Language: Jupyter Notebook - Size: 548 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

wenlai-lavine/lavine_blog

Wen Lai's Blog related to MT/NLP/ML

Language: HTML - Size: 4.88 MB - Last synced at: 10 months ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

kanincityy/misogyny_detection_transformers

Building an Effective Misogyny Detection Classifier for Low-Resource Languages

Size: 15.1 MB - Last synced at: 6 days ago - Pushed at: 10 days ago - Stars: 0 - Forks: 0

holatung/Kumlinda

Fortifying Community Truth: Developing Lexicons on Tech Facilitated GBV and Model for Low-resourced African Languages (Hausa, Igbo, Yoruba and Swahili)

Size: 57.6 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

mohammad-safari/Conditional_Comment_Generation_Using_TILGAN Fork of shizhediao/TILGAN

Conditional comment generation using TILGAN as a Bachelor's project, based on the Findings of ACL-IJCNLP 2021 paper entitled "TILGAN: Transformer-based Implicit Latent GAN for Diverse and Coherent Text Generation"

Language: Python - Size: 79.6 MB - Last synced at: 6 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

nicolay-r/RuSentNE-LLM-Benchmark

This repository highlights LLMs' reasoning capabilities in Targeted Sentiment Analysis in Russian 📊

Language: Python - Size: 7.08 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

dialect-ai/BenHeadGen

This is the official repository containing the code, data, and models of the paper titled "Shironaam: Bengali News Headline Generation using Auxiliary Information", accepted for publication in Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL’23), May 2-6, 2023.

Size: 25.4 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

mariaviana21/tars_flair

An overview of the possibilities of using TARS models for low-resource languages

Language: Jupyter Notebook - Size: 348 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

mosh98/Text_Aug_Low_Res

AAAI Knowledge NLP Submission

Language: Jupyter Notebook - Size: 104 KB - Last synced at: 7 days ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

hellomasaya/hawp

This repository provides HAWP: a dataset for Hindi Word Problem Solving and a baseline (LREC 2022)

Size: 174 KB - Last synced at: over 1 year ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

Related Topics
low-resource-languages: 23 - nlp: 21 - natural-language-processing: 11 - information-extraction: 7 - low-resource: 7 - dataset: 6 - few-shot-learning: 5 - machine-learning: 5 - deep-learning: 5 - relation-extraction: 5 - sentiment-analysis: 5 - few-shot: 5 - pytorch: 4 - machine-translation: 4 - multilingual: 4 - nlp-machine-learning: 4 - named-entity-recognition: 3 - hausa: 3 - transformers: 3 - llms: 3 - sentiment-classification: 3 - large-language-models: 3 - bangla-nlp: 3 - african-languages: 3 - neural-network: 3 - llm: 2 - datasets: 2 - leaderboard: 2 - chatgpt: 2 - hausa-nlp: 2 - corpus: 2 - tokenization: 2 - python: 2 - language-identification: 2 - sentiment: 2 - neural-networks: 2 - language-detection: 2 - neural-machine-translation: 2 - low-resource-machine-translation: 2 - glot: 2 - ontoed: 2 - awesome-list: 2 - language-model: 2 - classification: 2 - kg: 2 - knowledge-graph: 2 - ner: 2 - lemmatizer: 2 - lemmatization: 2 - igbo: 2 - triple-extraction: 2 - contrastive-learning: 2 - event-extraction: 2 - awesome: 2 - yoruba: 2 - benchmark: 2 - bert: 2 - prompt: 2 - nltk: 2 - retrieval: 2 - noise: 1 - noisy-data: 1 - nueral-machine-translation: 1 - ocr: 1 - ocr-text: 1 - word-aligner: 1 - word-alignment: 1 - bangla-ai: 1 - chatbot: 1 - customer-service-chatbot: 1 - django-application: 1 - django-channels: 1 - resource-poor-languages: 1 - restful-api: 1 - event-detection: 1 - emnlp-2020: 1 - parallel-corpora: 1 - parallel-corpus: 1 - chemical-data: 1 - chemical-information-extraction: 1 - constrastive-learning: 1 - reconstruction: 1 - lexicon: 1 - swahili: 1 - bilingual-lexicon-induction: 1 - english-sinhala: 1 - fasttext-sinhala-word-embedding-alignment: 1 - labse: 1 - low-resource-word-embedding-alignment: 1 - multilingual-embeddings: 1 - procrustes-alignment: 1 - procrustes-analysis: 1 - rcsls-alignment: 1 - sinhala: 1 - sinhala-word-embeddings: 1 - supervised-embedding-alignment: 1 - unsupervised-embedding-alignment: 1 - vecmap: 1 - word-embedding-alignment: 1 - augmentation: 1