Topic: "low-resource-nlp"
adbar/simplemma
Simple multilingual lemmatizer for Python, especially useful for speed and efficiency
Language: Python - Size: 729 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 155 - Forks: 12

csebuetnlp/banglanmt
This repository contains the code and data of the paper titled "Not Low-Resource Anymore: Aligner Ensembling, Batch Filtering, and New Datasets for Bengali-English Machine Translation" published in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020), November 16 - November 20, 2020.
Language: Python - Size: 2.05 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 144 - Forks: 45

cisnlp/GlotLID
💬 Language Identification with Support for More Than 2000 Labels -- EMNLP 2023
Language: Python - Size: 409 KB - Last synced at: 10 days ago - Pushed at: 6 months ago - Stars: 128 - Forks: 8

ljvmiranda921/calamanCy
NLP pipelines for Tagalog using spaCy
Language: Python - Size: 978 KB - Last synced at: 9 days ago - Pushed at: 3 months ago - Stars: 54 - Forks: 2

231sm/Reasoning_In_EE
Code and datasets for the ACL 2021 paper "OntoED: Low-resource Event Detection with Ontology Embedding"
Language: Python - Size: 60.5 MB - Last synced at: 9 days ago - Pushed at: about 3 years ago - Stars: 45 - Forks: 16

zjunlp/RAP
[SIGIR 2023] Schema-aware Reference as Prompt Improves Data-Efficient Knowledge Graph Construction
Language: Python - Size: 17.1 MB - Last synced at: 10 months ago - Pushed at: about 2 years ago - Stars: 39 - Forks: 3

afrisenti-semeval/afrisent-semeval-2023
AfriSenti-SemEval Shared Task 12: Sentiment Analysis for African languages: https://afrisenti-semeval.github.io/
Language: Jupyter Notebook - Size: 33 MB - Last synced at: 12 months ago - Pushed at: over 1 year ago - Stars: 38 - Forks: 38

KennethEnevoldsen/scandinavian-embedding-benchmark
A Scandinavian Benchmark for sentence embeddings
Language: Python - Size: 4.82 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 37 - Forks: 5

hausanlp/NaijaSenti Fork of shmuhammadd/NaijaSenti
This is a repository for NaijaSenti, a Lacuna Fund project to develop a sentiment corpus for four Nigerian languages: Igbo, Hausa, Yoruba, and Pidgin.
Size: 29.7 MB - Last synced at: 1 day ago - Pushed at: over 1 year ago - Stars: 32 - Forks: 24

NLP-Tutorials/AACL-IJCNLP2022-KGC-Tutorial
Materials for AACL-IJCNLP-2022 tutorial: Efficient and Robust Knowledge Graph Construction
Size: 31 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 25 - Forks: 7

luciusssss/ZhuangBench
[ACL'24 Findings] Teaching Large Language Models an Unseen Language on the Fly
Language: Python - Size: 3.24 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 21 - Forks: 0

luciusssss/mc2_corpus
[ACL'24] MC^2: A Multilingual Corpus of Minority Languages in China (Tibetan, Uyghur, Kazakh, and Mongolian)
Language: Python - Size: 602 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 20 - Forks: 2

nicolay-r/awesome-sentiment-attitude-extraction
A curated list of awesome sentiment analysis studies in which an attitude is the position a subject conveys in the text towards another object mentioned there, such as entities, events, etc.
Size: 1.46 MB - Last synced at: 7 days ago - Pushed at: 4 months ago - Stars: 19 - Forks: 1

StefanHeng/ProgGen
Code for paper "ProgGen: Generating Named Entity Recognition Datasets Step-by-step with Self-Reflexive Large Language Models"
Language: Python - Size: 62.3 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 17 - Forks: 2

ijazul-haq/nlpashto
Pashto Natural Language Processing Toolkit
Size: 62.8 MB - Last synced at: 7 months ago - Pushed at: about 1 year ago - Stars: 13 - Forks: 0

wannaphong/Awesome-Lao-NLP
Awesome Lao Natural Language Processing
Size: 11.7 KB - Last synced at: 11 days ago - Pushed at: 2 months ago - Stars: 12 - Forks: 0

csebuetnlp/banglaparaphrase
This repository contains the code, data, and associated models of the paper titled "BanglaParaphrase: A High-Quality Bangla Paraphrase Dataset", accepted in Proceedings of the Asia-Pacific Chapter of the Association for Computational Linguistics: AACL 2022.
Language: Python - Size: 101 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 10 - Forks: 1

AsifulNobel/Metsys
Chatbot solution for resource-poor languages. Contains code and data for the journal article 'Focused domain contextual AI chatbot framework for resource poor languages'.
Language: Python - Size: 56.1 MB - Last synced at: 6 months ago - Pushed at: almost 4 years ago - Stars: 9 - Forks: 6

nicolay-r/RuSentRel-Leaderboard
This is the official leaderboard for the RuSentRel-1.1 dataset, originally described in the paper arXiv:1808.08932.
Language: Python - Size: 1.88 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 8 - Forks: 0

ruoyuxie/noisy_parallel_data_alignment
Enhanced awesome-align for low-resource languages and noise simulation: https://arxiv.org/abs/2301.09685
Language: Python - Size: 245 KB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 7 - Forks: 1

Lhtie/Bio-Domain-Transfer
Implementation of NAACL 2024 main conference paper: Named Entity Recognition Under Domain Shift via Metric Learning for Life Science
Language: Python - Size: 19.2 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 4 - Forks: 0

zjunlp/OntoED Fork of 231sm/Reasoning_In_EE
Code and datasets for the ACL 2021 paper "OntoED: Low-resource Event Detection with Ontology Embedding"
Size: 60.5 MB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 4 - Forks: 0

devrimcavusoglu/nonwestlit
NONWESTLIT Project Codebase
Language: Python - Size: 239 KB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 3 - Forks: 0

galax19ksh/Manipuri-NLP
A comprehensive overview of research regarding Natural Language Processing (NLP) of Manipuri language.
Size: 214 KB - Last synced at: 14 days ago - Pushed at: 6 months ago - Stars: 3 - Forks: 0

EagleW/Chem-FINESE
Official implementation of the EACL Findings 2024 paper: Chem-FINESE: Validating Fine-Grained Few-shot Entity Extraction through Text Reconstruction
Language: Python - Size: 1.7 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 0

cisnlp/GlotStoryBook
Children StoryBooks for 180 languages.
Language: Jupyter Notebook - Size: 16.6 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 0

vgupta123/contextualize_scdv
Unsupervised Contextualized Document Representation, to appear in SustaiNLP 2021 at EMNLP 2021
Language: Python - Size: 5.42 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 3 - Forks: 0

Rui0828/Learning-From-Mistakes-Prompting
LoResMT@ACL 2024: Learning-From-Mistakes Prompting for Indigenous Language Translation – A feedback-driven approach to enhance low-resource translation.
Language: Python - Size: 4.92 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 2 - Forks: 0

pnborchert/MultiRep
Efficient Information Extraction in Few-Shot Relation Classification through Contrastive Representation Learning. NAACL 2024.
Language: Python - Size: 2.55 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 2 - Forks: 0

hausanlp/hausanlp
Hausa Natural Language Processing Repository
Size: 261 KB - Last synced at: 1 day ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

HenningBuhl/low-resource-machine-translation
This repository is an open-source collection of various low-resource machine translation experiments.
Language: Python - Size: 428 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 2 - Forks: 1

hanjiale/REPaperList
Must-read papers on relation extraction.
Size: 3.91 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 0

SaraikiNLP/SaraikiNLP
SaraikiNLP | Natural Language Processing for Saraiki Language | NLP Toolkit | Saraiki NLP
Language: Jupyter Notebook - Size: 304 KB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 1 - Forks: 0

kasunw22/sinhala-word-embedding-alignment
English-Sinhala multilingual word embedding alignment resources
Language: Jupyter Notebook - Size: 40.5 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

chschroeder/self-training-for-sample-efficient-active-learning
Self-Training for Sample-Efficient Active Learning for Text Classification with Pre-Trained Language Models (EMNLP 2024)
Language: Python - Size: 0 Bytes - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 1 - Forks: 0

GGLAB-KU/turkish-plu
Code for AACL23 paper "Benchmarking Procedural Language Understanding for Low-Resource Languages: A Case Study on Turkish"
Language: Python - Size: 146 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

mdm-code/manx
Fine-tune an LLM for early Middle English lemmatization with data from LAEME.
Language: Python - Size: 157 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

kalindasiaminwe/ChitongaASR
A natural language processing and machine learning project for a low-resource language in Zambia.
Language: Jupyter Notebook - Size: 548 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

wenlai-lavine/lavine_blog
Wen Lai's Blog related to MT/NLP/ML
Language: HTML - Size: 4.88 MB - Last synced at: 10 months ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

kanincityy/misogyny_detection_transformers
Building an Effective Misogyny Detection Classifier for Low-Resource Languages
Size: 15.1 MB - Last synced at: 6 days ago - Pushed at: 10 days ago - Stars: 0 - Forks: 0

holatung/Kumlinda
Fortifying Community Truth: Developing Lexicons on Tech-Facilitated GBV and a Model for Low-Resourced African Languages (Hausa, Igbo, Yoruba, and Swahili)
Size: 57.6 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

mohammad-safari/Conditional_Comment_Generation_Using_TILGAN Fork of shizhediao/TILGAN
Conditional comment generation using TILGAN as a bachelor's project, based on the Findings of ACL-IJCNLP 2021 paper entitled "TILGAN: Transformer-based Implicit Latent GAN for Diverse and Coherent Text Generation"
Language: Python - Size: 79.6 MB - Last synced at: 6 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

nicolay-r/RuSentNE-LLM-Benchmark
This repository highlights the reasoning capabilities of LLMs in Targeted Sentiment Analysis in Russian 📊
Language: Python - Size: 7.08 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

dialect-ai/BenHeadGen
This is the official repository containing the code, data, and models of the paper titled "Shironaam: Bengali News Headline Generation using Auxiliary Information", accepted for publication in Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL’23), May 2-6, 2023.
Size: 25.4 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

mariaviana21/tars_flair
An overview of the possibilities of using TARS models for low-resource languages
Language: Jupyter Notebook - Size: 348 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

mosh98/Text_Aug_Low_Res
AAAI Knowledge NLP Submission
Language: Jupyter Notebook - Size: 104 KB - Last synced at: 7 days ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

hellomasaya/hawp
This repository provides HAWP: a dataset for Hindi Word Problem Solving and a baseline (LREC 2022)
Size: 174 KB - Last synced at: over 1 year ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0
