An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: multilingual-nlp

Tanlouie/Gendered_Abuse_Detection_In_Indic-Languages

Online gender-based violence limits marginalized voices. Detection in Indic languages is hard due to limited data and linguistic complexity. This work builds better classifiers for improved abuse detection in such settings.

Language: Jupyter Notebook - Size: 17.8 MB - Last synced at: about 1 hour ago - Pushed at: about 2 hours ago - Stars: 0 - Forks: 0

spqr-86/gitlab-onboarding-rag

A multilingual (RU/EN) RAG system built on 847 pages of GitLab documentation to provide instant answers for new employees, achieving 89% response accuracy and sub-3-second latency.

Language: HTML - Size: 3.23 MB - Last synced at: about 8 hours ago - Pushed at: about 9 hours ago - Stars: 0 - Forks: 0

embeddings-benchmark/mteb

MTEB: Massive Text Embedding Benchmark

Language: Python - Size: 40.5 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 2,626 - Forks: 422

epfl-nlp/ConLID

Language: Python - Size: 356 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 2 - Forks: 0

MS134340/Sentence-Embeddings-Using-N-Gram-Features-and-Contrastive-Learning-for-Multilingual-Datasets

Generates sentence embeddings using N-gram features and contrastive learning, optimized for multilingual datasets and semantic similarity tasks.

Language: Jupyter Notebook - Size: 2.64 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

franciellevargas/MFTCXplain

MFTCXplain is the first multilingual benchmark dataset designed to evaluate the moral reasoning of Large Language Models (LLMs) through multi-hop hate speech explanations grounded in Moral Foundations Theory (MFT).

Language: Jupyter Notebook - Size: 2.77 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

BeaEsparcia/Spanish_Text_Classification_BERT

Spanish text classifier using BERT to detect user intent (information request, complaint, or recommendation). Includes synthetic training data and custom ambiguous examples to test robustness. Portfolio project focused on intent recognition and conversational design in Spanish.

Language: Jupyter Notebook - Size: 460 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

FSoft-AI4Code/TheVault

[EMNLP 2023] The Vault: A Comprehensive Multilingual Dataset for Advancing Code Understanding and Generation

Language: Jupyter Notebook - Size: 9.44 MB - Last synced at: 9 days ago - Pushed at: 10 months ago - Stars: 96 - Forks: 9

microsoft/Multilingual-Culture-First-Misgendering-Guardrails

Repository for the paper "A Multilingual, Culture-First Approach to Addressing Misgendering in LLM Applications"

Size: 3.29 MB - Last synced at: 3 days ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

llm-lab-org/MENA-Values-Benchmark-Evaluating-Cultural-Alignment-and-Multilingual-Bias-in-Large-Language-Models

This repository contains the dataset and code used in our paper, “MENA Values Benchmark: Evaluating Cultural Alignment and Multilingual Bias in Large Language Models.” It provides tools to evaluate how large language models represent Middle Eastern and North African cultural values across 16 countries, multiple languages, and perspectives.

Language: Python - Size: 87.3 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 1 - Forks: 0

epfl-dlab/llm-latent-language

Repo accompanying our paper "Do Llamas Work in English? On the Latent Language of Multilingual Transformers".

Language: Jupyter Notebook - Size: 2.54 MB - Last synced at: 10 days ago - Pushed at: over 1 year ago - Stars: 76 - Forks: 17

eriglesias/RosettaFables

Analyzing Aesop's fables across languages using advanced NLP techniques

Language: Python - Size: 19 MB - Last synced at: 3 days ago - Pushed at: 29 days ago - Stars: 0 - Forks: 0

swaggy66/M-ABSA

M-ABSA: A Multilingual Dataset for Aspect-Based Sentiment Analysis

Language: Python - Size: 30.5 MB - Last synced at: 30 days ago - Pushed at: 30 days ago - Stars: 4 - Forks: 1

cisnlp/Glot500

Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages -- ACL 2023

Language: Python - Size: 151 KB - Last synced at: 4 days ago - Pushed at: about 1 year ago - Stars: 101 - Forks: 4

bigscience-workshop/xmtf

Crosslingual Generalization through Multitask Finetuning

Language: Jupyter Notebook - Size: 28.6 MB - Last synced at: 29 days ago - Pushed at: 9 months ago - Stars: 533 - Forks: 39

VenkateshSoni/MarathiSentimentAnalysis

A natural language processing project focused on analyzing sentiment in Marathi text. It uses a state-of-the-art large language model to classify text as positive, negative, or neutral, enabling sentiment-aware applications in regional languages.

Language: Python - Size: 26.4 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

sardaralikhamosh/Burushaski_Words_Network

A network science and computational linguistics project analyzing lexical connections between Burushaski and global languages using centrality measures and interactive graph visualizations.

Language: Python - Size: 12.7 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

sankar-2002/Gendered_Abuse_Detection_In_Indic-Languages

Online gender-based violence limits marginalized voices. Detection in Indic languages is hard due to limited data and linguistic complexity. This work builds better classifiers for improved abuse detection in such settings.

Language: Jupyter Notebook - Size: 18.6 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

AnanthaRajuC/AIML_NLP

AIML Natural Language Processing - Speech, Audio

Language: Java - Size: 4.4 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

viktor-shcherb/vive_la_ner

The default way to fine-tune BERT is wrong. Here is why

Language: Jupyter Notebook - Size: 107 KB - Last synced at: 3 days ago - Pushed at: 7 months ago - Stars: 4 - Forks: 0

AlokTheDataGuy/internship_projects

Multiple chatbots and NLP-based projects completed during my internship. Each project demonstrates different aspects of AI application development, from text summarization to multilingual chatbots.

Language: Python - Size: 4.53 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

harmonydata/harmony_r

R library for Harmony. R package - open source tool using AI for psychology and mental health. Actively recruiting contributors.

Language: HTML - Size: 1.19 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 2 - Forks: 3

GALA-MDS/Gala-External-Resources

This repository compiles data sources created for the CHIST-ERA 2025 proposal GALA.

Language: Jupyter Notebook - Size: 70.9 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

cisnlp/MEXA

🔍 Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment

Language: Python - Size: 26.4 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 10 - Forks: 0

DmitryRyumin/EMNLP-2023-Papers

EMNLP 2023 Papers: Explore cutting-edge research from EMNLP 2023, the premier conference for advancing empirical methods in natural language processing. Stay updated on the latest in machine learning, deep learning, and natural language processing with code included. ⭐ Support NLP!

Language: Python - Size: 6.43 MB - Last synced at: 2 months ago - Pushed at: about 1 year ago - Stars: 107 - Forks: 7

ceferisbarov/TUMLU

TUMLU: A Unified and Native Language Understanding Benchmark for Turkic Languages

Language: Python - Size: 38.3 MB - Last synced at: 2 months ago - Pushed at: 4 months ago - Stars: 19 - Forks: 1

swaggy66/MSMO

Multi-Scale and Multi-Objective Optimization for Cross-Lingual Aspect-Based Sentiment Analysis

Language: Python - Size: 1.84 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 1

deokhk/CBP

Official Repository for Cross-lingual Back-Parsing: Utterance Synthesis from Meaning Representation for Zero-Resource Semantic Parsing (EMNLP 2024)

Language: Python - Size: 7.32 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0

cambridgeltl/sail-bli

Self-Augmented In-Context Learning for Unsupervised Word Translation (ACL 2024). Keywords: Bilingual Lexicon Induction, Word Translation, Large Language Models, LLMs.

Language: Python - Size: 445 KB - Last synced at: 10 days ago - Pushed at: 10 months ago - Stars: 3 - Forks: 1

cambridgeltl/prompt4bli

On Bilingual Lexicon Induction with Large Language Models (EMNLP 2023). Keywords: Bilingual Lexicon Induction, Word Translation, Large Language Models, LLMs.

Language: Python - Size: 86.9 KB - Last synced at: 10 days ago - Pushed at: 5 months ago - Stars: 10 - Forks: 2

pintamonas4575/NLP-text-detox-MAADM-UPM

NLP for detoxing language phrases in several languages.

Language: Jupyter Notebook - Size: 8.82 MB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

joyou159/SWIZT (fork of MohamedAlaaAli/SWIZT)

Exploring the use of multilingual transformers, specifically mBERT and XLM-RoBERTa, for named entity recognition (NER) in the context of Switzerland’s multilingual environment.

Language: Jupyter Notebook - Size: 1.35 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

Helsinki-NLP/lm-vs-mt

A Comparison of Language Modeling and Translation as Multilingual Pretraining Objectives

Language: Python - Size: 1.15 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

koushik16/Naive-Bayes-on-Multi-Language-Text

Implementation of Naive Bayes for text classification across multiple languages, focusing on natural language processing and multilingual text analysis.

Language: Python - Size: 2.93 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

Wei-RongRong2/RojakLanguageSentimentAnalysis

This is a machine learning project focused on analysing and classifying sentiments in code-switched and code-mixed text, specifically targeting the unique linguistic characteristics found in Malaysian conversations.

Language: Jupyter Notebook - Size: 20.6 MB - Last synced at: 3 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

BatsResearch/cross-lingual-detox

Code for "Preference Tuning For Toxicity Mitigation Generalizes Across Languages"

Language: Jupyter Notebook - Size: 309 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 14 - Forks: 0

CristianBudala/Multilingual-Sentiment-Analysis-and-Intent-Classification

Multilingual sentiment analysis and intent classification in Romanian (bachelor's thesis)

Language: Jupyter Notebook - Size: 837 KB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

MaLA-LM/mala-500

MaLA-500: Massive Language Adaptation of Large Language Models

Language: Python - Size: 97.7 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 4 - Forks: 0

Rajarshi1001/IITK-SemEval-2024-Task-1

SemEval task 1: Semantic Textual Relatedness for the course CS779A

Language: Jupyter Notebook - Size: 9.92 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

csebuetnlp/CrossSum

This repository contains the code, data, and models of the paper titled "CrossSum: Beyond English-Centric Cross-Lingual Summarization for 1,500+ Language Pairs" published in Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL’23), July 9-14, 2023.

Language: Python - Size: 5.72 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 47 - Forks: 7

aditi184/MultilingualQA

Chaii (Challenge in AI for India) Multilingual QnA - Google Research India

Language: Jupyter Notebook - Size: 26.4 KB - Last synced at: 5 months ago - Pushed at: over 3 years ago - Stars: 4 - Forks: 0

negar-foroutan/multilingual-code-switched-reasoning

[EMNLP 2023 - Findings] Breaking the Language Barrier: Improving Cross-Lingual Reasoning with Structured Self-Attention

Language: Python - Size: 41 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 2

shijie-wu/crosslingual-nlp

This repo supports various cross-lingual transfer learning & multilingual NLP models.

Language: Python - Size: 125 KB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 89 - Forks: 7

longxudou/multispider

MultiSpider: Towards Benchmarking Multilingual Text-to-SQL Semantic Parsing

Language: Python - Size: 194 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 0

negar-foroutan/multiLMs-lang-neutral-subnets

[EMNLP 2022] Discovering Language-neutral Sub-networks in Multilingual Language Models.

Language: Python - Size: 831 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 8 - Forks: 1

sristhilamichhane/multilingo

A language-learning app built with React, Material UI, and Node.js.

Language: JavaScript - Size: 257 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

muhammadravi251001/multilingual-qas-with-nli

Code Repository for Paper: Multilingual Question Answering System Utilizing Natural Language Inference.

Language: Jupyter Notebook - Size: 1.35 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

ramisa2108/Bangla-Complex-Named-Entity-Recognition-Challenge

Winning Solution for the Bangla Complex Named Entity Recognition Challenge - BDOSN NLP Hackathon 2023

Language: Jupyter Notebook - Size: 2.9 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 7 - Forks: 0

e-hossam96/CMU-CS11-737

Solutions of the CMU Multilingual Natural Language Processing Course

Language: Shell - Size: 120 MB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 1

harmonydata/harmony_original

The Harmony project

Language: Jupyter Notebook - Size: 2.77 MB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 1

Judy-Choi/NMT_Series

A collection of code for an NMT series from Geultto 8th

Language: Jupyter Notebook - Size: 5.53 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

thesofakillers/CLAfICLe

Official repository for the paper "CLAfICLe: Cross-Lingual Adaptation for In-Context Learning". Not Published.

Language: TeX - Size: 13.9 MB - Last synced at: about 2 months ago - Pushed at: almost 2 years ago - Stars: 3 - Forks: 0

dkalpakchi/quinductor

A multilingual data-driven method for generating reading comprehension questions

Language: Jupyter Notebook - Size: 7.21 MB - Last synced at: 19 days ago - Pushed at: about 3 years ago - Stars: 1 - Forks: 0

mobassir94/Multilingual-NLP-for-Islamic-Theology

Cross-lingual language models for building search engines for the Holy Quran and Sahih Hadiths

Language: Jupyter Notebook - Size: 151 KB - Last synced at: over 2 years ago - Pushed at: almost 3 years ago - Stars: 7 - Forks: 0

Related Keywords
multilingual-nlp (54), nlp (16), natural-language-processing (15), deep-learning (8), multilingual (7), machine-learning (7), large-language-models (7), pytorch (7), python (5), machine-translation (5), bert (5), zero-shot-learning (4), mbert (4), transformers (4), cross-lingual-transfer (4), sentiment-analysis (4), llms (4), named-entity-recognition (4), ai (3), data-science (3), multilingual-models (3), computational-linguistics (3), xlm-roberta (2), intent-detection (2), gru (2), mechanistic-interpretability (2), gender-abuse-detection (2), llm (2), llama (2), low-resource-machine-translation (2), prompt (2), prompt-engineering (2), llama2 (2), prompting (2), convolutional-neural-networks (2), dataset (2), ner (2), linguistics (2), mt5 (2), word-translation (2), multilingual-absa (2), huggingface-transformers (2), semantic-search (2), bloom (2), nlp-machine-learning (2), transformer-models (2), rag (2), psychology (2), sentence-transformers (2), language-models (2), text-classification (2), semantic-parsing (2), multilingual-language-models (2), bilingual-dictionary-induction (2), bilingual-lexicon-extraction (2), bilingual-lexicon-induction (2), few-shot-learning (2), artificial-intelligence (2), indicbert (2), in-context-learning (2), flask-application (1), ml-algorithms (1), docker-image (1), multi-language (1), naive-bayes (1), python3 (1), code-switching (1), code-mixing (1), text-mining-analysis (1), customer-support (1), embeddings (1), evaluation (1), evaluation-metrics (1), emnlp (1), emnlp2023 (1), gpt (1), nlp-applications (1), syntax-and-semantics (1), text-mining (1), word-embeddings (1), emnlp2024 (1), self-learning (1), prompts (1), flan-t5 (1), text-detoxification (1), language-modeling (1), pretraining (1), bayesian-inference (1), lstm (1), text-to-sql (1), lottery-ticket-hypothesis (1), language-learning (1), nodejs (1), reactjs (1), artificial-neural-networks (1), natural-language-inference (1), multilingual-nmt (1), multilingual-sequence-labeling (1), neural-network (1), data-visualization (1)