Topic: "xlm-roberta"
dbiir/UER-py
Open Source Pre-training Model Framework in PyTorch & Pre-trained Model Zoo
Language: Python - Size: 50.5 MB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 3,064 - Forks: 522

Tencent/TencentPretrain
Tencent Pre-training framework in PyTorch & Pre-trained Model Zoo
Language: Python - Size: 41.2 MB - Last synced at: 2 months ago - Pushed at: 12 months ago - Stars: 1,070 - Forks: 147

explosion/curated-transformers
🤖 A PyTorch library of curated Transformer models and their composable components
Language: Python - Size: 1.47 MB - Last synced at: 4 days ago - Pushed at: over 1 year ago - Stars: 892 - Forks: 34

nlp-uoregon/trankit
Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing
Language: Python - Size: 1.05 MB - Last synced at: about 22 hours ago - Pushed at: about 24 hours ago - Stars: 759 - Forks: 103

iflytek/cino
CINO: Pre-trained Language Models for Chinese Minority Languages
Language: Python - Size: 21.7 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 247 - Forks: 31

csebuetnlp/banglabert
This repository contains the official release of the model "BanglaBERT" and associated downstream finetuning code and datasets introduced in the paper titled "BanglaBERT: Language Model Pretraining and Benchmarks for Low-Resource Language Understanding Evaluation in Bangla" accepted in Findings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics: NAACL-2022.
Language: Python - Size: 1.14 MB - Last synced at: 12 months ago - Pushed at: over 2 years ago - Stars: 230 - Forks: 31

EveripediaNetwork/fastc
Unattended Lightweight Text Classifiers with LLM Embeddings
Language: Python - Size: 117 KB - Last synced at: 9 days ago - Pushed at: 11 months ago - Stars: 185 - Forks: 11

tensordot/syntaxdot
Neural syntax annotator, supporting sequence labeling, lemmatization, and dependency parsing.
Language: Rust - Size: 1010 KB - Last synced at: 24 days ago - Pushed at: almost 2 years ago - Stars: 78 - Forks: 3

GeekDream-x/SemEval2022-Task8-TonyX
Deep-learning system proposed by HFL for SemEval-2022 Task 8: Multilingual News Similarity
Language: Python - Size: 2.82 MB - Last synced at: 2 months ago - Pushed at: about 3 years ago - Stars: 40 - Forks: 6

hate-alert/Tutorial-Resources
Resources and tools for the Tutorial - "Hate speech detection, mitigation and beyond" presented at ICWSM 2021
Language: Python - Size: 291 KB - Last synced at: 9 months ago - Pushed at: over 3 years ago - Stars: 36 - Forks: 7

Data-Science-kosta/Long-texts-Sentiment-Analysis-RoBERTa
PyTorch implementation of sentiment analysis of long texts written in Serbian (an underused language), using a pretrained multilingual RoBERTa-based model (XLM-R) on a small dataset.
Language: Jupyter Notebook - Size: 8.5 MB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 18 - Forks: 7

crux82/AILC-lectures2021-lab
This is a PyTorch (+ Hugging Face transformers) implementation of a "simple" text classifier defined using BERT-based models. In this lab we will see how simple it is to use BERT for a sentence classification task, obtaining state-of-the-art results in a few lines of Python code.
Language: Jupyter Notebook - Size: 185 KB - Last synced at: over 2 years ago - Pushed at: about 4 years ago - Stars: 18 - Forks: 5

Kirill-Kravtsov/drophead-pytorch
An implementation of drophead regularization for pytorch transformers
Language: Python - Size: 13.7 KB - Last synced at: over 2 years ago - Pushed at: almost 4 years ago - Stars: 15 - Forks: 6

Data-Science-kosta/Twitter-Sentiment-Analysis-RoBERTa
Sentiment analysis of tweets written in underused Slavic languages (Serbian, Bosnian and Croatian) using the pretrained multilingual RoBERTa-based model XLM-R on two different datasets.
Language: Jupyter Notebook - Size: 4.74 MB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 15 - Forks: 1

ashwanitanwar/nmt-transfer-learning-xlm-r
Improving Low-Resource Neural Machine Translation of Related Languages by Transfer Learning
Language: Python - Size: 16.5 MB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 14 - Forks: 5

SapienzaNLP/guardians-mt-eval
Official repository of the ACL 2024 paper "Guardians of the Machine Translation Meta-Evaluation: Sentinel Metrics Fall In!".
Language: Python - Size: 958 KB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 10 - Forks: 1

cambridgeltl/BLICEr
Improving Bilingual Lexicon Induction with Cross-Encoder Reranking (Findings of EMNLP 2022). Keywords: Bilingual Lexicon Induction, Word Translation, Cross-Lingual Word Embeddings.
Language: Python - Size: 158 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 10 - Forks: 3

haozhg/lmd
Language Model Decomposition: Quantifying the Dependency and Correlation of Language Models
Language: Python - Size: 1.82 MB - Last synced at: 25 days ago - Pushed at: over 2 years ago - Stars: 10 - Forks: 1

sukanyabag/Detecting-Contradictions-and-Entailment-in-Multilingual-Text
A case study of NLI (Natural Language Inference) with transfer learning. Kaggle competition rank: 18th (global)
Language: Jupyter Notebook - Size: 218 KB - Last synced at: 19 days ago - Pushed at: almost 4 years ago - Stars: 9 - Forks: 0

leffff/AI-IJC
1st place solution to AI IJC Customer Service task
Size: 60.6 MB - Last synced at: 12 months ago - Pushed at: almost 3 years ago - Stars: 7 - Forks: 2

brunneis/ilab-erisk-2020
Repository accompanying the CLEF 2020 eRisk Workshop Working Notes for the iLab team (University Of Strathclyde)
Language: Jupyter Notebook - Size: 32.2 KB - Last synced at: about 1 month ago - Pushed at: almost 5 years ago - Stars: 7 - Forks: 1

SumitM0432/XLM-RoBERTa-for-Textual-Entailment
A multilingual XLM-RoBERTa model for textual entailment over sequence pairs (premise and hypothesis) in 15 different languages, using the MNLI and XNLI corpora.
Language: Jupyter Notebook - Size: 1.99 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 6 - Forks: 0

faezesarlakifar/text-emotion-recognition
Persian text emotion recognition by fine tuning the XLM-RoBERTa Model + Bidirectional GRU layer.
Language: Jupyter Notebook - Size: 1020 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 5 - Forks: 1

BUAADreamer/CCRK
[KDD 2024] Improving the Consistency in Cross-Lingual Cross-Modal Retrieval with 1-to-K Contrastive Learning
Language: Python - Size: 644 KB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 5 - Forks: 0

longday1102/Demo-QA-Extraction-system
⚡ The system extracts answers from a given context
Language: Python - Size: 2.71 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 5 - Forks: 0

SayamAlt/Language-Detection-using-fine-tuned-XLM-Roberta-Base-Transformer-Model
Successfully developed a language detection transformer model that can accurately recognize the language in which any given text is written.
Language: Jupyter Notebook - Size: 1.09 MB - Last synced at: 4 months ago - Pushed at: over 2 years ago - Stars: 5 - Forks: 4

codewithzichao/Multilingual-Transformers
Our source code for the EACL2021 workshop: Offensive Language Identification in Dravidian Languages. We ranked 4th, 4th and 3rd in Tamil, Malayalam and Kannada, respectively, in this task! 🥳
Language: Python - Size: 24.4 KB - Last synced at: 4 months ago - Pushed at: over 4 years ago - Stars: 5 - Forks: 3

viktor-shcherb/vive_la_ner
The default way to fine-tune BERT is wrong. Here is why
Language: Jupyter Notebook - Size: 107 KB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 4 - Forks: 0

fatemafaria142/MultiBanFakeDetect-An-Extensive-Benchmark-Dataset-for-Multimodal-Bangla-Fake-News-Detection
This study introduces MultiBanFakeDetect, a novel multimodal dataset for Bangla fake news detection, combining textual and visual information. It features TextFakeNet for text analysis and MultiFusionFake for integrating multimodal data.
Language: Jupyter Notebook - Size: 308 KB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 4 - Forks: 1

seanghay/khmerpunctuate
Punctuation Restoration for Khmer language
Language: Python - Size: 2.69 MB - Last synced at: 15 days ago - Pushed at: about 1 year ago - Stars: 4 - Forks: 1

dimitreOliveira/Jigsaw-Multilingual-Toxic-Comment-Classification
🥉 (Bronze medal - 100th place - Top 7%) Repository for the "Jigsaw Multilingual Toxic Comment Classification" Kaggle competition.
Language: Jupyter Notebook - Size: 14.8 MB - Last synced at: over 2 years ago - Pushed at: about 4 years ago - Stars: 4 - Forks: 0

abhilash1910/NLP-Workshop-ML-India
NLP Workshop -ML India
Language: Jupyter Notebook - Size: 74.2 KB - Last synced at: 6 days ago - Pushed at: almost 5 years ago - Stars: 4 - Forks: 1

RobinSmits/Dutch-NLP-Experiments
This repository contains a number of experiments with multilingual Transformer models (Multilingual BERT, DistilBERT, XLM-RoBERTa, mT5 and ByT5) focused on the Dutch language.
Language: Python - Size: 794 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 0

dsfsi/zabantu-beta
ZaBantu is a fleet of light-weight Masked Language Models for Southern Bantu Languages
Language: Python - Size: 3.12 MB - Last synced at: 4 days ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

MLArtist/intent-detection-using-XLM-Roberta
This repository is a comprehensive project that leverages the XLM-Roberta model for intent detection. This repository is a valuable resource for developers looking to build and fine-tune intent detection models based on state-of-the-art techniques.
Language: Jupyter Notebook - Size: 94.7 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

hyperonym/dirge
Dirge is a collection of foundation models for Natural Language Processing (NLP). This repository contains scripts and notebooks to replicate the models available on Hugging Face Model Hub.
Language: Jupyter Notebook - Size: 76.2 KB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

honghanhh/ate-2022
Can Cross-domain Term Extraction Benefit from Cross-lingual Transfer?
Language: Python - Size: 130 KB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

DiFronzo/Multilingual-Models
mBERT and XLM-R for encoding of Scandinavian languages
Language: Python - Size: 518 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

Text2TCS/Transrelation
The winning system of the Text2TCS project team submitted to the CogaLex VI shared task.
Language: Jupyter Notebook - Size: 394 KB - Last synced at: 8 months ago - Pushed at: about 4 years ago - Stars: 2 - Forks: 0

elsheikh21/cross-natural-language-inference
ZeroShot XNLI
Language: Python - Size: 1.65 MB - Last synced at: over 2 years ago - Pushed at: almost 5 years ago - Stars: 2 - Forks: 0

haturusinghe/subasa-plm
A framework for adapting Pretrained Language Models (XLM-R, BERT etc.) for Low-Resourced Offensive Language Detection in Sinhala using pretrained models and intermediate tasks.
Language: Python - Size: 3.73 MB - Last synced at: about 2 months ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

MichiganNLP/multilingual_reviews_deception
Multilingual Deception Detection of GPT-generated Hotel Reviews
Language: HTML - Size: 8.94 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

Semihocakli/nlp-with-hugging-face
Language: Jupyter Notebook - Size: 247 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

alessandromonolo/Descriptive-Texts-Classification-By-Usage-Purposes-Of-Estate-Properties
The project aims to identify the best model for the classification of texts derived from descriptions of assets subject to Italian judicial auctions. The employed models include both conventional models, such as Logistic Regression, Naive Bayes, SVM, and XGBoost, and neural network models, such as Fasttext and XLM-Roberta.
Language: Jupyter Notebook - Size: 14.3 MB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

gerardPlanella/MultiLingual_Stereotypes
Study on stereotype transfer across multilingual language models for English, Spanish, French, Greek and Croatian. The emotion profiles for different social groups are obtained from pre-trained XLM-RoBERTa and fine-tuned versions of it.
Language: Jupyter Notebook - Size: 9.93 MB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

StepanTita/news-contest
This repository contains Jupyter notebooks detailing the experiments conducted in our research paper on Ukrainian news classification. We introduce a framework for simple classification dataset creation with minimal labeling effort, and further compare several pretrained models for the Ukrainian language.
Language: Jupyter Notebook - Size: 698 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

Jayveersinh-Raj/cross-lingual-zero-shot-transfer
This is a cross-lingual project for industrial purposes. It will later be integrated as an API, and the repository will be made private accordingly.
Language: Jupyter Notebook - Size: 5.74 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

honghanhh/sdjt-ate
A Transformer-based Sequence-labeling Approach to the Slovenian Cross-domain Automatic Term Extraction
Language: Python - Size: 114 KB - Last synced at: almost 2 years ago - Pushed at: about 3 years ago - Stars: 1 - Forks: 0

ramachandra742/NLP-notebooks
NLP notebooks
Language: Jupyter Notebook - Size: 7.54 MB - Last synced at: 6 months ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

niloydebbarma-code/LORA-FINETUNING-BANGLASENTI-XLMR-GOOGLE-TPU
First open-source LoRA fine-tuning of BanglaSenti on XLM-RoBERTa-base for Bengali sentiment analysis, trained with Google Cloud TPUs. Includes code, configs, and reproducible results.
Language: Python - Size: 4.79 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 0 - Forks: 0

pixiiidust/semantic-ising
Do semantically identical words across languages converge in embedding space? Currently WIP and testing: This tool aims to visualize multilingual embedding alignment under Ising dynamics, revealing latent structure as the system approaches critical temperature. Inspired by the Platonic representation hypothesis.
Language: Python - Size: 548 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

shaitarAn/negations
Master thesis work. Implementation of transfer-learning for cross-lingual zero-shot and few-shot negation scope resolution.
Language: Jupyter Notebook - Size: 24.6 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

shaitarAn/subword-evenness-crosslingual-transfer
Analysis of subword evenness as a predictor of cross-lingual transfer success in multilingual language models (mBERT, XLM-R, mT5)
Size: 9.77 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

Cross-Border-E-Commerce-AI/Cross-Cultural-Merchandising-Expert
This research proposes an "AI-Driven Cross-Cultural Commodity Expert" framework, which addresses three critical technical bottlenecks through synergistic innovations in multilingual sentiment analysis, cultural quantification engines, and dynamic knowledge graphs
Language: Python - Size: 4.24 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

Soumyo001/sentiment-emotion_detection_on_bengali_product_reviews
Sentiment and emotion detection using mBERT and XLM-R. It comes with a trained model that you can download and test. Read below for instructions.
Language: Jupyter Notebook - Size: 15.6 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

Jubeerathan/Annaparavai
This repository contains the implementation of a transfer learning-based approach for detecting AI-generated product reviews in Tamil and Malayalam. It includes pretrained model embeddings, deep neural networks, and an ensemble method to enhance classification accuracy.
Language: Jupyter Notebook - Size: 939 KB - Last synced at: about 2 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 1

SimReale/tweets_sexism_detector
NLP assignments about creating a sexism classifier with different approaches.
Language: Jupyter Notebook - Size: 155 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

joyou159/SWIZT (fork of MohamedAlaaAli/SWIZT)
Exploring the use of multilingual transformers, specifically mBERT and XLM-RoBERTa, for named entity recognition (NER) in the context of Switzerland’s multilingual environment.
Language: Jupyter Notebook - Size: 1.35 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

Luxshan2000/TamMalKavacham
TamMalKavacham is an open-source tool for detecting abusive content in Tamil and Malayalam, focused on harmful language targeting women. Developed as part of the DravidianLangTech@NAACL 2025 shared task, it uses NLP and machine learning for accurate text classification and content analysis.
Language: Jupyter Notebook - Size: 396 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

fatemafaria142/BanglaCalamityMMD-A-Comprehensive-Benchmark-Dataset-for-Multimodal-Disaster-Identification (fork of Mukaffi28/BanglaCalamityMMD-A-Comprehensive-Benchmark-Dataset-for-Multimodal-Disaster-Identification)
This study presents a novel multimodal fusion technique for disaster identification in Bangla, combining text and image data using the "BanglaCalamityMMD" dataset. Employing DisasterTextNet, DisasterImageNet, and DisasterMultFusionNet, the approach addresses a key gap in Bangla disaster research.
Language: Jupyter Notebook - Size: 290 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

rasyosef/amharic-news-category-classification
notebooks to finetune `xlm-roberta-base` and `bert-small-amharic` models using an Amharic text classification dataset and the transformers library
Language: Jupyter Notebook - Size: 45.9 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

koshkidadanet/lilt-finetuning-piad-ya-ocr
Project carried out as part of a final qualifying thesis (VKR) titled "Development of a software module for analyzing documents confirming individual achievements"
Language: Jupyter Notebook - Size: 12.2 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

Ritwika-Das-Gupta/Disease-Diagnosis-Chatbot
This project explores the foundational concepts of ML, NLP, and model optimization to develop an efficient and user-friendly healthcare solution.
Language: Jupyter Notebook - Size: 1.79 MB - Last synced at: 4 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

Revanth-Reddy-Pingala/Abusive_Comment_Detector_BERT
Fine tuned BERT, mBERT and XLMRoBERTa for Abusive Comments Detection in Telugu, Code-Mixed Telugu and Telugu-English.
Language: Jupyter Notebook - Size: 55.7 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

MD-Ryhan/THREATENING_TEXT_DETECTION_USING_CNN_LSTM_BILSTM_XLMROBERTA
THREATENING_TEXT_DETECTION_USING_CNN_LSTM_BILSTM_XLMROBERTA
Language: Jupyter Notebook - Size: 186 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

hishamp3/MasterThesis-Lies-DeceptiveText
Fine-tuning Language Models
Language: Jupyter Notebook - Size: 12.7 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

RyanDsilva/clef-2023-joker
Code Repository for AKRaNLU @ CLEF JOKER 2023: Using Sentence Embeddings and Multilingual Models to Detect and Interpret Wordplay
Size: 0 Bytes - Last synced at: 4 months ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

ptsourav21/NewsHeadlneCategorizationNLP
Uses the transformer model XLM-RoBERTa to categorize Bengali news headlines into six different categories. This code was used in research work published at an IEEE conference.
Language: Jupyter Notebook - Size: 5.21 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

seanghay/khmer-text-classification-roberta
Khmer News Classification with XLM-RoBERTa
Language: Python - Size: 8.79 KB - Last synced at: 4 months ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

awinml/jigsaw-toxic-comment-clf
Built a multilingual text classification model to predict the probability that a comment is toxic using the data provided by Google Jigsaw.
Language: Python - Size: 50.6 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

mmaguero/clickbait-detector-en
Clickbait detector for English tweets, trained on Webis-17 dataset
Language: Jupyter Notebook - Size: 80.1 KB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

AditiBagora/Hasoc2021CodeMix
HASOC2021: Subtask 2 a) Codemix Challenge; Contains baselines and hierarchical approach that extracts the relevant context useful for classification of hostile tweets on English-Hindi code-mix data obtained from twitter.
Language: Jupyter Notebook - Size: 26.4 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0
