xlm-roberta | Topic | Ecosyste.ms: Repos

Topic: "xlm-roberta"

dbiir/UER-py

Open Source Pre-training Model Framework in PyTorch & Pre-trained Model Zoo

Language: Python - Size: 50.5 MB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 3,064 - Forks: 522

Tencent/TencentPretrain

Tencent Pre-training framework in PyTorch & Pre-trained Model Zoo

Language: Python - Size: 41.2 MB - Last synced at: 2 months ago - Pushed at: 12 months ago - Stars: 1,070 - Forks: 147

explosion/curated-transformers

🤖 A PyTorch library of curated Transformer models and their composable components

Language: Python - Size: 1.47 MB - Last synced at: 4 days ago - Pushed at: over 1 year ago - Stars: 892 - Forks: 34

nlp-uoregon/trankit

Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing

Language: Python - Size: 1.05 MB - Last synced at: about 22 hours ago - Pushed at: about 24 hours ago - Stars: 759 - Forks: 103

iflytek/cino

CINO: Pre-trained Language Models for Chinese Minority (少数民族语言预训练模型)

Language: Python - Size: 21.7 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 247 - Forks: 31

This repository contains the official release of the model "BanglaBERT" and the associated downstream fine-tuning code and datasets introduced in the paper "BanglaBERT: Language Model Pretraining and Benchmarks for Low-Resource Language Understanding Evaluation in Bangla", accepted in Findings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics: NAACL-2022.

Language: Python - Size: 1.14 MB - Last synced at: 12 months ago - Pushed at: over 2 years ago - Stars: 230 - Forks: 31

EveripediaNetwork/fastc

Unattended Lightweight Text Classifiers with LLM Embeddings

Language: Python - Size: 117 KB - Last synced at: 9 days ago - Pushed at: 11 months ago - Stars: 185 - Forks: 11

tensordot/syntaxdot

Neural syntax annotator, supporting sequence labeling, lemmatization, and dependency parsing.

Language: Rust - Size: 1010 KB - Last synced at: 24 days ago - Pushed at: almost 2 years ago - Stars: 78 - Forks: 3

GeekDream-x/SemEval2022-Task8-TonyX

Deep-learning system proposed by HFL for SemEval-2022 Task 8: Multilingual News Similarity

Language: Python - Size: 2.82 MB - Last synced at: 2 months ago - Pushed at: about 3 years ago - Stars: 40 - Forks: 6

hate-alert/Tutorial-Resources

Resources and tools for the Tutorial - "Hate speech detection, mitigation and beyond" presented at ICWSM 2021

Language: Python - Size: 291 KB - Last synced at: 9 months ago - Pushed at: over 3 years ago - Stars: 36 - Forks: 7

Data-Science-kosta/Long-texts-Sentiment-Analysis-RoBERTa

PyTorch implementation of sentiment analysis of long texts written in Serbian (an underused language) using a pretrained multilingual RoBERTa-based model (XLM-R) on a small dataset.

Language: Jupyter Notebook - Size: 8.5 MB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 18 - Forks: 7

crux82/AILC-lectures2021-lab

This is a PyTorch (+ Hugging Face Transformers) implementation of a "simple" text classifier defined using BERT-based models. In this lab we will see how simple it is to use BERT for a sentence classification task, obtaining state-of-the-art results in a few lines of Python code.

Language: Jupyter Notebook - Size: 185 KB - Last synced at: over 2 years ago - Pushed at: about 4 years ago - Stars: 18 - Forks: 5

Kirill-Kravtsov/drophead-pytorch

An implementation of drophead regularization for pytorch transformers

Language: Python - Size: 13.7 KB - Last synced at: over 2 years ago - Pushed at: almost 4 years ago - Stars: 15 - Forks: 6

Data-Science-kosta/Twitter-Sentiment-Analysis-RoBERTa

Sentiment analysis of tweets written in underused Slavic languages (Serbian, Bosnian and Croatian) using the pretrained multilingual RoBERTa-based model XLM-R on two different datasets.

Language: Jupyter Notebook - Size: 4.74 MB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 15 - Forks: 1

ashwanitanwar/nmt-transfer-learning-xlm-r

Improving Low-Resource Neural Machine Translation of Related Languages by Transfer Learning

Language: Python - Size: 16.5 MB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 14 - Forks: 5

SapienzaNLP/guardians-mt-eval

Official repository of the ACL 2024 paper "Guardians of the Machine Translation Meta-Evaluation: Sentinel Metrics Fall In!".

Language: Python - Size: 958 KB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 10 - Forks: 1

cambridgeltl/BLICEr

Improving Bilingual Lexicon Induction with Cross-Encoder Reranking (Findings of EMNLP 2022). Keywords: Bilingual Lexicon Induction, Word Translation, Cross-Lingual Word Embeddings.

Language: Python - Size: 158 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 10 - Forks: 3

haozhg/lmd

Language Model Decomposition: Quantifying the Dependency and Correlation of Language Models

Language: Python - Size: 1.82 MB - Last synced at: 25 days ago - Pushed at: over 2 years ago - Stars: 10 - Forks: 1

sukanyabag/Detecting-Contradictions-and-Entailment-in-Multilingual-Text

A case study of NLI (Natural Language Inference) with transfer learning. Kaggle competition rank: 18th (global).

Language: Jupyter Notebook - Size: 218 KB - Last synced at: 19 days ago - Pushed at: almost 4 years ago - Stars: 9 - Forks: 0

leffff/AI-IJC

1st place solution to AI IJC Customer Service task

Size: 60.6 MB - Last synced at: 12 months ago - Pushed at: almost 3 years ago - Stars: 7 - Forks: 2

brunneis/ilab-erisk-2020

Repository accompanying the CLEF 2020 eRisk Workshop Working Notes for the iLab team (University Of Strathclyde)

Language: Jupyter Notebook - Size: 32.2 KB - Last synced at: about 1 month ago - Pushed at: almost 5 years ago - Stars: 7 - Forks: 1

SumitM0432/XLM-RoBERTa-for-Textual-Entailment

A multilingual model, XLM-RoBERTa, for textual entailment of sequence pairs (premise and hypothesis) in 15 different languages using the MNLI and XNLI corpora.

Language: Jupyter Notebook - Size: 1.99 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 6 - Forks: 0

faezesarlakifar/text-emotion-recognition

Persian text emotion recognition by fine-tuning the XLM-RoBERTa model with a bidirectional GRU layer.

Language: Jupyter Notebook - Size: 1020 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 5 - Forks: 1

BUAADreamer/CCRK

[KDD 2024] Improving the Consistency in Cross-Lingual Cross-Modal Retrieval with 1-to-K Contrastive Learning

Language: Python - Size: 644 KB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 5 - Forks: 0

longday1102/Demo-QA-Extraction-system

⚡ The system extracts answers from a given context

Language: Python - Size: 2.71 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 5 - Forks: 0

SayamAlt/Language-Detection-using-fine-tuned-XLM-Roberta-Base-Transformer-Model

Successfully developed a language detection transformer model that can accurately recognize the language in which any given text is written.

Language: Jupyter Notebook - Size: 1.09 MB - Last synced at: 4 months ago - Pushed at: over 2 years ago - Stars: 5 - Forks: 4

codewithzichao/Multilingual-Transformers

Our source code for the EACL 2021 workshop task Offensive Language Identification in Dravidian Languages. We ranked 4th, 4th and 3rd for Tamil, Malayalam and Kannada, respectively! 🥳

Language: Python - Size: 24.4 KB - Last synced at: 4 months ago - Pushed at: over 4 years ago - Stars: 5 - Forks: 3

viktor-shcherb/vive_la_ner

The default way to fine-tune BERT is wrong. Here is why

Language: Jupyter Notebook - Size: 107 KB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 4 - Forks: 0

fatemafaria142/MultiBanFakeDetect-An-Extensive-Benchmark-Dataset-for-Multimodal-Bangla-Fake-News-Detection

This study introduces MultiBanFakeDetect, a novel multimodal dataset for Bangla fake news detection, combining textual and visual information. It features TextFakeNet for text analysis and MultiFusionFake for integrating multimodal data.

Language: Jupyter Notebook - Size: 308 KB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 4 - Forks: 1

seanghay/khmerpunctuate

Punctuation Restoration for Khmer language

Language: Python - Size: 2.69 MB - Last synced at: 15 days ago - Pushed at: about 1 year ago - Stars: 4 - Forks: 1

dimitreOliveira/Jigsaw-Multilingual-Toxic-Comment-Classification

🥉 (Bronze medal - 100th place - Top 7%) Repository for the "Jigsaw Multilingual Toxic Comment Classification" Kaggle competition.

Language: Jupyter Notebook - Size: 14.8 MB - Last synced at: over 2 years ago - Pushed at: about 4 years ago - Stars: 4 - Forks: 0

abhilash1910/NLP-Workshop-ML-India

NLP Workshop - ML India

Language: Jupyter Notebook - Size: 74.2 KB - Last synced at: 6 days ago - Pushed at: almost 5 years ago - Stars: 4 - Forks: 1

RobinSmits/Dutch-NLP-Experiments

This repository contains a number of experiments with Multi Lingual Transformer models (Multi-Lingual BERT, DistilBERT, XLM-RoBERTa, mT5 and ByT5) focussed on the Dutch language.

Language: Python - Size: 794 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 0

dsfsi/zabantu-beta

ZaBantu is a fleet of light-weight Masked Language Models for Southern Bantu Languages

Language: Python - Size: 3.12 MB - Last synced at: 4 days ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

MLArtist/intent-detection-using-XLM-Roberta

A comprehensive project that leverages the XLM-RoBERTa model for intent detection, intended as a resource for developers looking to build and fine-tune intent detection models based on state-of-the-art techniques.

Language: Jupyter Notebook - Size: 94.7 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

hyperonym/dirge

Dirge is a collection of foundation models for Natural Language Processing (NLP). This repository contains scripts and notebooks to replicate the models available on Hugging Face Model Hub.

Language: Jupyter Notebook - Size: 76.2 KB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

honghanhh/ate-2022

Can Cross-domain Term Extraction Benefit from Cross-lingual Transfer?

Language: Python - Size: 130 KB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

DiFronzo/Multilingual-Models

mBERT and XLM-R for encoding of Scandinavian languages

Language: Python - Size: 518 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

Text2TCS/Transrelation

The winning system of the Text2TCS project team submitted to the CogaLex VI shared task.

Language: Jupyter Notebook - Size: 394 KB - Last synced at: 8 months ago - Pushed at: about 4 years ago - Stars: 2 - Forks: 0

elsheikh21/cross-natural-language-inference

ZeroShot XNLI

Language: Python - Size: 1.65 MB - Last synced at: over 2 years ago - Pushed at: almost 5 years ago - Stars: 2 - Forks: 0

haturusinghe/subasa-plm

A framework for adapting Pretrained Language Models (XLM-R, BERT etc.) for Low-Resourced Offensive Language Detection in Sinhala using pretrained models and intermediate tasks.

Language: Python - Size: 3.73 MB - Last synced at: about 2 months ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

MichiganNLP/multilingual_reviews_deception

Multilingual Deception Detection of GPT-generated Hotel Reviews

Language: HTML - Size: 8.94 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

Semihocakli/nlp-with-hugging-face

Language: Jupyter Notebook - Size: 247 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

alessandromonolo/Descriptive-Texts-Classification-By-Usage-Purposes-Of-Estate-Properties

The project aims to identify the best model for the classification of texts derived from descriptions of assets subject to Italian judicial auctions. The employed models include both conventional models, such as Logistic Regression, Naive Bayes, SVM, and XGBoost, and neural network models, such as Fasttext and XLM-Roberta.

Language: Jupyter Notebook - Size: 14.3 MB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

gerardPlanella/MultiLingual_Stereotypes

Study on stereotype transfer across multilingual language models for English, Spanish, French, Greek and Croatian. The emotion profiles for different social groups are obtained from pre-trained XLM-RoBERTa and fine-tuned versions of it.

Language: Jupyter Notebook - Size: 9.93 MB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

StepanTita/news-contest

This repository contains Jupyter notebooks detailing the experiments conducted in our research paper on Ukrainian news classification. We introduce a framework for simple classification dataset creation with minimal labeling effort, and further compare several pretrained models for the Ukrainian language.

Language: Jupyter Notebook - Size: 698 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

Jayveersinh-Raj/cross-lingual-zero-shot-transfer

This is a cross-lingual project for industrial purposes. It will later be integrated as an API, and the repository will be made private accordingly.

Language: Jupyter Notebook - Size: 5.74 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

honghanhh/sdjt-ate

A Transformer-based Sequence-labeling Approach to the Slovenian Cross-domain Automatic Term Extraction

Language: Python - Size: 114 KB - Last synced at: almost 2 years ago - Pushed at: about 3 years ago - Stars: 1 - Forks: 0

ramachandra742/NLP-notebooks

NLP notebooks

Language: Jupyter Notebook - Size: 7.54 MB - Last synced at: 6 months ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

niloydebbarma-code/LORA-FINETUNING-BANGLASENTI-XLMR-GOOGLE-TPU

First open-source LoRA fine-tuning of BanglaSenti on XLM-RoBERTa-base for Bengali sentiment analysis, trained with Google Cloud TPUs. Includes code, configs, and reproducible results.

Language: Python - Size: 4.79 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 0 - Forks: 0

pixiiidust/semantic-ising

Do semantically identical words across languages converge in embedding space? Currently WIP and testing: This tool aims to visualize multilingual embedding alignment under Ising dynamics, revealing latent structure as the system approaches critical temperature. Inspired by the Platonic representation hypothesis.

Language: Python - Size: 548 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

shaitarAn/negations

Master thesis work. Implementation of transfer-learning for cross-lingual zero-shot and few-shot negation scope resolution.

Language: Jupyter Notebook - Size: 24.6 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

shaitarAn/subword-evenness-crosslingual-transfer

Analysis of subword evenness as a predictor of cross-lingual transfer success in multilingual language models (mBERT, XLM-R, mT5)

Size: 9.77 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

Cross-Border-E-Commerce-AI/Cross-Cultural-Merchandising-Expert

This research proposes an "AI-Driven Cross-Cultural Commodity Expert" framework, which addresses three critical technical bottlenecks through synergistic innovations in multilingual sentiment analysis, cultural quantification engines, and dynamic knowledge graphs

Language: Python - Size: 4.24 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

Soumyo001/sentiment-emotion_detection_on_bengali_product_reviews

Sentiment and emotion detection using mBERT and XLM-R. It comes with a trained model that you can download and test. Read below for instructions.

Language: Jupyter Notebook - Size: 15.6 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

Jubeerathan/Annaparavai

This repository contains the implementation of a transfer learning-based approach for detecting AI-generated product reviews in Tamil and Malayalam. It includes pretrained model embeddings, deep neural networks, and an ensemble method to enhance classification accuracy.

Language: Jupyter Notebook - Size: 939 KB - Last synced at: about 2 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 1

SimReale/tweets_sexism_detector

NLP assignments about creating a sexism classifier with different approaches.

Language: Jupyter Notebook - Size: 155 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

joyou159/SWIZT (fork of MohamedAlaaAli/SWIZT)

Exploring the use of multilingual transformers, specifically mBERT and XLM-RoBERTa, for named entity recognition (NER) in the context of Switzerland's multilingual environment.

Language: Jupyter Notebook - Size: 1.35 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

Luxshan2000/TamMalKavacham

TamMalKavacham is an open-source tool for detecting abusive content in Tamil and Malayalam, focused on harmful language targeting women. Developed as part of the DravidianLangTech@NAACL 2025 shared task, it uses NLP and machine learning for accurate text classification and content analysis.

Language: Jupyter Notebook - Size: 396 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

fatemafaria142/BanglaCalamityMMD-A-Comprehensive-Benchmark-Dataset-for-Multimodal-Disaster-Identification (fork of Mukaffi28/BanglaCalamityMMD-A-Comprehensive-Benchmark-Dataset-for-Multimodal-Disaster-Identification)

This study presents a novel multimodal fusion technique for disaster identification in Bangla, combining text and image data using the "BanglaCalamityMMD" dataset. Employing DisasterTextNet, DisasterImageNet, and DisasterMultFusionNet, the approach addresses a key gap in Bangla disaster research.

Language: Jupyter Notebook - Size: 290 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0