GitHub topics: code-mixing
microsoft/LID-tool
This code provides a word-level language identification tool for identifying the language of individual words in code-mixed text, e.g. text that includes words from two languages, such as Hindi written in Roman script mixed with English.
language: Python - size: 2.16 MB - last synced: about 17 hours ago - recorded: almost 5 years ago - stars: 55 - forks: 10

andrianllmm/tagLID
A word-level Language Identification (LID) tool for Tagalog-English (Taglish) text
language: Python - size: 613 KB - last synced: about 1 month ago - recorded: about 1 month ago - stars: 2 - forks: 0

gentaiscool/code-switching-papers
A curated list of research papers and resources on code-switching
size: 178 KB - last synced: about 2 months ago - recorded: 7 months ago - stars: 315 - forks: 39

microsoft/CodeMixed-Text-Generator
This tool helps with the automatic generation of grammatically valid synthetic code-mixed data by utilizing linguistic theories such as Equivalence Constraint Theory and Matrix Language Theory.
language: Jupyter Notebook - size: 3.79 MB - last synced: about 17 hours ago - recorded: 12 months ago - stars: 55 - forks: 11

praatibhsurana/Hinglish_Hindi_WSD
A pipeline for transliteration, spell correction, POS tagging and word sense disambiguation of Hinglish code-mixed data to Hindi Devanagari script.
language: Python - size: 895 KB - last synced: 2 days ago - recorded: over 1 year ago - stars: 36 - forks: 8

jessicasaikia/hidden-markov-model-HMM
This repository implements a Hidden Markov Model (HMM) for performing Parts of Speech (POS) Tagging on Assamese-English code-mixed texts.
language: Python - size: 358 KB - last synced: about 2 months ago - recorded: 8 months ago - stars: 0 - forks: 0

jessicasaikia/conditional-random-field-CRF
This repository implements a Conditional Random Field (CRF) for performing Parts-of-Speech (POS) Tagging on Assamese-English code-mixed texts.
language: Python - size: 10.7 KB - last synced: 4 months ago - recorded: 8 months ago - stars: 0 - forks: 0

jessicasaikia/long-short-term-memory-LSTM
This repository implements a Long Short-Term Memory (LSTM) network for performing Parts-of-Speech (POS) Tagging on Assamese-English code-mixed texts.
language: Python - size: 16.6 KB - last synced: 4 days ago - recorded: 8 months ago - stars: 0 - forks: 0

jessicasaikia/bidirectional-long-short-term-memory-BiLSTM
This repository implements a Bidirectional Long Short-Term Memory (BiLSTM) network for performing Parts-of-Speech (POS) Tagging on Assamese-English code-mixed texts.
language: Python - size: 11.7 KB - last synced: 4 months ago - recorded: 8 months ago - stars: 0 - forks: 0

jessicasaikia/multilingual-BERT-mBERT
This repository implements a Multilingual BERT (mBERT) model for performing Parts-of-Speech (POS) Tagging on Assamese-English code-mixed texts.
language: Python - size: 11.7 KB - last synced: 4 months ago - recorded: 8 months ago - stars: 0 - forks: 0

jessicasaikia/rule-based
This repository contains a simple Rule-Based Model for Parts-of-Speech (POS) Tagging on Assamese-English code-mixed texts.
language: Python - size: 352 KB - last synced: 8 days ago - recorded: 8 months ago - stars: 0 - forks: 0

salesforce/adversarial-polyglots
Code for the paper "Code-Mixing on Sesame Street: Dawn of the Adversarial Polyglots" (NAACL-HLT 2021)
language: Python - size: 45.9 KB - last synced: 3 months ago - recorded: over 3 years ago - stars: 10 - forks: 7

Wei-RongRong2/RojakLanguageSentimentAnalysis
This is a machine learning project focused on analysing and classifying sentiments in code-switched and code-mixed text, specifically targeting the unique linguistic characteristics found in Malaysian conversations.
language: Jupyter Notebook - size: 20.6 MB - last synced: 4 months ago - recorded: 12 months ago - stars: 0 - forks: 0

Nexdata-AI/300-Person-Mandarin-Chinese-and-English-Bilingual-Spontaneous-Monologue-smartphone
300-Person-Mandarin-Chinese-and-English-Bilingual-Spontaneous-Monologue-smartphone
size: 2.93 KB - last synced: 12 months ago - recorded: 12 months ago - stars: 0 - forks: 0

cisnlp/MaskLID
MaskLID: Code-Switching Language Identification through Iterative Masking -- ACL 2024
language: Python - size: 12.7 KB - last synced: about 1 year ago - recorded: about 1 year ago - stars: 2 - forks: 0

vcyrot/Frenglish-Benchmark
A Centralized Frenglish Benchmark from Naturally Occurring Code-Switching and Code-Mixing
size: 105 KB - last synced: about 1 year ago - recorded: over 2 years ago - stars: 0 - forks: 0

Lidan0241/language-detection
Language detection in code-switched text for es/en/zh speakers
language: Jupyter Notebook - size: 4.6 MB - last synced: about 1 year ago - recorded: about 1 year ago - stars: 1 - forks: 0

Bernardbyy/BahasaRojakSentimentAnalysis
Handling Bahasa Rojak (Malaysian Code Mixing Language) OOV and performing Sentiment Analysis using downstreamed XLM-R
language: Jupyter Notebook - size: 2.88 MB - last synced: about 1 year ago - recorded: about 1 year ago - stars: 0 - forks: 1

gulabpatel/Code-Mixing
Discusses the evolution of code-mixing algorithms
language: Jupyter Notebook - size: 204 KB - last synced: about 2 months ago - recorded: almost 3 years ago - stars: 2 - forks: 0

ir-nlp-csui/id-en-code-mixed
Indonesian-English code-mixed Twitter dataset
size: 288 KB - last synced: over 1 year ago - recorded: almost 3 years ago - stars: 1 - forks: 0

Anwarvic/truel_bilingual_nmt
The official code for the "True Bilingual NMT" paper
language: Python - size: 3.59 MB - last synced: about 1 year ago - recorded: over 3 years ago - stars: 0 - forks: 0

LCS2-IIITD/HIT-ACL2021-Codemixed-Representation
This repo contains the source code of HIT: A Hierarchically Fused Deep Attention Network for Robust Code-mixed Language Representation (accepted at ACL 2021)
language: Python - size: 29.2 MB - last synced: about 1 year ago - recorded: over 3 years ago - stars: 6 - forks: 5

carexl8/code-mixed-tweets
Tweet ids for code-mixed Russian-German and Russian-Hebrew tweets
size: 20.5 KB - last synced: about 2 years ago - recorded: about 2 years ago - stars: 0 - forks: 0

ash-shar/Code-Switching-and-Swearing-Patterns-on-Twitter
Repository containing code for Abusive Tweet Detection, Location Detection and Gender Detection
language: Python - size: 1.97 MB - last synced: almost 2 years ago - recorded: over 7 years ago - stars: 6 - forks: 2

mmaguero/josa-corpus
Jopara (Guarani-dominant mixed with Spanish) sentiment analysis corpus
size: 8.79 KB - last synced: over 2 years ago - recorded: about 3 years ago - stars: 6 - forks: 0

aparnadutta/code-mixed-lid
Word-level language identification for Bangla-English code-mixed social media data, using a BiLSTM with subword embeddings.
language: Python - size: 190 MB - last synced: over 2 years ago - recorded: about 3 years ago - stars: 5 - forks: 0

sumanbanerjee1/Code-Mixed-Dialog
language: Python - size: 13.1 MB - last synced: about 1 year ago - recorded: about 7 years ago - stars: 33 - forks: 7

ayanc18/PsycholinguisticCodeMixing
Psycholinguistic Analysis of Code Mixing - Speech and Natural Language Processing Term Project: CS60057. Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur
language: Python - size: 2.88 MB - last synced: about 2 years ago - recorded: over 7 years ago - stars: 1 - forks: 1

poornagurram/code_mixing_sentiment
language: Python - size: 2.83 MB - last synced: about 2 years ago - recorded: about 7 years ago - stars: 1 - forks: 1

kmi-linguistics/Code-mixing
size: 3.91 KB - last synced: about 2 years ago - recorded: over 7 years ago - stars: 0 - forks: 0
