Topic: "code-mixing"
gentaiscool/code-switching-papers
A curated list of research papers and resources on code-switching
Size: 178 KB - Last synced at: about 2 months ago - Pushed at: 4 months ago - Stars: 304 - Forks: 38

microsoft/CodeMixed-Text-Generator
This tool automatically generates grammatically valid synthetic code-mixed data using linguistic theories such as Equivalence Constraint Theory and Matrix Language Theory.
Language: Jupyter Notebook - Size: 3.79 MB - Last synced at: 1 day ago - Pushed at: 9 months ago - Stars: 54 - Forks: 12

microsoft/LID-tool
A word-level language identification tool that identifies the language of individual words in code-mixed text, e.g. text that mixes English with Hindi written in Roman script.
Language: Python - Size: 2.16 MB - Last synced at: 1 day ago - Pushed at: over 4 years ago - Stars: 54 - Forks: 9

praatibhsurana/Hinglish_Hindi_WSD
A pipeline for transliteration, spell correction, POS tagging and word sense disambiguation that converts Hinglish code-mixed data to Hindi in Devanagari script.
Language: Python - Size: 895 KB - Last synced at: 7 days ago - Pushed at: over 1 year ago - Stars: 35 - Forks: 7

sumanbanerjee1/Code-Mixed-Dialog
Language: Python - Size: 13.1 MB - Last synced at: 9 months ago - Pushed at: almost 7 years ago - Stars: 33 - Forks: 7

salesforce/adversarial-polyglots
Code for the paper "Code-Mixing on Sesame Street: Dawn of the Adversarial Polyglots" (NAACL-HLT 2021)
Language: Python - Size: 45.9 KB - Last synced at: 5 days ago - Pushed at: about 3 years ago - Stars: 10 - Forks: 7

mmaguero/josa-corpus
Jopara (Guarani-dominant mixed with Spanish) sentiment analysis corpus
Size: 8.79 KB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 6 - Forks: 0

LCS2-IIITD/HIT-ACL2021-Codemixed-Representation
This repo contains the source code of HIT: A Hierarchically Fused Deep Attention Network for Robust Code-mixed Language Representation (accepted at ACL 2021)
Language: Python - Size: 29.2 MB - Last synced at: 11 months ago - Pushed at: about 3 years ago - Stars: 6 - Forks: 5

ash-shar/Code-Switching-and-Swearing-Patterns-on-Twitter
Repository containing code for abusive tweet detection, location detection and gender detection
Language: Python - Size: 1.97 MB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 6 - Forks: 2

aparnadutta/code-mixed-lid
Word-level language identification for Bangla-English code-mixed social media data, using a BiLSTM with subword embeddings.
Language: Python - Size: 190 MB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 5 - Forks: 0

cisnlp/MaskLID
MaskLID: Code-Switching Language Identification through Iterative Masking -- ACL 2024
Language: Python - Size: 12.7 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 2 - Forks: 0

andrianllmm/tagLID
A word level Language Identification (LID) tool for Tagalog-English (Taglish) text.
Language: Python - Size: 610 KB - Last synced at: about 2 months ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

Lidan0241/language-detection
Language detection in code-switched text for es/en/zh speakers
Language: Jupyter Notebook - Size: 4.6 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 1 - Forks: 0

ir-nlp-csui/id-en-code-mixed
Indonesian-English code-mixed Twitter dataset
Size: 288 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

poornagurram/code_mixing_sentiment
Language: Python - Size: 2.83 MB - Last synced at: almost 2 years ago - Pushed at: almost 7 years ago - Stars: 1 - Forks: 1

ayanc18/PsycholinguisticCodeMixing
Psycholinguistic Analysis of Code Mixing - Speech and Natural Language Processing Term Project: CS60057. Department of Computer science and Engineering, Indian Institute of Technology Kharagpur
Language: Python - Size: 2.88 MB - Last synced at: almost 2 years ago - Pushed at: over 7 years ago - Stars: 1 - Forks: 1

jessicasaikia/hidden-markov-model-HMM
This repository implements a Hidden Markov Model (HMM) for performing Parts-of-Speech (POS) Tagging on Assamese-English code-mixed texts.
Language: Python - Size: 358 KB - Last synced at: 21 days ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

jessicasaikia/conditional-random-field-CRF
This repository implements a Conditional Random Field (CRF) for performing Parts-of-Speech (POS) Tagging on Assamese-English code-mixed texts.
Language: Python - Size: 10.7 KB - Last synced at: 21 days ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

jessicasaikia/long-short-term-memory-LSTM
This repository implements a Long Short-Term Memory (LSTM) network for performing Parts-of-Speech (POS) Tagging on Assamese-English code-mixed texts.
Language: Python - Size: 16.6 KB - Last synced at: 21 days ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

jessicasaikia/bidirectional-long-short-term-memory-BiLSTM
This repository implements a Bidirectional Long Short-Term Memory (BiLSTM) network for performing Parts-of-Speech (POS) Tagging on Assamese-English code-mixed texts.
Language: Python - Size: 11.7 KB - Last synced at: 21 days ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

jessicasaikia/multilingual-BERT-mBERT
This repository implements a Multilingual BERT (mBERT) model for performing Parts-of-Speech (POS) Tagging on Assamese-English code-mixed texts.
Language: Python - Size: 11.7 KB - Last synced at: 21 days ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

jessicasaikia/rule-based
This repository contains a simple Rule-Based Model for Parts-of-Speech (POS) Tagging on Assamese-English code-mixed texts.
Language: Python - Size: 352 KB - Last synced at: 21 days ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

Wei-RongRong2/RojakLanguageSentimentAnalysis
This is a machine learning project focused on analysing and classifying sentiments in code-switched and code-mixed text, specifically targeting the unique linguistic characteristics found in Malaysian conversations.
Language: Jupyter Notebook - Size: 20.6 MB - Last synced at: 15 days ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

Nexdata-AI/300-Person-Mandarin-Chinese-and-English-Bilingual-Spontaneous-Monologue-smartphone
Size: 2.93 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

Bernardbyy/BahasaRojakSentimentAnalysis
Handling out-of-vocabulary (OOV) words in Bahasa Rojak (Malaysian code-mixed language) and performing sentiment analysis using XLM-R fine-tuned for the downstream task
Language: Jupyter Notebook - Size: 2.88 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 1

carexl8/code-mixed-tweets
Tweet ids for code-mixed Russian-German and Russian-Hebrew tweets
Size: 20.5 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

vcyrot/Frenglish-Benchmark
A Centralized Frenglish Benchmark from Naturally Occurring Code-Switching and Code-Mixing
Size: 105 KB - Last synced at: 11 months ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

gulabpatel/Code-Mixing
Discusses the evolution of code-mixing algorithms
Language: Jupyter Notebook - Size: 204 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

Anwarvic/truel_bilingual_nmt
The official code for the "True Bilingual NMT" paper
Language: Python - Size: 3.59 MB - Last synced at: 12 months ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

kmi-linguistics/Code-mixing
Size: 3.91 KB - Last synced at: almost 2 years ago - Pushed at: about 7 years ago - Stars: 0 - Forks: 0
