An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: code-mixing

andrianllmm/tagLID

A word-level Language Identification (LID) tool for Tagalog-English (Taglish) text

Language: Python - Size: 613 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 2 - Forks: 0

microsoft/LID-tool

A word-level language identification tool that labels the language of each individual word in code-mixed text, i.e. text that mixes words from two languages, such as Hindi written in Roman script mixed with English.

Language: Python - Size: 2.16 MB - Last synced at: 5 days ago - Pushed at: almost 5 years ago - Stars: 53 - Forks: 10

gentaiscool/code-switching-papers

A curated list of research papers and resources on code-switching

Size: 178 KB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 315 - Forks: 39

microsoft/CodeMixed-Text-Generator

This tool helps automatically generate grammatically valid synthetic code-mixed data using linguistic theories such as Equivalence Constraint Theory and Matrix Language Theory.

Language: Jupyter Notebook - Size: 3.79 MB - Last synced at: 5 days ago - Pushed at: 11 months ago - Stars: 55 - Forks: 12

praatibhsurana/Hinglish_Hindi_WSD

A pipeline for transliteration, spell correction, POS tagging, and word sense disambiguation that converts Hinglish code-mixed data to Hindi in Devanagari script.

Language: Python - Size: 895 KB - Last synced at: 8 days ago - Pushed at: over 1 year ago - Stars: 36 - Forks: 8

jessicasaikia/hidden-markov-model-HMM

This repository implements a Hidden Markov Model (HMM) for performing Parts-of-Speech (POS) Tagging on Assamese-English code-mixed texts.

Language: Python - Size: 358 KB - Last synced at: 26 days ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

jessicasaikia/conditional-random-field-CRF

This repository implements a Conditional Random Field (CRF) for performing Parts-of-Speech (POS) Tagging on Assamese-English code-mixed texts.

Language: Python - Size: 10.7 KB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

jessicasaikia/long-short-term-memory-LSTM

This repository implements a Long Short-Term Memory (LSTM) network for performing Parts-of-Speech (POS) Tagging on Assamese-English code-mixed texts.

Language: Python - Size: 16.6 KB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

jessicasaikia/bidirectional-long-short-term-memory-BiLSTM

This repository implements a Bidirectional Long Short-Term Memory (BiLSTM) network for performing Parts-of-Speech (POS) Tagging on Assamese-English code-mixed texts.

Language: Python - Size: 11.7 KB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

jessicasaikia/multilingual-BERT-mBERT

This repository implements a Multilingual BERT (mBERT) model for performing Parts-of-Speech (POS) Tagging on Assamese-English code-mixed texts.

Language: Python - Size: 11.7 KB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

jessicasaikia/rule-based

This repository contains a simple Rule-Based Model for Parts-of-Speech (POS) Tagging on Assamese-English code-mixed texts.

Language: Python - Size: 352 KB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

salesforce/adversarial-polyglots

Code for the paper "Code-Mixing on Sesame Street: Dawn of the Adversarial Polyglots" (NAACL-HLT 2021)

Language: Python - Size: 45.9 KB - Last synced at: 3 months ago - Pushed at: over 3 years ago - Stars: 10 - Forks: 7

Wei-RongRong2/RojakLanguageSentimentAnalysis

This is a machine learning project focused on analysing and classifying sentiments in code-switched and code-mixed text, specifically targeting the unique linguistic characteristics found in Malaysian conversations.

Language: Jupyter Notebook - Size: 20.6 MB - Last synced at: 3 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

Nexdata-AI/300-Person-Mandarin-Chinese-and-English-Bilingual-Spontaneous-Monologue-smartphone

300-person Mandarin Chinese and English bilingual spontaneous monologue (smartphone)

Size: 2.93 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

cisnlp/MaskLID

MaskLID: Code-Switching Language Identification through Iterative Masking -- ACL 2024

Language: Python - Size: 12.7 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 0

vcyrot/Frenglish-Benchmark

A Centralized Frenglish Benchmark from Naturally Occurring Code-Switching and Code-Mixing

Size: 105 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

Lidan0241/language-detection

Language detection in code-switched text for es/en/zh speakers

Language: Jupyter Notebook - Size: 4.6 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

Bernardbyy/BahasaRojakSentimentAnalysis

Handling out-of-vocabulary (OOV) words in Bahasa Rojak (Malaysian code-mixed language) and performing sentiment analysis with a fine-tuned XLM-R model

Language: Jupyter Notebook - Size: 2.88 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 1

gulabpatel/Code-Mixing

Discusses the evolution of code-mixing algorithms

Language: Jupyter Notebook - Size: 204 KB - Last synced at: 26 days ago - Pushed at: almost 3 years ago - Stars: 2 - Forks: 0

ir-nlp-csui/id-en-code-mixed

Indonesian-English code-mixed Twitter dataset

Size: 288 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

Anwarvic/truel_bilingual_nmt

The official code for the "True Bilingual NMT" paper

Language: Python - Size: 3.59 MB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

LCS2-IIITD/HIT-ACL2021-Codemixed-Representation

This repo contains the source code of HIT: A Hierarchically Fused Deep Attention Network for Robust Code-mixed Language Representation (accepted at ACL 2021)

Language: Python - Size: 29.2 MB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 6 - Forks: 5

carexl8/code-mixed-tweets

Tweet ids for code-mixed Russian-German and Russian-Hebrew tweets

Size: 20.5 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

ash-shar/Code-Switching-and-Swearing-Patterns-on-Twitter

Repository containing code for abusive tweet detection, location detection, and gender detection

Language: Python - Size: 1.97 MB - Last synced at: almost 2 years ago - Pushed at: over 7 years ago - Stars: 6 - Forks: 2

mmaguero/josa-corpus

Jopara (Guarani-dominant mixed with Spanish) sentiment analysis corpus

Size: 8.79 KB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 6 - Forks: 0

aparnadutta/code-mixed-lid

Word-level language identification for Bangla-English code-mixed social media data, using a BiLSTM with subword embeddings.

Language: Python - Size: 190 MB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 5 - Forks: 0

sumanbanerjee1/Code-Mixed-Dialog

Language: Python - Size: 13.1 MB - Last synced at: 11 months ago - Pushed at: about 7 years ago - Stars: 33 - Forks: 7

ayanc18/PsycholinguisticCodeMixing

Psycholinguistic Analysis of Code Mixing - Speech and Natural Language Processing Term Project: CS60057. Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur

Language: Python - Size: 2.88 MB - Last synced at: almost 2 years ago - Pushed at: over 7 years ago - Stars: 1 - Forks: 1

poornagurram/code_mixing_sentiment

Language: Python - Size: 2.83 MB - Last synced at: about 2 years ago - Pushed at: almost 7 years ago - Stars: 1 - Forks: 1

kmi-linguistics/Code-mixing

Size: 3.91 KB - Last synced at: about 2 years ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

Related Keywords
code-mixing (30), nlp (17), code-switching (13), english (8), code-mixed (8), parts-of-speech (7), nlp-machine-learning (7), english-language (7), assamese (7), pos-tagging (7), pos-tagger (6), assamese-text (6), language-identification (6), parts-of-speech-tagging (5), natural-language-processing (5), twitter (4), sentiment-analysis (4), python3 (3), linguistics (3), sentiment-classification (3), machine-translation (2), lstm (2), transformer (2), bilstm (2), bilingual (2), code-switch (2), machine-learning (2), multilingual (2), deep-learning (2), named-entity-recognition (2), hindi (1), malaysian-language (1), indonesian-language (1), lid (1), xlmroberta (1), multilingual-nlp (1), transfer-learning (1), multinomial-naive-bayes (1), out-of-vocabulary (1), fine-tuning (1), domain-adaptation (1), chinese-simplified (1), bahasa-melayu (1), render-deployment (1), identification-language (1), french-english (1), language-identifier (1), language-identification-toolkit (1), spontaneous-speech-recognition (1), speech-to-text (1), asr (1), support-vector-machine (1), computational-linguistics (1), psycholinguistics (1), seq2seq (1), hred (1), word-level-language-model (1), bangla-nlp (1), traditional-machine-learning (1), text-classification (1), text-categorization (1), low-resource-languages (1), corpus-linguistics (1), bert-fine-tuning (1), baselines (1), swearing (1), social-network-analysis (1), location-detection (1), gender-detection (1), tweets (1), russian (1), hebrew (1), german (1), attention-model (1), indian-language (1), indian-languages (1), language-detection (1), pretrained-models (1), neural-machine-translation (1), multilingual-translations (1), social-media (1), lexical-normalization (1), hidden-markov-model (1), wsd-dataset (1), wsd (1), word-sense-disambiguation (1), spello (1), python-package (1), python-library (1), python-3 (1), lesk-algorithm (1), lesk (1), indowordnet (1), indic-transliteration (1), indic-nlp (1), indic-languages (1), hinglish-to-hindi-transliteration (1), hinglish (1), hindi-spell-correction (1), hindi-pos-tag (1)