An open API service providing repository metadata for many open source software ecosystems.

Topic: "code-mixing"

gentaiscool/code-switching-papers

A curated list of research papers and resources on code-switching

Size: 178 KB - Last synced at: about 2 months ago - Pushed at: 4 months ago - Stars: 304 - Forks: 38

microsoft/CodeMixed-Text-Generator

This tool automatically generates grammatically valid synthetic code-mixed data using linguistic theories such as the Equivalence Constraint Theory and the Matrix Language Theory.

Language: Jupyter Notebook - Size: 3.79 MB - Last synced at: 1 day ago - Pushed at: 9 months ago - Stars: 54 - Forks: 12

microsoft/LID-tool

This code provides a word-level language identification tool that identifies the language of individual words in code-mixed text, e.g., text that mixes words from two languages, such as Hindi written in Roman script and English.

Language: Python - Size: 2.16 MB - Last synced at: 1 day ago - Pushed at: over 4 years ago - Stars: 54 - Forks: 9

praatibhsurana/Hinglish_Hindi_WSD

A pipeline for transliteration, spell correction, POS tagging, and word sense disambiguation of Hinglish code-mixed data to Hindi Devanagari script.

Language: Python - Size: 895 KB - Last synced at: 7 days ago - Pushed at: over 1 year ago - Stars: 35 - Forks: 7

sumanbanerjee1/Code-Mixed-Dialog

Language: Python - Size: 13.1 MB - Last synced at: 9 months ago - Pushed at: almost 7 years ago - Stars: 33 - Forks: 7

salesforce/adversarial-polyglots

Code for the paper "Code-Mixing on Sesame Street: Dawn of the Adversarial Polyglots" (NAACL-HLT 2021)

Language: Python - Size: 45.9 KB - Last synced at: 5 days ago - Pushed at: about 3 years ago - Stars: 10 - Forks: 7

mmaguero/josa-corpus

Jopara (Guarani-dominant mixed with Spanish) sentiment analysis corpus

Size: 8.79 KB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 6 - Forks: 0

LCS2-IIITD/HIT-ACL2021-Codemixed-Representation

This repo contains the source code of HIT: A Hierarchically Fused Deep Attention Network for Robust Code-mixed Language Representation (accepted at ACL 2021)

Language: Python - Size: 29.2 MB - Last synced at: 11 months ago - Pushed at: about 3 years ago - Stars: 6 - Forks: 5

ash-shar/Code-Switching-and-Swearing-Patterns-on-Twitter

Repository containing code for abusive tweet detection, location detection, and gender detection

Language: Python - Size: 1.97 MB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 6 - Forks: 2

aparnadutta/code-mixed-lid

Word-level language identification for Bangla-English code-mixed social media data, using a BiLSTM with subword embeddings.

Language: Python - Size: 190 MB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 5 - Forks: 0

cisnlp/MaskLID

MaskLID: Code-Switching Language Identification through Iterative Masking -- ACL 2024

Language: Python - Size: 12.7 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 2 - Forks: 0

andrianllmm/tagLID

A word-level language identification (LID) tool for Tagalog-English (Taglish) text.

Language: Python - Size: 610 KB - Last synced at: about 2 months ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

Lidan0241/language-detection

Language detection in code-switched text for es/en/zh speakers

Language: Jupyter Notebook - Size: 4.6 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 1 - Forks: 0

ir-nlp-csui/id-en-code-mixed

Indonesian-English code-mixed Twitter dataset

Size: 288 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

poornagurram/code_mixing_sentiment

Language: Python - Size: 2.83 MB - Last synced at: almost 2 years ago - Pushed at: almost 7 years ago - Stars: 1 - Forks: 1

ayanc18/PsycholinguisticCodeMixing

Psycholinguistic Analysis of Code Mixing - Speech and Natural Language Processing Term Project: CS60057. Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur

Language: Python - Size: 2.88 MB - Last synced at: almost 2 years ago - Pushed at: over 7 years ago - Stars: 1 - Forks: 1

jessicasaikia/hidden-markov-model-HMM

This repository implements a Hidden Markov Model (HMM) for performing Parts-of-Speech (POS) Tagging on Assamese-English code-mixed texts.

Language: Python - Size: 358 KB - Last synced at: 21 days ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

jessicasaikia/conditional-random-field-CRF

This repository implements a Conditional Random Field (CRF) for performing Parts-of-Speech (POS) Tagging on Assamese-English code-mixed texts.

Language: Python - Size: 10.7 KB - Last synced at: 21 days ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

jessicasaikia/long-short-term-memory-LSTM

This repository implements a Long Short-Term Memory (LSTM) network for performing Parts-of-Speech (POS) Tagging on Assamese-English code-mixed texts.

Language: Python - Size: 16.6 KB - Last synced at: 21 days ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

jessicasaikia/bidirectional-long-short-term-memory-BiLSTM

This repository implements a Bidirectional Long Short-Term Memory (BiLSTM) network for performing Parts-of-Speech (POS) Tagging on Assamese-English code-mixed texts.

Language: Python - Size: 11.7 KB - Last synced at: 21 days ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

jessicasaikia/multilingual-BERT-mBERT

This repository implements a Multilingual BERT (mBERT) model for performing Parts-of-Speech (POS) Tagging on Assamese-English code-mixed texts.

Language: Python - Size: 11.7 KB - Last synced at: 21 days ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

jessicasaikia/rule-based

This repository contains a simple Rule-Based Model for Parts-of-Speech (POS) Tagging on Assamese-English code-mixed texts.

Language: Python - Size: 352 KB - Last synced at: 21 days ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

Wei-RongRong2/RojakLanguageSentimentAnalysis

This is a machine learning project focused on analysing and classifying sentiments in code-switched and code-mixed text, specifically targeting the unique linguistic characteristics found in Malaysian conversations.

Language: Jupyter Notebook - Size: 20.6 MB - Last synced at: 15 days ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

Nexdata-AI/300-Person-Mandarin-Chinese-and-English-Bilingual-Spontaneous-Monologue-smartphone

300-person Mandarin Chinese and English bilingual spontaneous monologue (smartphone)

Size: 2.93 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

Bernardbyy/BahasaRojakSentimentAnalysis

Handling out-of-vocabulary (OOV) words in Bahasa Rojak (Malaysian code-mixed language) and performing sentiment analysis using XLM-R on the downstream task

Language: Jupyter Notebook - Size: 2.88 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 1

carexl8/code-mixed-tweets

Tweet IDs for code-mixed Russian-German and Russian-Hebrew tweets

Size: 20.5 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

vcyrot/Frenglish-Benchmark

A Centralized Frenglish Benchmark from Naturally Occurring Code-Switching and Code-Mixing

Size: 105 KB - Last synced at: 11 months ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

gulabpatel/Code-Mixing

Discusses the evolution of code-mixing algorithms

Language: Jupyter Notebook - Size: 204 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

Anwarvic/truel_bilingual_nmt

The official code for the "True Bilingual NMT" paper

Language: Python - Size: 3.59 MB - Last synced at: 12 months ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

kmi-linguistics/Code-mixing

Size: 3.91 KB - Last synced at: almost 2 years ago - Pushed at: about 7 years ago - Stars: 0 - Forks: 0

Related Topics
nlp (17), code-switching (13), code-mixed (8), english (8), pos-tagging (7), nlp-machine-learning (7), parts-of-speech (7), english-language (7), assamese (7), language-identification (6), pos-tagger (6), assamese-text (6), natural-language-processing (5), parts-of-speech-tagging (5), sentiment-analysis (4), twitter (4), linguistics (3), sentiment-classification (3), python3 (3), bilstm (2), deep-learning (2), bilingual (2), multilingual (2), lstm (2), machine-learning (2), code-switch (2), machine-translation (2), named-entity-recognition (2), transformer (2), crfsuite (1), lesk-algorithm (1), lesk (1), indowordnet (1), research (1), indic-transliteration (1), indic-nlp (1), indic-languages (1), hinglish-to-hindi-transliteration (1), speech (1), hred (1), seq2seq (1), hinglish (1), hindi-spell-correction (1), tagalog (1), hindi-pos-tag (1), spontaneous-speech-recognition (1), speech-to-text (1), taglish (1), indonesian-language (1), asr (1), lexical-normalization (1), mallet (1), identification-language (1), language-tags (1), synthetic-data-generation (1), language-modeling (1), data-generation (1), bilstm-model (1), bidirectional-lstm (1), bidirectional-long-short-term-memory-network (1), multilingual-bert (1), mbert (1), hmm-viterbi-algorithm (1), hmm-model (1), hmm (1), hidden-markov-model (1), support-vector-machine (1), render-deployment (1), multinomial-naive-bayes (1), multilingual-nlp (1), malaysian-language (1), malaya-library (1), flask-application (1), docker-image (1), french-english (1), language-identifier (1), language-identification-toolkit (1), wsd-dataset (1), wsd (1), word-sense-disambiguation (1), spello (1), python-package (1), python-library (1), python-3 (1), crf-model (1), crf (1), conditional-random-field (1), rule-based-nlp (1), rule-based-modeling (1), rule-based (1), assamese-language (1), assamese-english (1), traditional-machine-learning (1), text-classification (1), text-categorization (1), low-resource-languages (1), corpus-linguistics (1), bert-fine-tuning (1), baselines (1), tweets (1)