GitHub topics: dialect-identification

Repositories

CAMeL-Lab/camel_tools

A suite of Arabic natural language processing tools developed by the CAMeL Lab at New York University Abu Dhabi.

Language: Python - Size: 11.5 MB - Last synced at: 8 days ago - Pushed at: 5 months ago - Stars: 482 - Forks: 77

instadeepai/tunbert

TunBERT is the first release of a pre-trained BERT model for the Tunisian dialect, trained on a Tunisian Common-Crawl-based dataset. TunBERT was applied to three downstream NLP tasks: Sentiment Analysis (SA), Tunisian Dialect Identification (TDI), and Reading Comprehension Question-Answering (RCQA).

Language: Python - Size: 165 KB - Last synced at: 8 days ago - Pushed at: over 2 years ago - Stars: 121 - Forks: 38

sinaahmadi/CORDI

Language and Speech Technology for Central Kurdish Varieties (LREC-COLING 2024)

Language: Python - Size: 25.9 MB - Last synced at: 4 months ago - Pushed at: 9 months ago - Stars: 11 - Forks: 2

sinaahmadi/teshi

An atlas of Central Kurdish dialects and a simple game to detect dialects

Language: HTML - Size: 1.83 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 2 - Forks: 0

qcri/Arabic_speech_code_switching

The first Dialectal Arabic Code-Switching (DACS) corpus of broadcast speech, annotated at the token level using both linguistic and acoustic cues. This dataset is a potential benchmark for DCS in spontaneous speech.

Size: 261 MB - Last synced at: 4 months ago - Pushed at: over 3 years ago - Stars: 14 - Forks: 1

abdelrahman-wael/Arabic-Dialect-Classification-Nadi-Shared-Task

Uses AraBERT to classify Arabic dialects; ranked fourth at the WANLP 2020 workshop.

Language: Python - Size: 7.15 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 3 - Forks: 1

disooqi/farspeech-website

Web interface for the far-speech demo to be presented at INTERSPEECH 2019

Language: JavaScript - Size: 9.73 MB - Last synced at: 6 months ago - Pushed at: about 6 years ago - Stars: 1 - Forks: 0

hb20007/greek-dialect-classifier

Classifier that identifies Greek text as Cypriot Greek or Standard Modern Greek

Language: Jupyter Notebook - Size: 1.05 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 7 - Forks: 3

iabufarha/ArSarcasm

This repository contains the Arabic sarcasm dataset (ArSarcasm)

Size: 932 KB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 22 - Forks: 14

iabufarha/ArSarcasm-v2

ArSarcasm-v2 is an extension of the original ArSarcasm dataset. It was used for the shared task on sarcasm detection and sentiment analysis, part of WANLP 2021.

Size: 1.26 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 11 - Forks: 3

eesanoble/Arabic-Dialect-Classifier

An Arabic Tweet Dialect Classifier

Language: Jupyter Notebook - Size: 16.7 MB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

swshon/dialectID_siam

Dialect identification using Siamese network

Language: Jupyter Notebook - Size: 116 MB - Last synced at: almost 2 years ago - Pushed at: over 7 years ago - Stars: 15 - Forks: 4

a-coles/SMS-Stylometry

A tool that predicts the English dialect of an SMS message using recurrent neural networks supplemented with data from Google Trends.

Language: Python - Size: 25.3 MB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 6 - Forks: 2

GLaDO8/IViE_corpus_british_dialects_classification

Log-MFSC-based classification of British English dialects from the IViE (Intonational Variation in English) corpus

Language: Python - Size: 26.4 KB - Last synced at: about 2 years ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 1

MohamedSebaie/Arabic_Dialect_Identification_NLP-AIM-Task

Arabic_Dialect_Identification_NLP-AIM-Task

Language: Jupyter Notebook - Size: 27 MB - Last synced at: 2 months ago - Pushed at: over 3 years ago - Stars: 3 - Forks: 2

Salma-Jamal/Arabic-Dialect-Identification

Arabic Dialect Identification on NADI 2020 and QADI datasets

Language: Jupyter Notebook - Size: 1.31 MB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

giacomocamposampiero/italian-dialects-identification

ITDI shared task at VarDial 2022, the 9th Workshop on NLP for Similar Languages, Varieties and Dialects.

Language: Jupyter Notebook - Size: 82.6 MB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

hasanhuz/Location_Analysis_Project

Language: Python - Size: 8.79 KB - Last synced at: over 2 years ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 0

kscanne/canuint

A program that statistically classifies Irish-language texts according to their dialect

Language: Perl - Size: 37.1 KB - Last synced at: over 2 years ago - Pushed at: over 5 years ago - Stars: 3 - Forks: 1

AlexYangLi/DMT

VarDial19 shared task: Discriminating between Mainland and Taiwan Variation of Mandarin Chinese (DMT)

Language: Python - Size: 2.84 MB - Last synced at: over 2 years ago - Pushed at: over 6 years ago - Stars: 6 - Forks: 1

disooqi/MADAR-shared-task

This shared task will be the first to target a large set of dialect labels at the city and country levels. The data for the shared task is created or collected under the Multi-Arabic Dialect Applications and Resources (MADAR) project.

Language: Jupyter Notebook - Size: 28.8 MB - Last synced at: over 2 years ago - Pushed at: about 6 years ago - Stars: 1 - Forks: 0

30stomercury/IS19_ComParE_Sub-Challenge

[Interspeech19] Computational Paralinguistics ChallengE (ComParE)

Language: Python - Size: 44.9 KB - Last synced at: over 2 years ago - Pushed at: almost 6 years ago - Stars: 0 - Forks: 0

motazsaad/ArbDialectID

Arabic Dialects Identification

Size: 13.6 MB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

telsahy/capstone-34

Twitter Dialect Datasets and Classifiers (EG Arabic Corpus)

Language: Jupyter Notebook - Size: 7.85 MB - Last synced at: 15 days ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 2

telsahy/capstone-35

Twitter Dialect Datasets and Classifiers (GULF Arabic Corpus)

Language: Jupyter Notebook - Size: 7.69 MB - Last synced at: 15 days ago - Pushed at: about 7 years ago - Stars: 2 - Forks: 1

telsahy/capstone-52

Twitter Dialect Datasets and Classifiers (EG + GULF Arabic Corpus)

Language: Jupyter Notebook - Size: 8.19 MB - Last synced at: 15 days ago - Pushed at: about 7 years ago - Stars: 2 - Forks: 1

Related Keywords

dialect-identification (26), arabic-nlp (9), nlp (9), arabic (7), nlp-machine-learning (6), dialect (4), sentiment-analysis (4), twitter-api (3), topic-modeling (3), machine-learning (3), language-identification (3), farasa (2), classification (2), classifier (2), dialects (2), kurdish-language-processing (2), kurdish (2), arabic-dialects (2), sarcasm-detection (2), mgb (1), audio-processing (1), stylometry (1), character (1), i-vector (1), sms (1), rnn (1), location-detection (1), google-trends (1), language-recognition (1), authorship-identification (1), words (1), siamese-network (1), siamese (1), phoneme (1), mgbchallenge (1), identification (1), morphological-analysis (1), arabic-language (1), interspeech2019 (1), shared-task (1), madar (1), 2019 (1), mandarin-chinese (1), mandarin (1), irish (1), gaeilge (1), word2vec (1), twitter (1), location-analysis (1), location (1), geopy (1), vardial (1), ensemble-learning (1), preprocessing (1), linearsvc (1), bert-fine-tuning (1), arabert (1), lexical (1), evaluation (1), egyptian (1), codeswitching (1), asr (1), acoustic (1), sulaymaniyah (1), sorani (1), sanandaj (1), mahabad (1), machine-translation (1), nlp-apis (1), erbil (1), automatic-speech-recognition (1), question-answering (1), bert-models (1), stemming (1), nlp-library (1), pos-tagging (1), natural-language-processing (1), notebook (1), nltk3 (1), nltk-library (1), nltk-data (1), nltk (1), n-grams (1), morphological-disambiguation (1), language-classification (1), jupyter-notebook (1), jupyter (1), greek (1), morphological-generation (1), cypriot (1), speech-recognition (1), qats (1), morphological-reinflection (1), named-entity-recognition (1), mordern-standard-arabic (1)