GitHub topics: dialect-identification
CAMeL-Lab/camel_tools
A suite of Arabic natural language processing tools developed by the CAMeL Lab at New York University Abu Dhabi.
Language: Python - Size: 11.5 MB - Last synced at: 8 days ago - Pushed at: 5 months ago - Stars: 482 - Forks: 77

instadeepai/tunbert
TunBERT is the first release of a pre-trained BERT model for the Tunisian dialect, trained on a Tunisian Common-Crawl-based dataset. TunBERT was applied to three downstream NLP tasks: Sentiment Analysis (SA), Tunisian Dialect Identification (TDI) and Reading Comprehension Question-Answering (RCQA).
Language: Python - Size: 165 KB - Last synced at: 8 days ago - Pushed at: over 2 years ago - Stars: 121 - Forks: 38

sinaahmadi/CORDI
Language and Speech Technology for Central Kurdish Varieties (LREC-COLING 2024)
Language: Python - Size: 25.9 MB - Last synced at: 4 months ago - Pushed at: 9 months ago - Stars: 11 - Forks: 2

sinaahmadi/teshi
An atlas of Central Kurdish dialects and a simple game to detect dialects
Language: HTML - Size: 1.83 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 2 - Forks: 0

qcri/Arabic_speech_code_switching
The first Dialectal Arabic Code-Switching (DACS) corpus from broadcast speech, annotated at the token level using both linguistic and acoustic cues. This dataset is a potential benchmark for DCS in spontaneous speech.
Size: 261 MB - Last synced at: 4 months ago - Pushed at: over 3 years ago - Stars: 14 - Forks: 1

abdelrahman-wael/Arabic-Dialect-Classification-Nadi-Shared-Task
Uses AraBERT to classify Arabic dialects. Ranked fourth at the WANLP 2020 workshop.
Language: Python - Size: 7.15 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 3 - Forks: 1

disooqi/farspeech-website
Web interface for the far-speech demo to be presented at INTERSPEECH 2019
Language: JavaScript - Size: 9.73 MB - Last synced at: 6 months ago - Pushed at: about 6 years ago - Stars: 1 - Forks: 0

hb20007/greek-dialect-classifier
Classifier that identifies Greek text as Cypriot Greek or Standard Modern Greek
Language: Jupyter Notebook - Size: 1.05 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 7 - Forks: 3

iabufarha/ArSarcasm
This repository contains the Arabic sarcasm dataset (ArSarcasm)
Size: 932 KB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 22 - Forks: 14

iabufarha/ArSarcasm-v2
ArSarcasm-v2 is an extension of the original ArSarcasm dataset. It was used for the shared task on sarcasm detection and sentiment analysis, which was part of WANLP 2021.
Size: 1.26 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 11 - Forks: 3

eesanoble/Arabic-Dialect-Classifier
An Arabic Tweet Dialect Classifier
Language: Jupyter Notebook - Size: 16.7 MB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

swshon/dialectID_siam
Dialect identification using Siamese network
Language: Jupyter Notebook - Size: 116 MB - Last synced at: almost 2 years ago - Pushed at: over 7 years ago - Stars: 15 - Forks: 4

a-coles/SMS-Stylometry
A tool that predicts the dialect of English of an SMS message using recurrent neural networks supplemented with data from Google Trends.
Language: Python - Size: 25.3 MB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 6 - Forks: 2

GLaDO8/IViE_corpus_british_dialects_classification
Log-MFSC-based classification of British English dialects using the IViE (Intonational Variation in English) corpus
Language: Python - Size: 26.4 KB - Last synced at: about 2 years ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 1

MohamedSebaie/Arabic_Dialect_Identification_NLP-AIM-Task
Language: Jupyter Notebook - Size: 27 MB - Last synced at: 2 months ago - Pushed at: over 3 years ago - Stars: 3 - Forks: 2

Salma-Jamal/Arabic-Dialect-Identification
Arabic Dialect Identification on NADI 2020 and QADI datasets
Language: Jupyter Notebook - Size: 1.31 MB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

giacomocamposampiero/italian-dialects-identification
ITDI shared task at VarDial 2022, the 9th Workshop on NLP for Similar Languages, Varieties and Dialects.
Language: Jupyter Notebook - Size: 82.6 MB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

hasanhuz/Location_Analysis_Project
Language: Python - Size: 8.79 KB - Last synced at: over 2 years ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 0

kscanne/canuint
A program that statistically classifies Irish-language texts according to their dialect
Language: Perl - Size: 37.1 KB - Last synced at: over 2 years ago - Pushed at: over 5 years ago - Stars: 3 - Forks: 1

AlexYangLi/DMT
VarDial19 shared task: Discriminating between Mainland and Taiwan Variation of Mandarin Chinese (DMT)
Language: Python - Size: 2.84 MB - Last synced at: over 2 years ago - Pushed at: over 6 years ago - Stars: 6 - Forks: 1

disooqi/MADAR-shared-task
This shared task is the first to target a large set of dialect labels at the city and country levels. The data for the shared task was created or collected under the Multi-Arabic Dialect Applications and Resources (MADAR) project.
Language: Jupyter Notebook - Size: 28.8 MB - Last synced at: over 2 years ago - Pushed at: about 6 years ago - Stars: 1 - Forks: 0

30stomercury/IS19_ComParE_Sub-Challenge
[Interspeech19] Computational Paralinguistics ChallengE (ComParE)
Language: Python - Size: 44.9 KB - Last synced at: over 2 years ago - Pushed at: almost 6 years ago - Stars: 0 - Forks: 0

motazsaad/ArbDialectID
Arabic Dialects Identification
Size: 13.6 MB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

telsahy/capstone-34
Twitter Dialect Datasets and Classifiers (EG Arabic Corpus)
Language: Jupyter Notebook - Size: 7.85 MB - Last synced at: 15 days ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 2

telsahy/capstone-35
Twitter Dialect Datasets and Classifiers (GULF Arabic Corpus)
Language: Jupyter Notebook - Size: 7.69 MB - Last synced at: 15 days ago - Pushed at: about 7 years ago - Stars: 2 - Forks: 1

telsahy/capstone-52
Twitter Dialect Datasets and Classifiers (EG + GULF Arabic Corpus)
Language: Jupyter Notebook - Size: 8.19 MB - Last synced at: 15 days ago - Pushed at: about 7 years ago - Stars: 2 - Forks: 1
