An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: dialect-identification

CAMeL-Lab/camel_tools

A suite of Arabic natural language processing tools developed by the CAMeL Lab at New York University Abu Dhabi.

Language: Python - Size: 11.5 MB - Last synced at: 8 days ago - Pushed at: 5 months ago - Stars: 482 - Forks: 77

instadeepai/tunbert

TunBERT is the first release of a pre-trained BERT model for the Tunisian dialect, trained on a Tunisian Common-Crawl-based dataset. TunBERT was applied to three downstream NLP tasks: Sentiment Analysis (SA), Tunisian Dialect Identification (TDI), and Reading Comprehension Question-Answering (RCQA).

Language: Python - Size: 165 KB - Last synced at: 8 days ago - Pushed at: over 2 years ago - Stars: 121 - Forks: 38

sinaahmadi/CORDI

Language and Speech Technology for Central Kurdish Varieties (LREC-COLING 2024)

Language: Python - Size: 25.9 MB - Last synced at: 4 months ago - Pushed at: 9 months ago - Stars: 11 - Forks: 2

sinaahmadi/teshi

An atlas of Central Kurdish dialects + a simple game to detect dialects

Language: HTML - Size: 1.83 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 2 - Forks: 0

qcri/Arabic_speech_code_switching

The first Dialectal Arabic Code Switching (DACS) corpus from broadcast speech, annotated at the token level using both linguistic and acoustic cues. This dataset is a potential benchmark for DCS in spontaneous speech.

Size: 261 MB - Last synced at: 4 months ago - Pushed at: over 3 years ago - Stars: 14 - Forks: 1

abdelrahman-wael/Arabic-Dialect-Classification-Nadi-Shared-Task

Uses AraBERT to classify Arabic dialects. Ranked fourth at the WANLP 2020 workshop.

Language: Python - Size: 7.15 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 3 - Forks: 1

disooqi/farspeech-website

Web interface for the far-speech demo to be presented at INTERSPEECH 2019

Language: JavaScript - Size: 9.73 MB - Last synced at: 6 months ago - Pushed at: about 6 years ago - Stars: 1 - Forks: 0

hb20007/greek-dialect-classifier

Classifier that identifies Greek text as Cypriot Greek or Standard Modern Greek

Language: Jupyter Notebook - Size: 1.05 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 7 - Forks: 3

iabufarha/ArSarcasm

This repository contains the Arabic sarcasm dataset (ArSarcasm)

Size: 932 KB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 22 - Forks: 14

iabufarha/ArSarcasm-v2

ArSarcasm-v2 is an extension to the original ArSarcasm dataset. It was used for the shared task on sarcasm detection and sentiment analysis, which is a part of WANLP 2021.

Size: 1.26 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 11 - Forks: 3

eesanoble/Arabic-Dialect-Classifier

An Arabic Tweet Dialect Classifier

Language: Jupyter Notebook - Size: 16.7 MB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

swshon/dialectID_siam

Dialect identification using Siamese network

Language: Jupyter Notebook - Size: 116 MB - Last synced at: almost 2 years ago - Pushed at: over 7 years ago - Stars: 15 - Forks: 4

a-coles/SMS-Stylometry

A tool that predicts the dialect of English of an SMS message using recurrent neural networks supplemented with data from Google Trends.

Language: Python - Size: 25.3 MB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 6 - Forks: 2

GLaDO8/IViE_corpus_british_dialects_classification

Log-MFSC-based classification of British English dialects from the IViE (Intonational Variation in English) corpus

Language: Python - Size: 26.4 KB - Last synced at: about 2 years ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 1

MohamedSebaie/Arabic_Dialect_Identification_NLP-AIM-Task

Arabic_Dialect_Identification_NLP-AIM-Task

Language: Jupyter Notebook - Size: 27 MB - Last synced at: 2 months ago - Pushed at: over 3 years ago - Stars: 3 - Forks: 2

Salma-Jamal/Arabic-Dialect-Identification

Arabic Dialect Identification on NADI 2020 and QADI datasets

Language: Jupyter Notebook - Size: 1.31 MB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

giacomocamposampiero/italian-dialects-identification

ITDI shared task @ VarDial2022 9th Workshop on NLP for Similar Languages, Varieties and Dialects.

Language: Jupyter Notebook - Size: 82.6 MB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

hasanhuz/Location_Analysis_Project

Language: Python - Size: 8.79 KB - Last synced at: over 2 years ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 0

kscanne/canuint

A program that statistically classifies Irish-language texts according to their dialect

Language: Perl - Size: 37.1 KB - Last synced at: over 2 years ago - Pushed at: over 5 years ago - Stars: 3 - Forks: 1

AlexYangLi/DMT

VarDial19 shared task: Discriminating between Mainland and Taiwan Variation of Mandarin Chinese (DMT)

Language: Python - Size: 2.84 MB - Last synced at: over 2 years ago - Pushed at: over 6 years ago - Stars: 6 - Forks: 1

disooqi/MADAR-shared-task

This shared task will be the first to target a large set of dialect labels at the city and country levels. The data for the shared task is created or collected under the Multi-Arabic Dialect Applications and Resources (MADAR) project.

Language: Jupyter Notebook - Size: 28.8 MB - Last synced at: over 2 years ago - Pushed at: about 6 years ago - Stars: 1 - Forks: 0

30stomercury/IS19_ComParE_Sub-Challenge

[Interspeech19] Computational Paralinguistics ChallengE (ComParE)

Language: Python - Size: 44.9 KB - Last synced at: over 2 years ago - Pushed at: almost 6 years ago - Stars: 0 - Forks: 0

motazsaad/ArbDialectID

Arabic Dialects Identification

Size: 13.6 MB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

telsahy/capstone-34

Twitter Dialect Datasets and Classifiers (EG Arabic Corpus)

Language: Jupyter Notebook - Size: 7.85 MB - Last synced at: 15 days ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 2

telsahy/capstone-35

Twitter Dialect Datasets and Classifiers (GULF Arabic Corpus)

Language: Jupyter Notebook - Size: 7.69 MB - Last synced at: 15 days ago - Pushed at: about 7 years ago - Stars: 2 - Forks: 1

telsahy/capstone-52

Twitter Dialect Datasets and Classifiers (EG + GULF Arabic Corpus)

Language: Jupyter Notebook - Size: 8.19 MB - Last synced at: 15 days ago - Pushed at: about 7 years ago - Stars: 2 - Forks: 1