GitHub topics: wav2vec2 | Ecosyste.ms: Repos

RhysonYang-2030/ASACA-Automatic-Speech-Analysis-for-Cognitive-Assessment

An automatic system that extracts PRAAT-like speech features from raw speech WAV files and, at the same time, produces high-quality transcriptions with a low WER (0.02).

Language: Dockerfile - Size: 67.4 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 3 - Forks: 0

Mukuta-Manit-D/AI-Mirror

AI Mirror is a smart, interactive web application that detects human emotions in real time through voice recordings and text inputs using powerful AI models. Whether you're speaking or typing, AI Mirror reflects your emotional state instantly—making it ideal for use cases like mental health tracking, mood journaling, or AI-driven conversation

Language: JavaScript - Size: 475 KB - Last synced at: about 16 hours ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

vectominist/MiniASR

A mini, simple, and fast end-to-end automatic speech recognition toolkit.

Language: Jupyter Notebook - Size: 342 KB - Last synced at: 3 days ago - Pushed at: over 2 years ago - Stars: 54 - Forks: 6

PaddlePaddle/PaddleSpeech

Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.

Language: Python - Size: 69.2 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 12,020 - Forks: 1,922

ASRBench/asrbench-cli

A command-line tool for the ASRBench framework, simplifying audio transcription system benchmarking with a single config file, supporting popular and custom transcription systems

Language: Python - Size: 1.22 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 0 - Forks: 0

s3prl/s3prl

Self-Supervised Speech Pre-training and Representation Learning Toolkit

Language: Python - Size: 135 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 2,411 - Forks: 501

VGD3626/English_Accent_Detection (fork of Divyang029/English_Accent_Detection)

Audio classification using transfer learning-based approach

Language: Jupyter Notebook - Size: 6.52 MB - Last synced at: 26 days ago - Pushed at: 26 days ago - Stars: 0 - Forks: 0

PRITHIVSAKTHIUR/Common-Voice-Gender-Detection

Speech-Emotion-Classification is a fine-tuned version of facebook/wav2vec2-base-960h for multi-class audio classification, specifically trained to detect emotions in speech. This model utilizes the Wav2Vec2ForSequenceClassification architecture to accurately classify speaker emotions from audio signals.

Language: Python - Size: 11.7 KB - Last synced at: 28 days ago - Pushed at: 28 days ago - Stars: 1 - Forks: 0

hyunnnchoi/tethys-speech

TensorFlow 2 implementation of Whisper and wav2vec2 models with distributed training for Kubernetes/Kubeflow

Language: Python - Size: 227 KB - Last synced at: 7 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

audeering/w2v2-how-to

How to use our public wav2vec2 dimensional emotion model

Language: Jupyter Notebook - Size: 98.6 KB - Last synced at: 20 days ago - Pushed at: about 2 years ago - Stars: 502 - Forks: 48

tuanio/noisy-student-training-asr

PyTorch implementation of Noisy Student Training for Automatic Speech Recognition and Automatic Pronunciation Error Detection

Language: Python - Size: 3.08 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 93 - Forks: 15

pszemraj/vid2cleantxt

Python API & command-line tool to easily transcribe speech-based video files into clean text

Language: Jupyter Notebook - Size: 723 MB - Last synced at: 29 days ago - Pushed at: 8 months ago - Stars: 213 - Forks: 29

faizaliyaqat/Speech-emotion-recognition

Speech Emotion Recognition using Wav2Vec 2.0 + Random Forest. A real-time emotion detection system built with Streamlit, trained on the RAVDESS and SAVEE datasets using Wav2Vec 2.0 features and a Random Forest classifier. Includes SHAP explainability and audio waveform visualization.

Language: Python - Size: 17.6 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

aryanxxvii/lark

Speech Assessment API in FastAPI with HuggingFace 🤗

Language: JavaScript - Size: 184 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 11 - Forks: 0

mokshhhhh/AudioCaptchaRecognizer

A conversational AI and speech synthesis project in which we develop and use a model to recognize the audio CAPTCHAs often used for human verification on websites.

Language: Python - Size: 13.6 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

thevasudevgupta/gsoc-wav2vec2

GSoC'2021 | TensorFlow implementation of Wav2Vec2

Language: Jupyter Notebook - Size: 6.67 MB - Last synced at: 7 days ago - Pushed at: over 3 years ago - Stars: 90 - Forks: 29

egorsmkv/speech-to-text-using-php

Using PHP for a speech-to-text task. Just a research experiment.

Language: PHP - Size: 289 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

lakshiitakalyanasundaram/DeepSonic

Deepfake audio detection project using Wav2Vec2 for MOMENTA (internship task)

Language: Python - Size: 64.3 MB - Last synced at: about 2 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

slinusc/speaker_identification_evaluation

Evaluating the Effectiveness of Transformer Layers in Wav2Vec 2.0, XLS-R, and Whisper for Speaker Identification Tasks

Language: Jupyter Notebook - Size: 8.56 MB - Last synced at: 2 months ago - Pushed at: 5 months ago - Stars: 3 - Forks: 1

yamahigashi/Wav2Vec2FBX

Recognize speech from an audio file and convert it into animation FBX

Language: Python - Size: 199 KB - Last synced at: 25 days ago - Pushed at: over 3 years ago - Stars: 21 - Forks: 3

khanld/ASR-Wav2vec-Finetune

⚡ Finetune Wav2vec 2.0 for Speech Recognition

Language: Python - Size: 5.1 MB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 127 - Forks: 28

ttop32/wav2vec2-live-japanese-translator

Real-time Japanese speech recognition translator using wav2vec2

Language: Jupyter Notebook - Size: 926 KB - Last synced at: 2 months ago - Pushed at: almost 3 years ago - Stars: 38 - Forks: 3

oliverguhr/wav2vec2-live

Live speech recognition using Facebook's wav2vec 2.0 model.

Language: Python - Size: 2.84 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 348 - Forks: 56

vietai/ASR

End-to-End Vietnamese Speech Recognition using wav2vec 2.0

Size: 10.7 KB - Last synced at: 3 months ago - Pushed at: almost 4 years ago - Stars: 98 - Forks: 9

mahshid1378/ASR-Wav2vec-Finetune

⚡ Finetune Wav2vec 2.0 for Speech Recognition

Language: Python - Size: 5.01 MB - Last synced at: 15 days ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

notAI-tech/IndicASR

Speech Recognition for Indic languages.

Language: Python - Size: 623 KB - Last synced at: 24 days ago - Pushed at: about 4 years ago - Stars: 13 - Forks: 3

pooya-mohammadi/audio-classification-pytorch

This project provides several approaches for training or fine-tuning an audio gender recognition model. The code can be reused for any other audio classification task by changing the number of classes and the input dataset.

Language: Jupyter Notebook - Size: 871 KB - Last synced at: 2 months ago - Pushed at: 6 months ago - Stars: 41 - Forks: 4

jmaczan/asr-dysarthria

Research on Automatic Speech Recognition for dysarthric speech

Language: Jupyter Notebook - Size: 2.64 MB - Last synced at: 3 months ago - Pushed at: 9 months ago - Stars: 11 - Forks: 2

egorsmkv/w2v2-bert-aligner

Aligner for wav2vec2-bert models

Language: Python - Size: 1.95 KB - Last synced at: about 2 months ago - Pushed at: 6 months ago - Stars: 3 - Forks: 1

navalnica/wav2vec2-belarusian

Speech to Text model for Belarusian language

Language: Jupyter Notebook - Size: 1.37 MB - Last synced at: 3 months ago - Pushed at: about 3 years ago - Stars: 5 - Forks: 0

lucasgris/wav2vec4bp

Wav2vec resources and models for Brazilian Portuguese

Language: Jupyter Notebook - Size: 1.65 MB - Last synced at: about 2 months ago - Pushed at: almost 3 years ago - Stars: 33 - Forks: 2

louisbrulenaudet/balena

BALanced Execution through Natural Activation: a human-computer interaction methodology for code running.

Language: Python - Size: 229 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 7 - Forks: 1

khanld/Wav2vec2-Pretraining

Wav2vec 2.0 Self-Supervised Pretraining

Language: Python - Size: 303 KB - Last synced at: 5 months ago - Pushed at: over 2 years ago - Stars: 37 - Forks: 4

jp1924/ASR

🤗 Code for training ASR models

Language: Python - Size: 413 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

moncefbenaicha/spoken-ner

Spoken NER implementation based on Wav2Vec2-XLS-R with experiments on transfer learning

Language: Python - Size: 113 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

skit-ai/Map-Mix

The official implementation of the method discussed in the paper Improving Spoken Language Identification with Map-Mix (work accepted at ICASSP-2023)

Size: 18.1 MB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 16 - Forks: 1

habla-liaa/ser-with-w2v2

Official implementation of INTERSPEECH 2021 paper 'Emotion Recognition from Speech Using Wav2vec 2.0 Embeddings'

Language: Jupyter Notebook - Size: 32.3 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 128 - Forks: 23

moxeeem/ASR-pronunciation-correction

This project presents a system for automatic correction of English pronunciation using the wav2vec2 neural network.

Language: Jupyter Notebook - Size: 9.17 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 1

Nightey3s/Speech-Emotion-Recognition-using-Wav2Vec2

A Speech Emotion Recognition (SER) system using Facebook's Wav2Vec2 model that classifies speech into four emotions (Neutral, Happy, Sad, Angry). Achieves 69.02% accuracy on IEMOCAP dataset using modern transformer architecture and comprehensive data augmentation techniques.

Language: Jupyter Notebook - Size: 1.06 MB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

nhut-ngnn/Multimodal-Speech-Emotion-Recognition

A multimodal SER project combining BERT and ECAPA-TDNN with cross-attention-based fusion on the IEMOCAP dataset.

Language: Python - Size: 7.07 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 4 - Forks: 0

thiagogre/mimicking

English Pronunciation Improvement App

Language: Python - Size: 852 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 1 - Forks: 0

Not-ML/audio-ml

Standalone Audio ML Application: An innovative Python-based tool integrating Speech Recognition (ASR), Sentiment Analysis (NLP), and Text-to-Speech (TTS) to process audio, analyze sentiment, and generate spoken responses. Features both command-line and GUI interfaces for seamless interaction.

Language: Python - Size: 20.5 KB - Last synced at: 27 days ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

seanghay/kfa

A fast Khmer Forced Aligner powered by Wav2Vec2CTC and Phonetisaurus

Language: Python - Size: 10.1 MB - Last synced at: about 10 hours ago - Pushed at: about 1 year ago - Stars: 5 - Forks: 0

sugarcane-mk/finetuning_wav2vec2

This repo provides a step-by-step process, starting from scratch, for fine-tuning Facebook's wav2vec2-large model using transformers

Language: Jupyter Notebook - Size: 42 KB - Last synced at: 3 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

aitor-alvarez/acoustic-transformer-models

Acoustic Transformer Models for Audio Classification

Language: Python - Size: 51.8 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

imsanjoykb/Speech-NLP-Bootcamp

Speech NLP Bootcamp

Language: Jupyter Notebook - Size: 3.1 MB - Last synced at: 4 months ago - Pushed at: almost 2 years ago - Stars: 2 - Forks: 1

Msparihar/Transcriber

An AI tool that automatically generates captions and transcripts for YouTube videos in 67 languages and can generate summarized texts in 133 languages.

Language: Python - Size: 14.6 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 7 - Forks: 1

kingabzpro/WOLOF-ASR-Wav2Vec2

Audio Preprocessing and finetuning of wav2vec2-large-xlsr model on AI4D Baamtu Datamation - Automatic Speech Recognition in WOLOF Data.

Language: Jupyter Notebook - Size: 3.34 MB - Last synced at: 2 months ago - Pushed at: over 3 years ago - Stars: 17 - Forks: 8

FernandoLpz/SpeechRecognition

This repository contains the implementation of an Automatic Speech Recognition system in Python, using a client-server architecture with WebSockets.

Language: Python - Size: 118 KB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 14 - Forks: 2

kardSIM/audio2img

Extend the Conditioning of Stable Diffusion to take Audio Embeddings Instead of Text Embeddings using Wav2Vec2-BERT model

Language: Jupyter Notebook - Size: 29.8 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 7 - Forks: 1

aitor-alvarez/large-speech-models

Fine-tuning Multilingual Large Speech Recognition Models: Wav2vec and Whisper

Language: Python - Size: 84 KB - Last synced at: 3 months ago - Pushed at: 11 months ago - Stars: 1 - Forks: 0

dangrebenkin/wav2vec2_speech_markuper

Automatic generation of speech dataset markup using Wav2Vec2 ASR models

Language: Python - Size: 396 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 2 - Forks: 0

piedeboer96/Digital-Assistant-Audio-Processing

Project 2.2 - Speech Recognition and Speaker Identification

Language: Java - Size: 13 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 1 - Forks: 0

gulabpatel/Speech-to-Text

Language: Jupyter Notebook - Size: 1.64 MB - Last synced at: 4 days ago - Pushed at: 11 months ago - Stars: 3 - Forks: 0

egorsmkv/wav2vec2-hidet

A test to run w2v2 with hidet optimizer

Language: Python - Size: 402 KB - Last synced at: 3 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

parvatijay2901/Hindi-ASR-and-TTS

EC499: Major Project

Language: Shell - Size: 68.4 KB - Last synced at: 3 months ago - Pushed at: about 2 years ago - Stars: 9 - Forks: 2

akash13s/audio-to-image (fork of rishavroy97/audio-to-image)

Pipeline for generating images conditioned on input audio

Language: Python - Size: 3.11 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

JingleCate/SpeechEmotionRecog

A simple Speech Emotion Recognition (SER) project based on Wav2Vec2.

Language: Python - Size: 235 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 1 - Forks: 0

kamalesh003/NoiseCancellationTranscriptionModel

Noise Cancellation Transcription Model Using Wav2Vec2

Language: Jupyter Notebook - Size: 240 KB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

AmirAbaskohi/Automatic-Speech-recognition-for-Speech-Assessment-of-Persian-Preschool-Children

Preschool evaluation is crucial because it gives teachers and parents valuable knowledge about children's growth and development. The COVID-19 pandemic has highlighted the necessity of online assessment for preschool children. One of the areas that should be tested is their ability to speak. Employing existing Automatic Speech Recognition (ASR) systems would not help, since they are pre-trained on voices that differ from children's in terms of frequency and amplitude. Because most of these systems are pre-trained with data in a specific range of amplitude, their objectives do not prepare them for voices with different amplitudes. To overcome this issue, we added a new objective, called Random Frequency Pitch (RFP), to the masking objective of the Wav2Vec 2.0 model. In addition, we used our newly introduced dataset to fine-tune our model for Meaningless Words (MW) and Rapid Automatic Naming (RAN) tests. Using masking in concatenation with RFP outperforms the masking objective of Wav2Vec 2.0 alone, reaching a Word Error Rate (WER) of 1.35. Our new approach reaches a WER of 6.45 on the Persian section of the CommonVoice dataset. Furthermore, our novel methodology produces positive outcomes in zero- and few-shot scenarios.

Language: Jupyter Notebook - Size: 1.23 MB - Last synced at: 7 months ago - Pushed at: about 2 years ago - Stars: 21 - Forks: 1

Sarasadeghii/Sharif-Wav2vec2

This repo shows how to finetune the wav2vec2.0 model along with its prerequisites.

Language: Jupyter Notebook - Size: 297 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 0

inboxpraveen/LLM-Minutes-of-Meeting

🎤📄 An innovative tool that transforms audio or video files into text transcripts and generates concise meeting minutes. Stay organized and efficient in your meetings, and get ready for Phase 2 where we'll be open for contributions to enable real-time meeting transcription! 🚀

Language: Python - Size: 7.14 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 57 - Forks: 7

daanzu/wav2vec2_stt_python

Simple Python library, distributed via binary wheels with few direct dependencies, for easily using wav2vec 2.0 models for speech recognition

Language: Python - Size: 88.9 KB - Last synced at: 2 months ago - Pushed at: almost 4 years ago - Stars: 24 - Forks: 3

tracyreuter/NLP-speech-to-text

Convert speech to text using HuggingFace, comparing Wav2Vec2 versus OpenAI Whisper

Language: Jupyter Notebook - Size: 2.35 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

sebinbenjamin/wav2vec_demo

A Python tool for transcribing speech from audio files using the Wav2Vec 2.0 model. Supports multilingual transcription, automatic audio chunking, and easy setup

Language: Python - Size: 4.88 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

wngh1187/IPET

PyTorch implementation of Integrated Parameter-Efficient Tuning for General-Purpose Audio Models

Language: Python - Size: 4.28 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 10 - Forks: 0

moncefbenaicha/SpokenNER

Spoken NER implementation based on Wav2Vec2-XLS-R with experiments on transfer learning

Language: Python - Size: 627 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

pradeepbatchu/speechtotext

Speech to Text with Wav2Vec2 using torchaudio

Language: Python - Size: 533 KB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 6 - Forks: 1

somosnlp/wav2vec2-spanish

Pre-train a Spanish Wav2Vec2 model using the Spanish portion of the Common Voice dataset.

Language: Python - Size: 17.6 KB - Last synced at: about 1 year ago - Pushed at: almost 4 years ago - Stars: 5 - Forks: 1

seb5433/wav2vec2-speaker-recognition

Speaker recognition task using wav2vec2 model.

Language: Python - Size: 21.5 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

ranchlai/wav2vec-2.0

Wav2vec2 English speech recognition in PaddlePaddle

Language: Python - Size: 316 KB - Last synced at: about 1 year ago - Pushed at: about 4 years ago - Stars: 3 - Forks: 1

Sreyan88/Toxicity-Detection-in-Spoken-Utterances

This repository contains the code for the paper: "DeToxy: A Large-Scale Multimodal Dataset for Toxicity Classification in Spoken Utterances"

Language: Jupyter Notebook - Size: 976 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 9 - Forks: 5

PeterGilles/Speech-Recognition-Lecture---Data-Science-in-Humanities

Material for my lecture on Automatic Speech Recognition

Language: Jupyter Notebook - Size: 26.4 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

andrejanesic/Voice-Assistant

🤗 Voice assistant built in Python with NLP & wav2vec2.

Language: Python - Size: 36.7 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

ECNU-Cross-Innovation-Lab/ENT

[ICASSP 2024] Emotion Neural Transducer for Fine-Grained Speech Emotion Recognition

Language: Python - Size: 638 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 7 - Forks: 0

mikezzb/lyrics-sync

A deep learning lyrics-to-audio alignment system, generating synchronized lyrics from a song and its lyrics

Language: Jupyter Notebook - Size: 18.4 MB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 8 - Forks: 1

Sreyan88/Indic-ASR

Repository for pre-trained wav2vec 2.0 models on 7 Indian languages

Language: Python - Size: 18.6 KB - Last synced at: about 1 year ago - Pushed at: about 4 years ago - Stars: 4 - Forks: 0

zhu00121/Universal-representation-dynamics-of-deepfake-speech

This repo contains code used in the paper "Characterizing the temporal dynamics of universal speech representations for generalizable deepfake detection"

Language: Python - Size: 339 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

egorsmkv/asr-corpus-creator 📦

This app is intended to automatically create a corpus for ASR systems using pseudo-labeling.

Language: Python - Size: 2.47 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 27 - Forks: 3

nisheethjaiswal/Speech-to-Text

Speech to text implementation using transformers in PyTorch.

Language: Jupyter Notebook - Size: 333 KB - Last synced at: over 1 year ago - Pushed at: about 4 years ago - Stars: 3 - Forks: 0

RubensZimbres/Repo-2022

Python code for PyTorch, TensorFlow, Keras, Wav2Vec2 fine-tuning, and Google Cloud

Language: Jupyter Notebook - Size: 74.7 MB - Last synced at: 3 months ago - Pushed at: almost 2 years ago - Stars: 7 - Forks: 4

ShafakatArnob/Automatic-Bengali-Subtitle-Generation-Deep-Learning

Automatic Subtitle Generation for Bengali Multimedia Using Deep Learning.

Language: Jupyter Notebook - Size: 13.3 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

RajGothi/Improving-Automatic-Speech-Recognition-with-Dialect-Specific-Language-Models

This repository contains the implementation of our published paper titled 'Improving Automatic Speech Recognition with Dialect-Specific Language Models,' presented at SPECOM'23.

Language: Jupyter Notebook - Size: 1000 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

ECNU-Cross-Innovation-Lab/ShiftSER

[ICASSP 2023] Mingling or Misalignment? Temporal Shift for Speech Emotion Recognition with Pre-trained Representations

Language: Python - Size: 53.7 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 19 - Forks: 2

Dhruv16S/Transcribing-Video-to-Text

This repository implements a Wav2Vec2-based pipeline that transcribes the speech in a video file into text through a series of speech recognition, noise removal, and STT steps.

Language: Python - Size: 13.5 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

agustyawan-arif/wav2vec2-large-xlsr-53-id

Audio transcription using the Wav2Vec2 model trained on the Common Voice 13 dataset for Indonesian.

Language: Python - Size: 5.18 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

kingabzpro/hindiSpeechPro-Automatic-Speech-Recognization (fork of SakshiRathi77/hindiSpeechPro-Automatic-Speech-Recognization)

This project, the final project for the KaggleX BIPOC Mentorship Program, aims to train two separate Hindi ASR models using the Facebook Wav2Vec2 (300M parameters) and OpenAI Whisper-Small models, respectively. The goal is to compare their performance, with a target WER of less than 13%, across various Hindi accents and dialects.

Language: Jupyter Notebook - Size: 2.56 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

SakshiRathi77/hindiSpeechPro-Automatic-Speech-Recognization

This project, the final project for the KaggleX BIPOC Mentorship Program, aims to train two separate Hindi ASR models using the Facebook Wav2Vec2 (300M parameters) and OpenAI Whisper-Small models, respectively. The goal is to compare their performance, with a target WER of less than 13%, across various Hindi accents and dialects.

Language: Jupyter Notebook - Size: 2.11 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 1