Topic: "wav2vec2"
PaddlePaddle/PaddleSpeech
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
Language: Python - Size: 69.4 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 11,799 - Forks: 1,903

s3prl/s3prl
Self-Supervised Speech Pre-training and Representation Learning Toolkit
Language: Python - Size: 135 MB - Last synced at: about 5 hours ago - Pushed at: about 1 month ago - Stars: 2,377 - Forks: 499

audeering/w2v2-how-to
How to use our public wav2vec2 dimensional emotion model
Language: Jupyter Notebook - Size: 98.6 KB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 398 - Forks: 47

oliverguhr/wav2vec2-live
Live speech recognition using Facebook's wav2vec 2.0 model.
Language: Python - Size: 2.84 MB - Last synced at: 19 days ago - Pushed at: about 1 year ago - Stars: 348 - Forks: 56

pszemraj/vid2cleantxt
Python API & command-line tool to easily transcribe speech-based video files into clean text
Language: Jupyter Notebook - Size: 723 MB - Last synced at: 15 days ago - Pushed at: 6 months ago - Stars: 209 - Forks: 29

habla-liaa/ser-with-w2v2
Official implementation of INTERSPEECH 2021 paper 'Emotion Recognition from Speech Using Wav2vec 2.0 Embeddings'
Language: Jupyter Notebook - Size: 32.3 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 128 - Forks: 23

khanld/ASR-Wav2vec-Finetune
⚡ Finetune Wav2vec 2.0 For Speech Recognition
Language: Python - Size: 5.1 MB - Last synced at: 12 days ago - Pushed at: 3 months ago - Stars: 127 - Forks: 28

vietai/ASR
End-to-End Vietnamese Speech Recognition using wav2vec 2.0
Size: 10.7 KB - Last synced at: 20 days ago - Pushed at: over 3 years ago - Stars: 98 - Forks: 9

thevasudevgupta/gsoc-wav2vec2
GSoC'2021 | TensorFlow implementation of Wav2Vec2
Language: Jupyter Notebook - Size: 6.67 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 78 - Forks: 29

Telegram-Zalo/zac2022-lyric-alignment
Solution for Zalo AI Challenge 2022 - Lyrics Alignment
Language: Python - Size: 949 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 61 - Forks: 18

inboxpraveen/LLM-Minutes-of-Meeting
🎤📄 An innovative tool that transforms audio or video files into text transcripts and generates concise meeting minutes. Stay organized and efficient in your meetings, and get ready for Phase 2 where we'll be open for contributions to enable real-time meeting transcription! 🚀
Language: Python - Size: 7.14 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 57 - Forks: 7

vectominist/MiniASR
A mini, simple, and fast end-to-end automatic speech recognition toolkit.
Language: Jupyter Notebook - Size: 342 KB - Last synced at: 10 days ago - Pushed at: over 2 years ago - Stars: 50 - Forks: 6

tuanio/noisy-student-training-asr
PyTorch implementation of Noisy Student Training for Automatic Speech Recognition and Automatic Pronunciation Error Detection
Language: Python - Size: 3.07 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 44 - Forks: 7

pooya-mohammadi/audio-classification-pytorch
This project provides several approaches for training or fine-tuning an audio gender recognition model. The code can be used for any other audio classification task by changing the number of classes and the input dataset.
Language: Jupyter Notebook - Size: 871 KB - Last synced at: 12 days ago - Pushed at: 3 months ago - Stars: 41 - Forks: 4

khanld/Wav2vec2-Pretraining
Wav2vec 2.0 Self-Supervised Pretraining
Language: Python - Size: 303 KB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 37 - Forks: 4

ttop32/wav2vec2-live-japanese-translator
Real-time Japanese speech recognition translator using wav2vec2
Language: Jupyter Notebook - Size: 926 KB - Last synced at: 18 days ago - Pushed at: over 2 years ago - Stars: 37 - Forks: 3

lstrgar/self-supervised-phone-segmentation
Phoneme segmentation using pre-trained speech models
Language: Python - Size: 106 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 36 - Forks: 4

lucasgris/wav2vec4bp
Wav2vec resources and models for Brazilian Portuguese
Language: Jupyter Notebook - Size: 1.65 MB - Last synced at: 23 days ago - Pushed at: almost 3 years ago - Stars: 33 - Forks: 2

Hamtech-ai/wav2vec2-fa
Fine-tune Wav2vec2, an ASR model released by Facebook
Language: Jupyter Notebook - Size: 549 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 32 - Forks: 3

egorsmkv/asr-corpus-creator 📦
This app is intended to automatically create a corpus for ASR systems using pseudo-labeling.
Language: Python - Size: 2.47 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 27 - Forks: 3

mmakiuchi/multimodal_emotion_recognition
Scripts used in the research described in the paper "Multimodal Emotion Recognition with High-level Speech and Text Features", accepted at the ASRU 2021 conference.
Language: Python - Size: 77.1 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 25 - Forks: 6

mt-upc/SHAS
SHAS: Approaching optimal Segmentation for End-to-End Speech Translation
Language: Python - Size: 368 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 24 - Forks: 2

daanzu/wav2vec2_stt_python
Simple Python library, distributed via binary wheels with few direct dependencies, for easily using wav2vec 2.0 models for speech recognition
Language: Python - Size: 88.9 KB - Last synced at: 4 days ago - Pushed at: over 3 years ago - Stars: 24 - Forks: 3

AmirAbaskohi/Automatic-Speech-recognition-for-Speech-Assessment-of-Persian-Preschool-Children
Preschool evaluation is crucial because it gives teachers and parents influential knowledge about children's growth and development. The COVID-19 pandemic has highlighted the necessity of online assessment for preschool children. One of the areas that should be tested is their ability to speak. Employing an Automatic Speech Recognition (ASR) system would not help since they are pre-trained on voices that differ from children's in terms of frequency and amplitude. Because most of these are pre-trained with data in a specific range of amplitude, their objectives do not make them ready for voices in different amplitudes. To overcome this issue, we added a new objective to the masking objective of the Wav2Vec 2.0 model called Random Frequency Pitch (RFP). In addition, we used our newly introduced dataset to fine-tune our model for Meaningless Words (MW) and Rapid Automatic Naming (RAN) tests. Using masking in concatenation with RFP outperforms the masking objective of Wav2Vec 2.0 by reaching a Word Error Rate (WER) of 1.35. Our new approach reaches a WER of 6.45 on the Persian section of the CommonVoice dataset. Furthermore, our novel methodology produces positive outcomes in zero- and few-shot scenarios.
Language: Jupyter Notebook - Size: 1.23 MB - Last synced at: 5 months ago - Pushed at: almost 2 years ago - Stars: 21 - Forks: 1

ECNU-Cross-Innovation-Lab/ShiftSER
[ICASSP 2023] Mingling or Misalignment? Temporal Shift for Speech Emotion Recognition with Pre-trained Representations
Language: Python - Size: 53.7 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 19 - Forks: 2

kingabzpro/WOLOF-ASR-Wav2Vec2
Audio Preprocessing and finetuning of wav2vec2-large-xlsr model on AI4D Baamtu Datamation - Automatic Speech Recognition in WOLOF Data.
Language: Jupyter Notebook - Size: 3.34 MB - Last synced at: about 11 hours ago - Pushed at: over 3 years ago - Stars: 17 - Forks: 8

skit-ai/Map-Mix
The official implementation of the method discussed in the paper Improving Spoken Language Identification with Map-Mix (work accepted at ICASSP-2023)
Size: 18.1 MB - Last synced at: 30 days ago - Pushed at: about 2 years ago - Stars: 16 - Forks: 1

FernandoLpz/SpeechRecognition
This repository contains the implementation of an Automatic Speech Recognition system in python, using a client-server architecture with Web Sockets.
Language: Python - Size: 118 KB - Last synced at: 21 days ago - Pushed at: over 2 years ago - Stars: 14 - Forks: 2

techiaith/docker-huggingface-stt-cy
Adnabod lleferydd Cymraeg i'r Gymraeg gyda HuggingFace // Speech Recognition for Welsh with HuggingFace
Language: Python - Size: 321 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 13 - Forks: 4

notAI-tech/IndicASR
Speech Recognition for Indic languages.
Language: Python - Size: 623 KB - Last synced at: 19 days ago - Pushed at: about 4 years ago - Stars: 13 - Forks: 3

yamahigashi/Wav2Vec2FBX
Recognize speech from an audio file and convert it into animation FBX
Language: Python - Size: 199 KB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 12 - Forks: 3

aryanxxvii/lark
Speech Assessment API in FastAPI with HuggingFace 🤗
Language: JavaScript - Size: 183 KB - Last synced at: 17 days ago - Pushed at: 3 months ago - Stars: 11 - Forks: 0

jmaczan/asr-dysarthria
Research on Automatic Speech Recognition for dysarthric speech
Language: Jupyter Notebook - Size: 2.64 MB - Last synced at: 12 days ago - Pushed at: 7 months ago - Stars: 11 - Forks: 2

wngh1187/IPET
Pytorch implementation of INTEGRATED PARAMETER-EFFICIENT TUNING FOR GENERAL-PURPOSE AUDIO MODELS
Language: Python - Size: 4.28 MB - Last synced at: 11 months ago - Pushed at: over 1 year ago - Stars: 10 - Forks: 0

parvatijay2901/Hindi-ASR-and-TTS
EC499: Major Project
Language: Shell - Size: 68.4 KB - Last synced at: 20 days ago - Pushed at: almost 2 years ago - Stars: 9 - Forks: 2

Sreyan88/Toxicity-Detection-in-Spoken-Utterances
This repository contains the code for the paper: "DeToxy: A Large-Scale Multimodal Dataset for Toxicity Classification in Spoken Utterances"
Language: Jupyter Notebook - Size: 976 KB - Last synced at: 11 months ago - Pushed at: over 2 years ago - Stars: 9 - Forks: 5

mikezzb/lyrics-sync
A deep learning lyrics-to-audio alignment system, generating synchronized lyrics from a song and its lyrics
Language: Jupyter Notebook - Size: 18.4 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 8 - Forks: 1

kardSIM/audio2img
Extend the Conditioning of Stable Diffusion to take Audio Embeddings Instead of Text Embeddings using Wav2Vec2-BERT model
Language: Jupyter Notebook - Size: 29.8 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 7 - Forks: 1

ECNU-Cross-Innovation-Lab/ENT
[ICASSP 2024] Emotion Neural Transducer for Fine-Grained Speech Emotion Recognition
Language: Python - Size: 638 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 7 - Forks: 0

louisbrulenaudet/balena
BALanced Execution through Natural Activation : a human-computer interaction methodology for code running.
Language: Python - Size: 229 KB - Last synced at: 11 days ago - Pushed at: about 1 year ago - Stars: 7 - Forks: 1

Msparihar/Transcriber
Developed an AI tool to automatically generate captions and transcripts for YouTube videos in 67 languages and can generate summarized texts in 133 languages.
Language: Python - Size: 14.6 KB - Last synced at: 21 days ago - Pushed at: over 1 year ago - Stars: 7 - Forks: 1

RubensZimbres/Repo-2022
Python codes on PyTorch, Tensorflow, Keras, Wav2Vec2 Fine-Tuning and Google Cloud
Language: Jupyter Notebook - Size: 74.7 MB - Last synced at: 13 days ago - Pushed at: over 1 year ago - Stars: 7 - Forks: 4

scottykwok/cantonese-selfish-project
Cantonese Selfish Project 廣東話自肥企劃 at PYCON HK 2021
Language: Jupyter Notebook - Size: 8.44 MB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 7 - Forks: 1

hammaad2002/ASRAdversarialAttacks
An ASR (Automatic Speech Recognition) adversarial attack repository.
Language: Jupyter Notebook - Size: 10 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 6 - Forks: 1

JuJu2181/Automatic-Nepali-Speech-Recognition-and-Summarizer
A system capable of converting Nepali speech to text and generating a summary of the text
Language: Jupyter Notebook - Size: 288 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 6 - Forks: 2

pradeepbatchu/speechtotext
Speech to Text with Wav2Vec2 using torchaudio
Language: Python - Size: 533 KB - Last synced at: 12 months ago - Pushed at: over 3 years ago - Stars: 6 - Forks: 1

seanghay/kfa
A fast Khmer Forced Aligner powered by Wav2Vec2CTC and Phonetisaurus
Language: Python - Size: 10.1 MB - Last synced at: 17 days ago - Pushed at: 12 months ago - Stars: 5 - Forks: 0

jvel07/wav2vec2_patho
Fine-tuning wav2vec2 for Pathological Speech Processing
Language: Jupyter Notebook - Size: 4.05 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 5 - Forks: 1

navalnica/wav2vec2-belarusian
Speech to Text model for Belarusian language
Language: Jupyter Notebook - Size: 1.37 MB - Last synced at: 18 days ago - Pushed at: about 3 years ago - Stars: 5 - Forks: 0

somosnlp/wav2vec2-spanish
Pre-train a Spanish Wav2Vec2 model using the Spanish portion of the Common Voice dataset.
Language: Python - Size: 17.6 KB - Last synced at: 12 months ago - Pushed at: almost 4 years ago - Stars: 5 - Forks: 1

nhut-ngnn/Multimodal-Speech-Emotion-Recognition
A multimodal SER project combining BERT and ECAPA-TDNN with cross-attention-based fusion on the IEMOCAP dataset.
Language: Python - Size: 7.07 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 4 - Forks: 0

Sreyan88/Indic-ASR
Repository for pre-trained wav2vec 2.0 models on 7 Indian languages
Language: Python - Size: 18.6 KB - Last synced at: 11 months ago - Pushed at: almost 4 years ago - Stars: 4 - Forks: 0

slinusc/speaker_identification_evaluation
Evaluating the Effectiveness of Transformer Layers in Wav2Vec 2.0, XLS-R, and Whisper for Speaker Identification Tasks
Language: Jupyter Notebook - Size: 8.56 MB - Last synced at: 4 days ago - Pushed at: 3 months ago - Stars: 3 - Forks: 1

gulabpatel/Speech-to-Text
Language: Jupyter Notebook - Size: 1.64 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 3 - Forks: 0

Sarasadeghii/Sharif-Wav2vec2
This repo shows how to finetune the wav2vec2.0 model along with its prerequisites.
Language: Jupyter Notebook - Size: 297 KB - Last synced at: 11 months ago - Pushed at: 12 months ago - Stars: 3 - Forks: 0

dsalnikov/wav2vec
Pure NumPy implementation of wav2vec 2.0
Language: Python - Size: 4.88 KB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 3 - Forks: 0

TerboucheHacene/speech-keyword-spotting
Speech Keyword detection using Wav2Vec Model
Language: Python - Size: 314 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 0

ranchlai/wav2vec-2.0
Wav2vec2 English speech recognition in PaddlePaddle
Language: Python - Size: 316 KB - Last synced at: 12 months ago - Pushed at: almost 4 years ago - Stars: 3 - Forks: 1

zlab-foss/Wav2Vec2-XLSR-Finetune
Ready-to-use notebook to fine-tune wav2vec2 on Persian
Language: Jupyter Notebook - Size: 9.27 MB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 3 - Forks: 2

nisheethjaiswal/Speech-to-Text
Speech to text implementation using transformers in PyTorch.
Language: Jupyter Notebook - Size: 333 KB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 3 - Forks: 0

baocin/hugging_face_example_STT_api
Demonstration of Hugging Face's (https://huggingface.co/) newly released Wav2Vec2 model for easy, reasonably coherent, Speech to Text!
Language: Python - Size: 41.8 MB - Last synced at: over 1 year ago - Pushed at: about 4 years ago - Stars: 3 - Forks: 1

dangrebenkin/wav2vec2_speech_markuper
Automatic generation of speech dataset markup using Wav2Vec2 ASR models
Language: Python - Size: 396 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 2 - Forks: 0

trinhtuanvubk/finetune-wav2vec2
Language: Python - Size: 5.15 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 0

SakshiRathi77/hindiSpeechPro-Automatic-Speech-Recognization
This project, the final project for the KaggleX BIPOC Mentorship Program, aims to train two separate Hindi ASR models using Facebook Wav2Vec2 (300M parameters) and OpenAI Whisper-Small, respectively. The goal is to compare their performance, with a target WER of less than 13%, across various Hindi accents and dialects.
Language: Jupyter Notebook - Size: 2.11 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 1

oswaldoludwig/Pruning-pre-trained-models-using-evolutionary-computation
This repository contains scripts to prune Wav2vec2 using a neuroevolution-based method. More details about this method can be found in the paper Compressing Wav2vec2 for Embedded Applications.
Language: Shell - Size: 4.53 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

imsanjoykb/Speech-NLP-Bootcamp
Speech NLP Bootcamp
Language: Jupyter Notebook - Size: 3.1 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 1

manthanthakker/AudioClassification
This repository contains code/papers/research on Speech or Audio Classification
Language: Jupyter Notebook - Size: 143 KB - Last synced at: almost 2 years ago - Pushed at: about 2 years ago - Stars: 2 - Forks: 0

ZahraRahimii/AirTrafficControl-AutomaticSpeechRecognition-Project
ATC ASR; internship at Asr Gooyesh Company
Language: Jupyter Notebook - Size: 4.2 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

lectly/wav2vec2-large-xlsr-53-egyptian-arabic
Fine-tuning XLS-R for Multi-Lingual ASR with 🤗 Transformers
Language: Jupyter Notebook - Size: 1.89 MB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 2 - Forks: 0

ahammedrohit/Speech-Recognition-using-wav2vec2-with-minimum-GPU
Python Colab for speech recognition with wav2vec2. Since wav2vec2 requires heavy GPU I've come up with a way to run this on Google Colab as well as local machines with minimum GPU.
Language: Jupyter Notebook - Size: 22.5 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 0

phanxuanphucnd/wav2asr
A library version of wav2vec 2.0 framework for Automatic Speech Recognition task.
Language: Python - Size: 8.5 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 4

kipmccharen/sys6016_DL_project
pretrained SpeechBrain wav2vec seq2seq+CTC model trained on TIMIT dataset. Created by Kip McCharen, Siddharth Surapaneni, and Pavan Bondalapati
Language: Python - Size: 1.21 MB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 2 - Forks: 1

egorsmkv/speech-to-text-using-php
Using PHP for a speech-to-text task. Just research.
Language: PHP - Size: 289 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 1 - Forks: 0

thiagogre/mimicking
English Pronunciation Improvement App
Language: Python - Size: 852 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

aitor-alvarez/acoustic-transformer-models
Acoustic Transformer Models for Audio Classification
Language: Python - Size: 51.8 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

aitor-alvarez/large-speech-models
Fine-tuning Multilingual Large Speech Recognition Models: Wav2vec and Whisper
Language: Python - Size: 84 KB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

piedeboer96/Digital-Assistant-Audio-Processing
Project 2.2 - Speech Recognition and Speaker Identification
Language: Java - Size: 13 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 1 - Forks: 0

JingleCate/SpeechEmotionRecog
A simple Speech Emotion Recognition (SER) project based on Wav2Vec2.
Language: Python - Size: 235 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 1 - Forks: 0

zhu00121/Universal-representation-dynamics-of-deepfake-speech
This repo contains code used in the paper "Characterizing the temporal dynamics of universal speech representations for generalizable deepfake detection"
Language: Python - Size: 339 KB - Last synced at: 12 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

SanchezCris/SDR-Automatic-Speech-Recognition
FM signal capturing system and voice recognition for the assistance of individuals with hearing impairments.
Language: Python - Size: 48 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

sotiriskar/audio-note
Python application for taking audio notes and creating summaries of meetings.
Language: Python - Size: 887 KB - Last synced at: 4 days ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

kalindasiaminwe/ChitongaASR
A natural language processing and machine learning project for a low-resource language in Zambia.
Language: Jupyter Notebook - Size: 548 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

thisisHJLee/Fine-Tuning-of-XLSR-Wav2Vec2-on-Korean
Language: Jupyter Notebook - Size: 1.35 MB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

keshavbhandari/Audioneme
AI model for speech disorder detection
Language: Python - Size: 56.1 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

mead-ml/audio8
Deep audio modeling
Language: Python - Size: 307 KB - Last synced at: about 1 month ago - Pushed at: about 3 years ago - Stars: 1 - Forks: 0

lakshiitakalyanasundaram/DeepSonic
DeepFake audio detection project using Wav2Vec2 for MOMENTA (task for internship)
Language: HTML - Size: 64.3 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 0 - Forks: 0

mahshid1378/ASR-Wav2vec-Finetune
⚡ Finetune Wav2vec 2.0 For Speech Recognition
Language: Python - Size: 5.01 MB - Last synced at: 15 days ago - Pushed at: 26 days ago - Stars: 0 - Forks: 0

jp1924/ASR
🤗 Code for training ASR
Language: Python - Size: 413 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

moxeeem/ASR-pronunciation-correction
This project presents a system for automatic correction of English pronunciation using the wav2vec2 neural network.
Language: Jupyter Notebook - Size: 9.17 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 1

moncefbenaicha/spoken-ner
Spoken NER implementation based on Wav2Vec2-XLS-R with experiments on transfer learning
Language: Python - Size: 113 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

egorsmkv/w2v2-bert-aligner
Aligner for wav2vec2-bert models
Language: Python - Size: 1.95 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

Nightey3s/Speech-Emotion-Recognition-using-Wav2Vec2
A Speech Emotion Recognition (SER) system using Facebook's Wav2Vec2 model that classifies speech into four emotions (Neutral, Happy, Sad, Angry). Achieves 69.02% accuracy on IEMOCAP dataset using modern transformer architecture and comprehensive data augmentation techniques.
Language: Jupyter Notebook - Size: 1.06 MB - Last synced at: 16 days ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

Not-ML/audio-ml
Standalone Audio ML Application: An innovative Python-based tool integrating Speech Recognition (ASR), Sentiment Analysis (NLP), and Text-to-Speech (TTS) to process audio, analyze sentiment, and generate spoken responses. Features both command-line and GUI interfaces for seamless interaction.
Language: Python - Size: 20.5 KB - Last synced at: 28 days ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

sugarcane-mk/finetuning_wav2vec2
This repo provides a step-by-step process, from scratch, for fine-tuning Facebook's wav2vec2-large model using transformers
Language: Jupyter Notebook - Size: 42 KB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

egorsmkv/wav2vec2-hidet
A test to run w2v2 with hidet optimizer
Language: Python - Size: 402 KB - Last synced at: 27 days ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

akash13s/audio-to-image Fork of rishavroy97/audio-to-image
Pipeline for generating images conditioned on input audio
Language: Python - Size: 3.11 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

kamalesh003/NoiseCancellationTranscriptionModel
Noise Cancellation Transcription Model Using Wav2Vec2
Language: Jupyter Notebook - Size: 240 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

tracyreuter/NLP-speech-to-text
Convert speech to text using HuggingFace, comparing Wav2Vec2 versus OpenAI Whisper
Language: Jupyter Notebook - Size: 2.35 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

sebinbenjamin/wav2vec_demo
A Python tool for transcribing speech from audio files using the Wav2Vec 2.0 model. Supports multilingual transcription, automatic audio chunking, and easy setup
Language: Python - Size: 4.88 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

moncefbenaicha/SpokenNER
Spoken NER implementation based on Wav2Vec2-XLS-R with experiments on transfer learning
Language: Python - Size: 627 KB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0
