mfcc | Topic | Ecosyste.ms: Repos

Topic: "mfcc"

ddbourgin/numpy-ml

Machine learning, in numpy

Language: Python - Size: 10 MB - Last synced at: 17 days ago - Pushed at: over 1 year ago - Stars: 16,067 - Forks: 3,801

aubio/aubio

a library for audio and music analysis

Language: C - Size: 11.3 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 3,432 - Forks: 391

libAudioFlux/audioFlux

A library for audio and music analysis, feature extraction.

Language: C - Size: 7.11 MB - Last synced at: 6 months ago - Pushed at: 12 months ago - Stars: 2,796 - Forks: 118

x4nth055/emotion-recognition-using-speech

Building and training a speech emotion recognizer that predicts human emotions using Python, scikit-learn, and Keras

Language: Python - Size: 944 MB - Last synced at: 29 days ago - Pushed at: over 1 year ago - Stars: 621 - Forks: 242

ar1st0crat/NWaves

.NET DSP library with a lot of audio processing functions

Language: C# - Size: 7.28 MB - Last synced at: 3 days ago - Pushed at: over 2 years ago - Stars: 487 - Forks: 77

SuperKogito/spafe

spafe: Simplified Python Audio Features Extraction

Language: Python - Size: 20.7 MB - Last synced at: 8 days ago - Pushed at: about 2 months ago - Stars: 471 - Forks: 79

adamstark/Gist

A C++ Library for Audio Analysis

Language: C++ - Size: 938 KB - Last synced at: about 2 months ago - Pushed at: over 3 years ago - Stars: 378 - Forks: 76

sp-nitech/SPTK

A suite of speech signal processing tools

Language: C++ - Size: 5.57 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 232 - Forks: 27

jsingh811/pyAudioProcessing

Audio feature extraction and classification

Language: Python - Size: 22.9 MB - Last synced at: 2 days ago - Pushed at: almost 2 years ago - Stars: 225 - Forks: 39

gionanide/Speech_Signal_Processing_and_Classification

Front-end speech processing aims at extracting proper features from short-term segments of a speech utterance, known as frames. It is a prerequisite step toward any pattern recognition problem employing speech or audio (e.g., music). Here, we are interested in voice disorder classification, that is, in developing two-class classifiers that can discriminate between utterances of a subject suffering from, say, vocal fold paralysis and utterances of a healthy subject. The mathematical modeling of the speech production system in humans suggests that an all-pole system function is justified [1-3]. As a consequence, linear prediction coefficients (LPCs) constitute a first choice for modeling the magnitude of the short-term spectrum of speech. LPC-derived cepstral coefficients are guaranteed to discriminate between the system (e.g., vocal tract) contribution and that of the excitation. Taking into account the characteristics of the human ear, the mel-frequency cepstral coefficients (MFCCs) emerged as descriptive features of the speech spectral envelope. Similarly to MFCCs, the perceptual linear prediction coefficients (PLPs) could also be derived. The aforementioned, so to speak, traditional features will be tested against agnostic features extracted by convolutional neural networks (CNNs) (e.g., auto-encoders) [4]. The pattern recognition step will be based on Gaussian mixture model based classifiers, K-nearest neighbor classifiers, Bayes classifiers, as well as deep neural networks. The Massachusetts Eye and Ear Infirmary Dataset (MEEI-Dataset) [5] will be exploited. At the application level, a library for feature extraction and classification in Python will be developed. Credible publicly available resources, such as KALDI, will be used toward achieving our goal. Comparisons will be made against [6-8].

Language: Python - Size: 827 KB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 220 - Forks: 62

SuperKogito/Voice-based-gender-recognition

Voice-based gender recognition using Mel-frequency cepstrum coefficients (MFCC) and Gaussian mixture models (GMM)

Language: Python - Size: 8.96 MB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 213 - Forks: 68

csukuangfj/kaldifeat

Kaldi-compatible online & offline feature extraction with PyTorch, supporting CUDA, batch processing, chunk processing, and autograd. Provides C++ & Python APIs.

Language: C++ - Size: 10.3 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 197 - Forks: 38

sp-nitech/diffsptk

A differentiable version of SPTK

Language: Python - Size: 1.65 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 182 - Forks: 16

SuyashMore/MevonAI-Speech-Emotion-Recognition

Identify the emotion of multiple speakers in an Audio Segment

Language: C - Size: 63.6 MB - Last synced at: 12 days ago - Pushed at: about 2 years ago - Stars: 169 - Forks: 48

tympanix/subsync

Synchronize your subtitles using machine learning

Language: Python - Size: 468 KB - Last synced at: 1 day ago - Pushed at: over 1 year ago - Stars: 153 - Forks: 16

ewan-xu/LibrosaCpp

LibrosaCpp is a C++ implementation of librosa to compute short-time Fourier transform coefficients, mel spectrograms, or MFCCs

Language: C++ - Size: 2.55 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 139 - Forks: 39

amanbasu/speech-emotion-recognition

Detecting emotions using MFCC features of human speech using Deep Learning

Language: Jupyter Notebook - Size: 2.53 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 118 - Forks: 40

ZhuoZhuoCrayon/AcousticKeyBoard-Web

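Acoustic keyboard | A whimsical idea: build a "toy" that can tell which key was pressed from the sound of the keystroke, while learning signal processing / deep learning / Android / Django.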

Language: Python - Size: 68.3 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 86 - Forks: 5

GauravWaghmare/Speaker-Identification

A program for automatic speaker identification using deep learning techniques.

Language: Python - Size: 436 KB - Last synced at: over 1 year ago - Pushed at: about 8 years ago - Stars: 83 - Forks: 29

MycroftAI/sonopy

A simple audio feature extraction library

Language: Python - Size: 8.79 KB - Last synced at: 21 days ago - Pushed at: almost 6 years ago - Stars: 79 - Forks: 21

mathquis/node-personal-wakeword

Personal wake word detector

Language: JavaScript - Size: 103 KB - Last synced at: 5 days ago - Pushed at: almost 2 years ago - Stars: 63 - Forks: 8

ZitengWang/python_kaldi_features

Python code to extract MFCC and FBANK speech features for Kaldi

Language: Python - Size: 79.1 KB - Last synced at: almost 2 years ago - Pushed at: over 6 years ago - Stars: 60 - Forks: 16

georgid/AlignmentDuration

Lyrics-to-audio alignment system based on machine learning algorithms: hidden Markov models with Viterbi forced alignment. The alignment is explicitly aware of the durations of musical notes. The phonetic models are classified with an MLP deep neural network.

Language: Python - Size: 342 MB - Last synced at: 12 months ago - Pushed at: about 5 years ago - Stars: 55 - Forks: 6

SuperKogito/Voice-based-speaker-identification

Speaker identification using voice MFCCs and GMM

Language: Python - Size: 105 KB - Last synced at: about 1 month ago - Pushed at: over 4 years ago - Stars: 54 - Forks: 15

stefantaubert/mel-cepstral-distance

A Python library for computing the Mel-Cepstral Distance (Mel-Cepstral Distortion, MCD) between two inputs. This implementation is based on the method proposed by Robert F. Kubichek in "Mel-Cepstral Distance Measure for Objective Speech Quality Assessment".

Language: Python - Size: 59.8 MB - Last synced at: 3 days ago - Pushed at: 10 days ago - Stars: 53 - Forks: 10
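
For readers unfamiliar with the measure named in this entry, the following is a minimal sketch of the commonly cited mel-cepstral distortion formula, MCD = (10 / ln 10) * sqrt(2 * sum_d (c_d - c'_d)^2), averaged over frames. It is not the code of the listed library. The function name `mel_cepstral_distance` is hypothetical, the inputs are assumed to be already time-aligned arrays of shape (frames, coefficients), and skipping the 0th coefficient is a convention assumed here.

```python
import numpy as np


def mel_cepstral_distance(mfcc_a, mfcc_b):
    """Mean per-frame mel-cepstral distortion (in dB) between two aligned sequences.

    Both inputs are assumed to have shape (frames, coefficients) and to be
    time-aligned already; no alignment (e.g. DTW) is performed here.
    """
    a = np.asarray(mfcc_a, dtype=float)
    b = np.asarray(mfcc_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("inputs must share the same (frames, coefficients) shape")

    # Assumed convention: drop the 0th coefficient, which mostly carries energy.
    diff = a[:, 1:] - b[:, 1:]

    # (10 / ln 10) * sqrt(2 * sum of squared differences), one value per frame.
    per_frame = (10.0 / np.log(10.0)) * np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    return float(np.mean(per_frame))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = rng.normal(size=(50, 13))
    noisy = reference + 0.1 * rng.normal(size=(50, 13))
    print(mel_cepstral_distance(reference, reference))  # identical inputs give 0.0
    print(mel_cepstral_distance(reference, noisy))
```

In practice the two inputs rarely have the same number of frames, so an alignment step would normally come before this computation.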

aubio/vamp-aubio-plugins

aubio plugins for Vamp

Language: C++ - Size: 440 KB - Last synced at: 27 days ago - Pushed at: over 7 years ago - Stars: 48 - Forks: 12

zafarrafii/Zaf-Python

Zafar's Audio Functions in Python for audio signal analysis: STFT, inverse STFT, mel filterbank, mel spectrogram, MFCC, CQT kernel, CQT spectrogram, CQT chromagram, DCT, DST, MDCT, inverse MDCT.

Language: Jupyter Notebook - Size: 116 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 47 - Forks: 11

supikiti/PNCC

An implementation of Power Normalized Cepstral Coefficients: PNCC

Language: Python - Size: 25.4 KB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 47 - Forks: 10

zafarrafii/Zaf-Matlab

Zafar's Audio Functions in Matlab for audio signal analysis: STFT, inverse STFT, mel filterbank, mel spectrogram, MFCC, CQT kernel, CQT spectrogram, CQT chromagram, DCT, DST, MDCT, inverse MDCT.

Language: Jupyter Notebook - Size: 86 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 43 - Forks: 14

mechanicalsea/spectra

Spectra extraction tutorials based on torch and torchaudio.

Language: Jupyter Notebook - Size: 3.31 MB - Last synced at: 6 months ago - Pushed at: almost 2 years ago - Stars: 40 - Forks: 4

sheelabhadra/Emergency-Vehicle-Detection

Python implementation of papers on emergency vehicle detection using audio signals

Language: Jupyter Notebook - Size: 7.78 MB - Last synced at: over 1 year ago - Pushed at: almost 6 years ago - Stars: 39 - Forks: 13

pulakk/Live-Audio-MFCC

Live Audio MFCC Visualization in the browser using Web Audio API - https://pulakk.github.io/Live-Audio-MFCC/tutorial

Language: JavaScript - Size: 928 KB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 34 - Forks: 6

FragJage/SpeakerVoiceIdentifier

SpeakerVoiceIdentifier can recognize the voice of a speaker by learning.

Language: C++ - Size: 20.3 MB - Last synced at: about 2 months ago - Pushed at: about 8 years ago - Stars: 33 - Forks: 14

skaws2003/pytorch-mfcc

A pytorch implementation of MFCC.

Language: Python - Size: 65.4 KB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 31 - Forks: 1

k-farruh/speech-accent-detection

Every person speaks a language with an accent, and a particular accent necessarily reflects a person's linguistic background. The model identifies the accent from an audio recording. Its results could be used to determine accents and, through training, help English-language learners reduce or improve their accents.

Language: Python - Size: 21.7 MB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 31 - Forks: 8

alicex2020/Mandarin-Tone-Classification

Deep learning using CNN for Mandarin Chinese tone classification

Language: Jupyter Notebook - Size: 489 KB - Last synced at: 9 months ago - Pushed at: about 6 years ago - Stars: 31 - Forks: 7

alicex2020/Deep-Learning-Lie-Detection

Use machine learning models to detect lies based solely on acoustic speech information

Language: Jupyter Notebook - Size: 837 KB - Last synced at: about 2 years ago - Pushed at: almost 6 years ago - Stars: 30 - Forks: 10

dydtjr1128/Speaker-Recognition-using-NN

Speaker Recognition using Neural Network & Linear Regression

Language: Jupyter Notebook - Size: 46.8 MB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 29 - Forks: 7

zhengyima/DTW_Digital_Voice_Recognition

Speech recognition of the digits 0-9 based on DTW and MFCC features. DTW, MFCC, speech recognition, Chinese and English data, endpoint detection, Digital Voice Recognition.

Language: Python - Size: 6.26 MB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 28 - Forks: 4

GuitarsAI/BasicsMusicalInstrumClassifi

Basics of Musical Instruments Classification using Machine Learning

Language: Jupyter Notebook - Size: 13.8 MB - Last synced at: about 1 year ago - Pushed at: about 4 years ago - Stars: 28 - Forks: 12

linksense/ConvolutionaNeuralNetworksToEnhanceCodedSpeech

In this work we propose two postprocessing approaches applying convolutional neural networks (CNNs) either in the time domain or the cepstral domain to enhance the coded speech without any modification of the codecs. The time domain approach follows an end-to-end fashion, while the cepstral domain approach uses analysis-synthesis with cepstral domain features. The proposed postprocessors in both domains are evaluated for various narrowband and wideband speech codecs in a wide range of conditions. The proposed postprocessor improves speech quality (PESQ) by up to 0.25 MOS-LQO points for G.711, 0.30 points for G.726, 0.82 points for G.722, and 0.26 points for the adaptive multirate wideband codec (AMR-WB). In a subjective CCR listening test, the proposed postprocessor on G.711-coded speech exceeds the speech quality of an ITU-T-standardized postfilter by 0.36 CMOS points, and obtains a clear preference of 1.77 CMOS points compared to G.711, even on par with uncoded speech.

Language: Python - Size: 597 KB - Last synced at: over 1 year ago - Pushed at: about 5 years ago - Stars: 27 - Forks: 11

FragIt/fragit-main

FragIt main repository

Language: Python - Size: 529 KB - Last synced at: 27 days ago - Pushed at: 29 days ago - Stars: 26 - Forks: 12

nipunmanral/Spoken-Language-Identification

Implement a GRU/LSTM model using Keras, and train it to classify the languages using MFCC features

Language: Python - Size: 6.84 KB - Last synced at: almost 2 years ago - Pushed at: almost 3 years ago - Stars: 25 - Forks: 16

DataXujing/ASR-paper

ASR tutorial: https://dataxujing.github.io/ASR-paper/

Size: 1.07 GB - Last synced at: about 1 month ago - Pushed at: 10 months ago - Stars: 24 - Forks: 6

IhabBendidi/Voice-authentification-API

A RESTful API implementation of an authentication system using voice fingerprints

Language: Python - Size: 5.97 MB - Last synced at: about 1 month ago - Pushed at: about 5 years ago - Stars: 24 - Forks: 2

Abhay0899193/Speaker-Recognition

Speaker Recognition System using MFCC and GMM.

Language: Python - Size: 7.98 MB - Last synced at: about 2 years ago - Pushed at: about 7 years ago - Stars: 24 - Forks: 12

zafarrafii/CQHC-Python

Constant-Q harmonic coefficients (CQHCs), a timbre feature designed for music signals.

Language: Jupyter Notebook - Size: 84.5 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 23 - Forks: 1

ringabout/scim

[WIP] Speech recognition toolbox written in Nim, based on Arraymancer.

Language: Nim - Size: 354 KB - Last synced at: 7 months ago - Pushed at: over 5 years ago - Stars: 23 - Forks: 0

geekysethi/audio_classification

Language: Python - Size: 1010 KB - Last synced at: over 1 year ago - Pushed at: about 5 years ago - Stars: 21 - Forks: 4

JavierAntoran/tiger-costume-voice-conversion

Voice Alignment and Conversion with Neural Networks and the WORLD codec.

Language: Jupyter Notebook - Size: 63.5 MB - Last synced at: about 1 month ago - Pushed at: about 6 years ago - Stars: 20 - Forks: 1

dhruvesh13/Audio-Genre-Classification

Automatic music genre classification using machine learning algorithms such as logistic regression and k-nearest neighbours

Language: Python - Size: 11.7 KB - Last synced at: about 1 month ago - Pushed at: over 7 years ago - Stars: 19 - Forks: 11

zhengyima/GMM_Digital_Voice_Recognition

Speech recognition of the digits 0-9 based on GMM and MFCC features. GMM, MFCC, speech recognition, Chinese data, sklearn, Digital Voice Recognition.

Language: Python - Size: 532 KB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 16 - Forks: 2

orbxball/timit-preprocessor

Extract MFCC vectors and phones from the TIMIT dataset

Language: Shell - Size: 6.84 KB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 15 - Forks: 0

LexicalStressDetection/lexical-stress-detection

Deep Learning model for lexical stress detection in spoken English

Language: Python - Size: 2.43 MB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 15 - Forks: 2

ShoYamanishi/AndroidMFCC

26-Point MFCC & 512-Point FFT Generator & Visualizer in Java, C++, and NEON intrinsics

Language: C++ - Size: 6.02 MB - Last synced at: 26 days ago - Pushed at: over 5 years ago - Stars: 15 - Forks: 2

kleinzcy/speech_signal_processing

Language: Python - Size: 15.5 MB - Last synced at: about 2 years ago - Pushed at: almost 6 years ago - Stars: 15 - Forks: 2

HassanHayat08/Interpretable-CNN-for-Big-Five-Personality-Traits-using-Audio-Data

We developed an interpretable CNN for the Big Five personality traits using human speech data. This project discovers the different frequency patterns of a human voice with respect to each of the five personality traits. It will help us understand a person's apparent personality from their voice.

Language: Python - Size: 16.6 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 14 - Forks: 3

zhengyima/HMM_Digital_Voice_Recognition

Speech recognition of the digits 0-9 based on HMM and MFCC features. HMM, GMMHMM, MFCC, speech recognition, sklearn, Digital Voice Recognition.

Language: Python - Size: 764 KB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 13 - Forks: 4

amitchone/ASR

A Python 2.7 implementation of Mel Frequency Cepstral Coefficients (MFCC) and Dynamic Time Warping (DTW) algorithms for Automated Speech Recognition (ASR).

Language: Python - Size: 13.6 MB - Last synced at: over 1 year ago - Pushed at: about 7 years ago - Stars: 13 - Forks: 4

baggepinnen/LPVSpectral.jl

Least-squares (sparse) spectral estimation and (sparse) LPV spectral decomposition.

Language: Julia - Size: 424 KB - Last synced at: about 2 months ago - Pushed at: 5 months ago - Stars: 12 - Forks: 6

anicolson/matlab_feat

Functions for creating speech features in MATLAB.

Language: MATLAB - Size: 39.1 KB - Last synced at: over 1 year ago - Pushed at: almost 5 years ago - Stars: 12 - Forks: 5

yashbhalgat/Emotion-from-speech-MFCC

MATLAB code for extraction of Mel Frequency Cepstral Coefficients

Language: Matlab - Size: 279 KB - Last synced at: 5 months ago - Pushed at: about 9 years ago - Stars: 12 - Forks: 8

lucko515/Speech-commands-recognition

Recognizing common speech commands using Keras and Tensorflow.

Language: Python - Size: 11 MB - Last synced at: 26 days ago - Pushed at: over 6 years ago - Stars: 11 - Forks: 3

NeuroByte-Consulting/Speech-Emotion-Recognition-in-Tensorflow-Using-CNNs

Speech Emotion Recognition (SER) in Tensorflow using CNNs and CRNNs Based on Mel Spectrograms and Mel Frequency Cepstral Coefficients (MFCCs)

Language: Jupyter Notebook - Size: 15.6 MB - Last synced at: 21 days ago - Pushed at: 22 days ago - Stars: 9 - Forks: 0

pdadial/Speech_Emotion_Recognition_CNN-LSTM

CNN-LSTM based SER model using RAVDESS database

Language: Jupyter Notebook - Size: 202 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 9 - Forks: 2

sarthak268/Audio-Classification-using-MFCC-and-Spectrogram

Audio classification using a simple SVM classifier making use of MFCC and Spectrogram features coded from scratch

Language: Python - Size: 240 KB - Last synced at: about 1 year ago - Pushed at: about 5 years ago - Stars: 9 - Forks: 1

RBGTOP/Music-Genre-Recognition

Music genre classification using deep learning

Size: 1.95 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 8 - Forks: 0

acen20/cnn-tf-keras-audio-classification

Feature extraction from sound signals along with a complete CNN model and evaluations using TensorFlow, Keras, and librosa for MFCC generation

Language: Jupyter Notebook - Size: 5.34 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 8 - Forks: 4

grimmdaniel/personality-trait-prediction

Big Five personality trait prediction on First Impressions V2 dataset

Language: Jupyter Notebook - Size: 729 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 8 - Forks: 7

sahilsharma884/Music-Genre-Classification

Perform three types of feature extraction: STFT, MFCC and MelSpectrogram. Apply CNN/VGG with or without RNN architecture. Able to achieve 95% accuracy.

Language: Python - Size: 4.53 MB - Last synced at: almost 2 years ago - Pushed at: almost 5 years ago - Stars: 8 - Forks: 3

Ralireza/spoken-digit-recognition

Classifying spoken English digits with a hidden Markov model

Language: Python - Size: 6.92 MB - Last synced at: about 2 years ago - Pushed at: almost 6 years ago - Stars: 8 - Forks: 3

miselaytes-anton/whospeaks

Speaker recognition using Mel Frequency Cepstral Coefficients (MFCC) and Linde-Buzo-Gray (LBG) clustering algorithm

Language: JavaScript - Size: 28.1 MB - Last synced at: about 1 year ago - Pushed at: about 6 years ago - Stars: 8 - Forks: 3

lincolnhard/sound-feature-extraction-C

Useful feature extraction for next step classification

Language: C - Size: 9.77 KB - Last synced at: 4 months ago - Pushed at: over 7 years ago - Stars: 8 - Forks: 7

msaintfelix/TensorFlow_MusicGenre_Classifier

Classifying wav files with their MFCC

Language: Jupyter Notebook - Size: 608 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 7 - Forks: 3

FedericaPaoli1/stm32-speech-recognition-and-traduction

stm32-speech-recognition-and-traduction is a project developed for the Advances in Operating Systems exam at the University of Milan (academic year 2020-2021). It implements a speech recognition and speech-to-text translation system using a pre-trained machine learning model running on the stm32f407vg microcontroller.

Language: C - Size: 41.7 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 7 - Forks: 1

DenaJGibbon/MFCC-Vocal-Fingerprinting

Code to do MFCC feature extraction on gibbon calls and use LDA/SVM for classification

Language: R - Size: 9.28 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 7 - Forks: 0

harshitkgupta/Classification-of-Autism-Spectrum-Disorder

Machine Learning Approaches for Classification of Children with Autism Spectrum Disorder

Language: Jupyter Notebook - Size: 283 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 7 - Forks: 2

ragibson/MFCC-speech-recognition

Real-time speech recognition via "Mel-Frequency Cepstral Coefficients" neural networks.

Language: Jupyter Notebook - Size: 1.05 MB - Last synced at: 22 days ago - Pushed at: almost 6 years ago - Stars: 7 - Forks: 0

PranavPutsa1006/Speaker-Diarization

Identifying individual speakers in an audio stream based on the unique characteristics found in individual voices using Python

Language: Jupyter Notebook - Size: 20.2 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 6 - Forks: 1

Sangramsingkayte/Audio-Feature-Extraction

In sound processing, the mel-frequency cepstrum (MFC) is a representation of the short-term power spectrum of a sound, based on a linear cosine transform of a log power spectrum on a nonlinear mel scale of frequency. Mel-frequency cepstral coefficients (MFCCs) are coefficients that collectively make up an MFC.

Language: Jupyter Notebook - Size: 1.24 MB - Last synced at: over 1 year ago - Pushed at: about 3 years ago - Stars: 6 - Forks: 6
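
The definition in the entry above (a cosine transform of a log power spectrum on a mel scale) can be sketched in a few lines of NumPy. This is an illustrative sketch, not the code of any repository listed here. The function names, the 2595 * log10(1 + f / 700) mel formula, the Hamming window, the unnormalized DCT-II, and the frame and filter sizes are all assumptions; real libraries differ in these details.

```python
import numpy as np


def hz_to_mel(f):
    # One common mel-scale formula (assumed here; other variants exist).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale, shape (n_filters, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    fbank = np.zeros((n_filters, fft_freqs.size))
    for i in range(n_filters):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fbank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fbank


def mfcc(signal, sample_rate, frame_len=400, hop=160, n_fft=512, n_filters=26, n_coeffs=13):
    """Return an array of shape (frames, n_coeffs); the signal must be at least frame_len long."""
    signal = np.asarray(signal, dtype=float)

    # 1. Slice the signal into overlapping, windowed frames.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)

    # 2. Short-term power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft

    # 3. Mel-scale filterbank energies, then the logarithm.
    mel_energy = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_mel = np.log(mel_energy + 1e-10)  # small offset avoids log(0)

    # 4. Cosine transform (unnormalized DCT-II) across the filter axis.
    n = np.arange(n_filters)
    k = np.arange(n_coeffs)[:, None]
    dct_basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters))
    return log_mel @ dct_basis.T


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 440.0 * t)  # one second of a 440 Hz tone
    features = mfcc(tone, sr)
    print(features.shape)
```

The four numbered steps correspond to the parts of the definition: short-term framing, power spectrum, log on the mel scale, and the cosine transform.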