Ecosyste.ms: Repos
An open API service providing repository metadata for many open source software ecosystems.
GitHub / Morawetz / Speech-to-text-data_collection
Speech-to-text data collection with Kafka, Airflow, and Spark: a deployable pipeline that posts text and audio files to a data lake and retrieves them from it, applies transformations in a distributed manner, and loads the results into a warehouse in a format suitable for training a speech-to-text model.
JSON API: https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Morawetz%2FSpeech-to-text-data_collection
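The JSON API URL above percent-encodes the slash in the repository's full name (`Morawetz%2FSpeech-to-text-data_collection`). A minimal Python sketch of building that URL and, optionally, fetching it follows. The helper names `repository_url` and `fetch_repository` are hypothetical, and because the response fields are not listed on this page, the code makes no assumption about them beyond the body being JSON.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Root taken from the JSON API URL shown on this page.
API_ROOT = "https://repos.ecosyste.ms/api/v1"


def repository_url(host, full_name):
    """Build the repository endpoint URL (hypothetical helper).

    safe="" forces the "/" between owner and repository to be
    encoded as %2F, matching the URL shown on this page.
    """
    return (
        f"{API_ROOT}/hosts/{quote(host, safe='')}"
        f"/repositories/{quote(full_name, safe='')}"
    )


def fetch_repository(host, full_name, timeout=10):
    """Fetch and decode the JSON document (hypothetical helper).

    Requires network access; the structure of the returned
    object is not assumed here.
    """
    with urlopen(repository_url(host, full_name), timeout=timeout) as resp:
        return json.load(resp)


url = repository_url("GitHub", "Morawetz/Speech-to-text-data_collection")
print(url)
```

Calling `fetch_repository("GitHub", "Morawetz/Speech-to-text-data_collection")` would return the decoded JSON, provided the service is reachable.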
Stars: 2
Forks: 8
Open Issues: 8
License: None
Language: Python
Repo Size: 38.5 MB
Dependencies: 15
Created: over 2 years ago
Updated: almost 2 years ago
Last pushed: over 2 years ago
Last synced: about 1 year ago
Dependencies
- airflow *
- boto3 *
- jiwer *
- kafka-python *
- librosa ==0.8.1
- matplotlib ==3.4.2
- mlflow *
- numba ==0.53.1
- numpy ==1.19.5
- pandas ==1.3.1
- python_speech_features *
- scikit_learn ==0.24.2
- scipy ==1.6.2
- streamlit *
- tensorflow *