audio-generation | Topic | Ecosyste.ms: Repos

mudler/LocalAI

🤖 The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more model architectures. Features: Generate Text, Audio, Video, Images, Voice Cloning, Distributed, P2P inference

Language: Go - Size: 18.2 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 32,451 - Forks: 2,470

FunAudioLLM/CosyVoice

Multi-lingual large voice generation model, providing full-stack inference, training, and deployment capabilities.

Language: Python - Size: 1.5 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 13,629 - Forks: 1,381

open-mmlab/Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

Language: Python - Size: 126 MB - Last synced at: 18 days ago - Pushed at: 29 days ago - Stars: 8,975 - Forks: 702

multimodal-art-projection/YuE

YuE: Open Full-song Music Generation Foundation Model, something similar to Suno.ai but open

Language: Python - Size: 32.8 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 4,945 - Forks: 541

haoheliu/AudioLDM

AudioLDM: Generate speech, sound effects, music and beyond, with text.

Language: Python - Size: 2.34 MB - Last synced at: 16 days ago - Pushed at: 5 months ago - Stars: 2,627 - Forks: 234

haoheliu/AudioLDM2

Text-to-Audio/Music Generation

Language: Python - Size: 3.73 MB - Last synced at: 1 day ago - Pushed at: 7 months ago - Stars: 2,419 - Forks: 190

rsxdalv/tts-generation-webui

TTS Generation Web UI (Bark, MusicGen + AudioGen, Tortoise, RVC, Vocos, Demucs, SeamlessM4T, MAGNet, StyleTTS2, MMS, Stable Audio, Mars5, F5-TTS, ParlerTTS)

Language: TypeScript - Size: 32.6 MB - Last synced at: 2 days ago - Pushed at: 4 days ago - Stars: 2,153 - Forks: 229

archinetai/audio-diffusion-pytorch

Audio generation using diffusion models, in PyTorch.

Language: Python - Size: 245 KB - Last synced at: 28 days ago - Pushed at: almost 2 years ago - Stars: 2,035 - Forks: 173

archinetai/audio-ai-timeline

A timeline of the latest AI models for audio generation, starting in 2023!

Size: 69.3 KB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 1,897 - Forks: 71

lucidrains/soundstorm-pytorch

Implementation of SoundStorm, Efficient Parallel Audio Generation from Google DeepMind, in PyTorch

Language: Python - Size: 264 KB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 1,492 - Forks: 92

declare-lab/tango

A family of diffusion models for text-to-audio generation.

Language: Python - Size: 19.5 MB - Last synced at: 6 months ago - Pushed at: 10 months ago - Stars: 1,086 - Forks: 88

FunAudioLLM/InspireMusic

InspireMusic: A Unified Framework for Music, Song, Audio Generation.

Language: Python - Size: 3.64 MB - Last synced at: 3 days ago - Pushed at: 9 days ago - Stars: 1,084 - Forks: 100

NVIDIA/BigVGAN

Official PyTorch implementation of BigVGAN (ICLR 2023)

Language: Python - Size: 19.9 MB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 991 - Forks: 127

Yuan-ManX/ai-audio-datasets

AI Audio Datasets (AI-ADS) 🎵, including Speech, Music, and Sound Effects, which can provide training data for Generative AI, AIGC, AI model training, intelligent audio tool development, and audio applications.

Size: 1.28 MB - Last synced at: about 2 months ago - Pushed at: 2 months ago - Stars: 665 - Forks: 54

modelscope/FunCodec

FunCodec is a research-oriented toolkit for audio quantization and downstream applications, such as text-to-speech synthesis, music generation, etc.

Language: Python - Size: 1.46 MB - Last synced at: 26 days ago - Pushed at: over 1 year ago - Stars: 396 - Forks: 33

metame-ai/awesome-audio-plaza

Daily tracking of awesome audio papers, including music generation, zero-shot tts, asr, audio generation

Size: 358 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 382 - Forks: 17

v-iashin/SpecVQGAN

Source code for "Taming Visually Guided Sound Generation" (Oral at the BMVC 2021)

Language: Jupyter Notebook - Size: 163 MB - Last synced at: about 1 month ago - Pushed at: 10 months ago - Stars: 360 - Forks: 39

Yuan-ManX/audio-development-tools

This is a list of sound, audio and music development tools which contains machine learning, audio generation, audio signal processing, sound synthesis, spatial audio, music information retrieval, music generation, speech recognition, speech synthesis, singing voice synthesis and more.

Size: 2.18 MB - Last synced at: about 2 months ago - Pushed at: 8 months ago - Stars: 346 - Forks: 24

researchmm/MM-Diffusion

[CVPR'23] MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation

Language: Python - Size: 4.18 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 346 - Forks: 23

cabralpinto/modular-diffusion

Python library for designing and training your own Diffusion Models with PyTorch.

Language: Python - Size: 31.1 MB - Last synced at: 29 days ago - Pushed at: 10 months ago - Stars: 279 - Forks: 15

sony/bigvsan

Pytorch implementation of BigVSAN

Language: Python - Size: 14.1 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 203 - Forks: 18

happylittlecat2333/Auffusion

Official codes and models of the paper "Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation"

Language: Jupyter Notebook - Size: 23.9 MB - Last synced at: 4 days ago - Pushed at: about 1 year ago - Stars: 182 - Forks: 13

soham97/awesome-sound_event_detection

Reading list for research topics in Sound AI

Size: 145 KB - Last synced at: about 17 hours ago - Pushed at: 9 months ago - Stars: 180 - Forks: 8

galgreshler/Catch-A-Waveform

Official pytorch implementation of the paper: "Catch-A-Waveform: Learning to Generate Audio from a Single Short Example" (NeurIPS 2021)

Language: Python - Size: 255 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 174 - Forks: 32

leopiney/neuralnoise

The AI Podcast Studio: generate podcast scripts and their audio version with a team of AI workers in a Podcast Studio 🎙️📜

Language: Python - Size: 23.7 MB - Last synced at: 10 days ago - Pushed at: 2 months ago - Stars: 168 - Forks: 19

devnen/Dia-TTS-Server

Self-host the powerful Dia TTS model. This server offers a user-friendly Web UI, flexible API endpoints (incl. OpenAI compatible), support for SafeTensors/BF16, voice cloning, dialogue generation, and GPU/CPU execution.

Language: Python - Size: 31.2 MB - Last synced at: 5 days ago - Pushed at: 7 days ago - Stars: 147 - Forks: 27

archinetai/audio-data-pytorch

A collection of useful audio datasets and transforms for PyTorch.

Language: Python - Size: 47.9 KB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 138 - Forks: 22

archinetai/audio-diffusion-pytorch-trainer

Trainer for audio-diffusion-pytorch

Language: Python - Size: 264 KB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 129 - Forks: 22

ilaria-manco/word2wave

Word2Wave: a framework for generating short audio samples from a text prompt using WaveGAN and COALA.

Language: Python - Size: 1.15 MB - Last synced at: 6 months ago - Pushed at: over 3 years ago - Stars: 119 - Forks: 15

RoySheffer/im2wav

Official implementation of the pipeline presented in I hear your true colors: Image Guided Audio Generation

Language: Python - Size: 23.7 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 95 - Forks: 9

sony/soundctm

Pytorch implementation of SoundCTM

Language: Python - Size: 3.76 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 86 - Forks: 7

rsxdalv/bark-speaker-directory

Site for sharing Bark voices

Language: TypeScript - Size: 21.1 MB - Last synced at: 2 days ago - Pushed at: about 2 months ago - Stars: 51 - Forks: 0

olaviinha/NeuralTextToAudio

Text prompt steered synthetic audio generators

Language: Jupyter Notebook - Size: 337 KB - Last synced at: 26 days ago - Pushed at: 26 days ago - Stars: 46 - Forks: 7

JavisDiT/JavisDiT

Official implementation of "JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization"

Language: Python - Size: 53.4 MB - Last synced at: 26 days ago - Pushed at: 26 days ago - Stars: 43 - Forks: 2

rsxdalv/musicgen-prompts

Site for sharing MusicGen + AudioGen Prompts and Creations

Language: TypeScript - Size: 24.4 MB - Last synced at: 2 days ago - Pushed at: about 2 months ago - Stars: 42 - Forks: 5

PeiwenSun2000/Both-Ears-Wide-Open

The official repo for Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation

Language: Python - Size: 6.78 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 37 - Forks: 3

pollinations-ai/pollinations.ai

Work with the best generative AI from Pollinations using this Python SDK. 🐝

Language: Python - Size: 11.2 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 37 - Forks: 5

Yuanshi9815/LiteFocus

[Interspeech 2024] LiteFocus is a tool designed to accelerate diffusion-based text-to-audio (TTA) models, currently implemented with the base model AudioLDM2.

Language: Python - Size: 805 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 33 - Forks: 0

Bai-YT/ConsistencyTTA

ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation

Language: Python - Size: 5.45 MB - Last synced at: 5 months ago - Pushed at: 6 months ago - Stars: 32 - Forks: 0

soham97/sound_ai_progress

Tracking the state of the art and recent results (bibliography) on sound tasks.

Size: 51.8 KB - Last synced at: 10 months ago - Pushed at: over 2 years ago - Stars: 28 - Forks: 1

0417keito/JEN-1-COMPOSER-pytorch

Unofficial implementation of JEN-1 Composer: A Unified Framework for High-Fidelity Multi-Track Music Generation (https://arxiv.org/abs/2310.19180)

Language: Python - Size: 105 MB - Last synced at: 12 months ago - Pushed at: over 1 year ago - Stars: 24 - Forks: 2

Warma10032/easytts

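Building the simplest collection of TTS front ends and the simplest audiobook production workflow. Novels are split into sentences with regex rules, and the speaker of each line of dialogue is identified with RoBERTa, enabling one-click generation of multi-speaker audiobooks. Multi-speaker speech synthesis and high-quality audiobook production.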

Language: Python - Size: 25.3 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 20 - Forks: 5

JosefAlbers/Aggressor

Ultra-minimal autoregressive diffusion model for image generation

Language: Python - Size: 2.36 MB - Last synced at: 22 days ago - Pushed at: 7 months ago - Stars: 18 - Forks: 2

LJungang/Awesome-Omni-Large-Models-and-Datasets

🔥 Omni large models and datasets for understanding and generating multi-modalities.

Size: 53.7 KB - Last synced at: 4 days ago - Pushed at: 7 months ago - Stars: 15 - Forks: 0

chenjianyi/fastsag

FastSAG: Towards Fast Non-Autoregressive Singing Accompaniment Generation

Language: Python - Size: 580 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 13 - Forks: 4

neeleshpandey/AutomatedNewsChannel

This is a piece of code that fetches news using an API and converts it into a news video.

Language: Python - Size: 96.7 KB - Last synced at: almost 2 years ago - Pushed at: almost 4 years ago - Stars: 13 - Forks: 4

Ceaglex/LoVA

The code and weights for LoVA. LoVA is a novel model for Long-form Video-to-Audio generation. Based on the Diffusion Transformer (DiT) architecture, LoVA proves to be more effective at generating long-form audio compared to existing autoregressive models and UNet-based diffusion models.

Language: Python - Size: 3.19 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 12 - Forks: 2

JinjieNi/MixEval-X

The official GitHub repo for MixEval-X, the first any-to-any, real-world benchmark.

Language: Python - Size: 1.24 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 12 - Forks: 0

zassou65535/WaveGAN

Audio generator using WaveGAN

Language: Python - Size: 41 KB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 10 - Forks: 0

bean980310/stable-diffusion-docker-project

Stable Diffusion WebUI and KohyaSS, ComfyUI, InvokeAI, Fooocus, and more Generative AI on Docker

Language: Jupyter Notebook - Size: 574 KB - Last synced at: 19 days ago - Pushed at: 29 days ago - Stars: 9 - Forks: 2

merekat/children-stories

OhanashiGPT is an application that generates personalized children's stories based on parameters like age and preferences. It narrates these stories using an AI-generated voice that mimics a parent, trained on their audio samples. The app also creates illustrations to accompany each story, providing a unique and engaging experience for children.

Language: Jupyter Notebook - Size: 122 MB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 9 - Forks: 0

Consistency-TTA/consistency-tta.github.io

Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation

Language: HTML - Size: 151 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 7 - Forks: 0

gianpaj/sexyvoice

Voice cloning and Text to Speech platform. Perfect for content creators, developers, and storytellers.

Language: TypeScript - Size: 2.46 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 6 - Forks: 2

shuklabhay/stereo-sample-gan

StereoSampleGAN: A computationally inexpensive approach to high-fidelity stereo audio sample generation.

Language: Python - Size: 73.1 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 6 - Forks: 0

klae01/ddim-audio

Denoising Diffusion Implicit Models

Language: Python - Size: 143 KB - Last synced at: almost 2 years ago - Pushed at: almost 3 years ago - Stars: 6 - Forks: 0

LumenPallidium/audio_generation

Experiments in neural networks for audio generation.

Language: Python - Size: 682 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 5 - Forks: 0

devaaravmishra/genius-saas

Genius-SaaS: An AI-powered SaaS application built with Next.js and React for personalized recommendations, dynamic content generation, and user behavior prediction. 🚀

Language: TypeScript - Size: 789 KB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 5 - Forks: 2

Lomasterrrr/Audio-Generator

Generates a sound given: volume, frequency, duration!

Language: C# - Size: 3 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 5 - Forks: 1

inferless/bark

Text-to-Speech model that generates realistic, multilingual speech with music, background noise, and sound effects. <metadata> gpu: T4 | collections: ["HF Transformers"] </metadata>

Language: Python - Size: 40 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 4 - Forks: 11

Justmalhar/tts-studio

Text to Speech Studio to convert text into natural-sounding speech using advanced AI models from leading providers like Replicate, OpenAI, and ElevenLabs.

Language: TypeScript - Size: 1.31 MB - Last synced at: 18 days ago - Pushed at: 4 months ago - Stars: 4 - Forks: 2

leelabcnbc/predictive-coding-music-prediction

Code implementation for the paper "Relating Human Perception of Musicality to Prediction in a Predictive Coding Model"

Language: Python - Size: 39.1 MB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 4 - Forks: 0

SD-inst/cozyui

A frontend for ComfyUI to generate AI videos comfortably

Language: TypeScript - Size: 1.11 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 3 - Forks: 0

Agentfy-io/Agentfy_API

🤖 This is the API component of Agentify, a FastAPI-based service that provides access to specialized AI agents. Each agent is exposed through its own API endpoints, enabling modular and focused functionality.

Language: Python - Size: 50 MB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 3 - Forks: 0

Yazdi9/Text-To-Audio-ChatGPT

Text To Audio (Voice, Music) - Supports ChatGPT

Language: Python - Size: 3.54 MB - Last synced at: 9 months ago - Pushed at: about 2 years ago - Stars: 3 - Forks: 1

carlosholivan/AudioGenerationDiffusion

State-of-the-art of Audio Generation with Diffusion Models

Size: 179 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 1

carlosholivan/audiolm-google-torch

Implementation of the AudioLM model by Google in Pytorch

Size: 420 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 1

Sreyan88/Synthio

Code for ICLR 2025 Paper: Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data

Language: Python - Size: 2.29 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 0

SocAIty/socaity

SDK for generative AI.

Language: Python - Size: 24.2 MB - Last synced at: 26 days ago - Pushed at: about 2 months ago - Stars: 2 - Forks: 0

inferless/parler-tts-streaming

Text-to-speech with Server-Sent Events (SSE) streams real-time audio for chat-based applications. <metadata> gpu: A100 | collections: ["SSE Events"] </metadata>

Language: Python - Size: 9.77 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 2 - Forks: 7

inferless/pyannote-speaker-diarization-3.1

A state-of-the-art model that segments and labels audio recordings by accurately distinguishing different speakers. <metadata> gpu: T4 | collections: ["HF Transformers"] </metadata>

Language: Python - Size: 23.4 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 2 - Forks: 2

vaishanth-rmrj/open-podcraft

OpenPodcraft is an open-source project that enables users to create podcasts from their textual content. With OpenPodcraft, you can either clone your own voice or use voices from different individuals to generate professional-sounding podcasts.

Language: Python - Size: 59.3 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 2 - Forks: 0

radoslawregula/VoxG

Singing voice synthesizer using GANs

Language: Python - Size: 145 KB - Last synced at: about 2 months ago - Pushed at: about 2 years ago - Stars: 2 - Forks: 1

0x7o/DeepMozart

Audio generation using diffusion models

Language: Python - Size: 6.84 KB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 2

Gmzxdotzz/Dia-TTS-Server

Self-host the powerful Dia TTS model. This server offers a user-friendly Web UI, flexible API endpoints (incl. OpenAI compatible), support for SafeTensors/BF16, voice cloning, dialogue generation, and GPU/CPU execution.

Language: Python - Size: 572 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 1 - Forks: 0

bocaletto-luca/CW-Generator

The CW (Morse) Generator is a versatile and powerful application for Morse communication enthusiasts and anyone interested in learning or practicing this classic language of communication. This software allows you to convert text to Morse code and vice versa, providing a complete suite of tools to create, interpret, and reproduce ...

Language: Python - Size: 21.5 KB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 1 - Forks: 0

inferless/musicgen-stereo-melody-large

A 3.3B parameter text-to-music model by Meta AI, fine-tuned for stereo audio generation. <metadata> gpu: A100 | collections: ["HF Transformers"] </metadata>

Language: Python - Size: 29.3 KB - Last synced at: 22 days ago - Pushed at: 23 days ago - Stars: 1 - Forks: 1

lucadellalib/bigvgan

A single-file implementation of BigVGAN generator

Language: Python - Size: 395 KB - Last synced at: 7 months ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

SimonBernarding/OhanashiGPT-children-story-generation

OhanashiGPT is an application that generates personalized children's stories based on parameters like age and preferences. It narrates these stories using an AI-generated voice that mimics a parent, trained on their audio samples. The app also creates illustrations to accompany each story, providing a unique and engaging experience for children.

Language: Jupyter Notebook - Size: 121 MB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 1 - Forks: 0

ashleykleynhans/stable-audio-tools-docker

Docker image for stable-audio-tools: Generative models for conditional audio generation

Language: Shell - Size: 41 KB - Last synced at: about 4 hours ago - Pushed at: 10 months ago - Stars: 1 - Forks: 0

sesac-google-ai-1st/video_factory

An AI-based all-in-one video production service, from scripting to dubbing and image generation

Language: Python - Size: 62.8 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

MartianInGreen/BeeBrain

BeeBrain is your personal chatbot. Use tools, generate images, run code and so much more!

Language: Python - Size: 1.37 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

sean-e/OptForAudio

Utility to temporarily change Windows system settings for improved real time audio performance

Language: C++ - Size: 18.6 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

Yuan-ManX/ai-audio-processing-methods

AI audio processing methods

Language: Python - Size: 16.6 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

mx-mark/SPMNet

Source code for "Visually aligned sound generation via sound-producing motion parsing" (Published at Neurocomputing)

Size: 4.88 KB - Last synced at: 12 months ago - Pushed at: about 3 years ago - Stars: 1 - Forks: 0

inferless/hifi-gan-template

A TTS Vocoder capable of generating high fidelity speech efficiently.

Language: Python - Size: 5.98 MB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 0 - Forks: 0

inferless/melo-tts

A high-quality text-to-speech model by MyShell.ai that supports multiple English accents and real-time inference.

Language: Python - Size: 33.2 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 1

mahshid1378/tts-generation-webui

TTS Generation Web UI (Bark, MusicGen + AudioGen, Tortoise, RVC, Vocos, Demucs, SeamlessM4T, MAGNet, StyleTTS2, MMS, Stable Audio, Mars5, F5-TTS, ParlerTTS)

Language: TypeScript - Size: 3.96 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

TejasMate/SmartHomeAssistDevice

A modern voice-controlled home assistant device that can play music, generate images, create audio, and more. Supports both cloud-based and local language models for enhanced flexibility and privacy.

Language: Python - Size: 27.3 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

aidayang/InspireMusic-OneClick

InspireMusic text-to-music software: a no-install, one-click-launch bundle

Size: 29.3 KB - Last synced at: about 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

filipeusz123/aud

The Aud project is a simple but powerful text-to-speech software that allows users to convert written text into spoken words. It uses advanced algorithms to provide natural-sounding speech output, making it ideal for various applications such as accessibility tools, language learning, and audiobook creation.

Size: 1000 Bytes - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

inferless/bark-streaming

Bark text-to-speech with Server-Sent Events (SSE) streams real-time audio, providing interactive and efficient updates for chat-based applications. <metadata> gpu: A100 | collections: ["SSE Events"] </metadata>

Language: Python - Size: 45.9 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 2

ewdlop/AI-Tools

Various AI Online Tools. AI is taking over https://www.fastcompany.com/; very sus

Size: 68.4 KB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

blaisewf/HiFi-SAN (fork of jik876/hifi-gan)

HiFi-SAN: Slicing Adversarial Networks for Efficient and High Fidelity Speech Synthesis

Language: Python - Size: 619 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

m-faizan-mahmood/infinitnews.ai

This AI Automation Pipeline is a powerful, end-to-end solution designed to automate content creation across multiple formats, including text, images, and audio. This pipeline seamlessly integrates advanced AI technologies for text generation, image synthesis, and audio conversion, enabling efficient and high-quality content production.

Language: Jupyter Notebook - Size: 9.62 MB - Last synced at: about 2 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

Work-Nobu/OhanashiGPT

OhanashiGPT is an application that generates personalized children's stories based on parameters like age and preferences. It narrates these stories using an AI-generated voice that mimics a parent, trained on their audio samples. The app also creates illustrations to accompany each story, providing a unique and engaging experience for children.

Language: Jupyter Notebook - Size: 108 MB - Last synced at: about 2 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

KoppAlexander/Ohanashi-ChildGPT

OhanashiGPT is an application that generates personalized children's stories based on parameters like age and preferences. It narrates these stories using an AI-generated voice that mimics a parent, trained on their audio samples. The app also creates illustrations to accompany each story, providing a unique and engaging experience for children.

Language: Jupyter Notebook - Size: 108 MB - Last synced at: 2 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 1
