An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: synthetic-dataset-generation

StacklokLabs/promptwright

Generate large synthetic data using an LLM

Language: Python - Size: 13.5 MB - Last synced at: about 15 hours ago - Pushed at: 1 day ago - Stars: 427 - Forks: 32

ImJaeSung/Synthesizers

Implementations of various synthesizers with PyTorch.

Language: Python - Size: 14.7 MB - Last synced at: about 22 hours ago - Pushed at: about 24 hours ago - Stars: 1 - Forks: 0

zhenkewu/synthEHRella Fork of chenxran/synthEHRella

SynthEHRella is a benchmarking package used for evaluating synthetic Electronic Health Records (EHR) data generation methods.

Language: Python - Size: 2.31 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

mosure/bevy_zeroverse

bevy zeroverse synthetic dataset generator

Language: Rust - Size: 392 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 8 - Forks: 0

inductiva/inductiva

Large scale simulations made simple.

Language: HTML - Size: 827 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 36 - Forks: 6

DubiousCactus/hybrid-dataset-factory

A semi-synthetic dataset generation tool, specifically crafted for CNN training in drone racing.

Language: Python - Size: 17.3 MB - Last synced at: about 4 hours ago - Pushed at: over 2 years ago - Stars: 12 - Forks: 1

remyxai/VQASynth

Compose multimodal datasets 🎹

Language: Python - Size: 17.5 MB - Last synced at: 3 days ago - Pushed at: 15 days ago - Stars: 413 - Forks: 17

magpie-align/magpie

[ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Your efficient and high-quality synthetic data generation pipeline!

Language: Python - Size: 1.08 MB - Last synced at: 1 day ago - Pushed at: 3 months ago - Stars: 717 - Forks: 62

dieterich-lab/ASyH

The Anonymous Synthesizer for Health Data

Language: Python - Size: 635 KB - Last synced at: about 3 hours ago - Pushed at: 5 days ago - Stars: 5 - Forks: 1

sparkfish/augraphy

Augmentation pipeline for rendering synthetic paper printing, faxing, scanning and copy machine processes

Language: Python - Size: 245 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 427 - Forks: 51

nachoDRT/MERIT-Dataset

The MERIT Dataset is a fully synthetic, labeled dataset created for training and benchmarking LLMs on Visually Rich Document Understanding tasks. It is also designed to help detect biases and improve interpretability in LLMs, an area we are actively working on. This repository is actively maintained, and new features are continuously being added.

Language: Python - Size: 585 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 9 - Forks: 1

igor-olikh/syntetic-data-generator

A comprehensive toolkit for generating high-quality synthetic datasets using Meta's Llama Synthetic Data Kit. Supports PDFs, videos, documents & more for AI fine-tuning and testing.

Size: 393 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

allenmonkey970/ben10-synthetic-battles

This project builds on the Ben 10 Alien Universe Realistic Battle Dataset and adds a synthetic, expanded dataset for testing and analysis.

Language: Jupyter Notebook - Size: 3 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

manishanis/eye-training

Train your eyes. Read faster.

Language: Vue - Size: 199 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 1 - Forks: 0

SqueezeAILab/LLM2LLM

[ACL 2024] LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement

Language: Python - Size: 209 KB - Last synced at: 1 day ago - Pushed at: over 1 year ago - Stars: 184 - Forks: 14

synthesized-io/tdk-demo

This is a collection of TDK demo projects that use different databases and options

Language: YAML - Size: 69.4 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 17 - Forks: 4

ahmad-alismail/LLM_based_Synthetic_Data_Generation

A curated and continuously updated collection of papers, tools, and datasets on synthetic data generation using LLMs and agentic workflows.

Size: 28.3 KB - Last synced at: 7 days ago - Pushed at: 8 days ago - Stars: 0 - Forks: 0

georgeoshardo/SyMBac

Accurate segmentation of bacterial microscope images using deep learning on synthetically generated image data.

Language: Jupyter Notebook - Size: 158 MB - Last synced at: about 19 hours ago - Pushed at: about 21 hours ago - Stars: 19 - Forks: 9

openlayer-ai/openlayer-python

The official Python library for Openlayer, the Continuous Model Improvement Platform for AI. 📈

Language: Python - Size: 21.6 MB - Last synced at: about 22 hours ago - Pushed at: about 24 hours ago - Stars: 12 - Forks: 1

naomibaes/LSCD_method_evaluation

Companion repository with scripts for applying LSC-Eval, a 3-stage evaluation framework, to: (1) create theory-driven LLM-generated synthetic suites for LSC dimensions, (2) program experimental settings for comparative method evaluation on a synthetic change detection task, and (3) choose the most suitable method for the dimension and domain of interest.

Language: Jupyter Notebook - Size: 582 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

naomibaes/Synthetic-LSC_pipeline

Synthetic datasets to evaluate key dimensions of LSC (Sentiment, Intensity, Breadth), generated using LLMs and WordNet from the LSC-Eval framework.

Language: Jupyter Notebook - Size: 31.5 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

datadreamer-dev/DataDreamer

DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models. 🤖💤

Language: Python - Size: 895 KB - Last synced at: 3 days ago - Pushed at: 5 months ago - Stars: 1,026 - Forks: 54

argilla-io/distilabel

Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.

Language: Python - Size: 554 MB - Last synced at: 8 days ago - Pushed at: 15 days ago - Stars: 2,755 - Forks: 205

bespokelabsai/curator

Synthetic data curation for post-training and structured data extraction

Language: Python - Size: 62.6 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 1,391 - Forks: 109

RQLabsAI/SyntheticGenAgent

Generate an infinite, diverse dataset using an agentic system!

Language: Python - Size: 122 KB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 0 - Forks: 0

sfu-mial/DermSynth3D

Official code for "DermSynth3D: Synthesis of in-the-wild Annotated Dermatology Images". A data generation pipeline for creating photorealistic in-the-wild synthetic dermatological data with rich multi-task annotations for various skin-analysis tasks.

Language: Jupyter Notebook - Size: 287 MB - Last synced at: 7 days ago - Pushed at: 11 months ago - Stars: 27 - Forks: 4

BiodataAnalysisGroup/synth4bench Fork of sfragkoul/synth4bench

A framework for generating synthetic genomics data for the evaluation of tumor-only somatic variant calling algorithms.

Language: R - Size: 242 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 3 - Forks: 1

davanstrien/awesome-synthetic-datasets

awesome synthetic (text) datasets

Language: Jupyter Notebook - Size: 184 KB - Last synced at: 4 days ago - Pushed at: 8 months ago - Stars: 282 - Forks: 11

mcdaqc/vulnerability-intelligence-diagrammatic-reasoning

Vulnerability Intelligence with Diagrammatic Reasoning

Language: Python - Size: 1.68 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 1 - Forks: 0

KodCode-AI/kodcode

✨ An all-in-one synthetic dataset generation framework that produces diverse coding questions and verifiable solutions.

Language: Python - Size: 40.6 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 229 - Forks: 10

BBahtiri/Space-Filling-Algorithm-Data-Generation-Technique

A space-filling procedure to generate data from a constitutive model (viscoelastic-viscoplastic-damage) including moisture, strain rate, and nanoparticle volume fraction dependency.

Language: MATLAB - Size: 78.1 KB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 2 - Forks: 2

Travvy88/DocumentGenerator_DoGe

Synthetic Document Generator for Document AI. Creates document images annotated with the text and bounding box of each word. Images contain headings, tables, and paragraphs with different formatting and fonts. Can be used for OCR, document transformer pretraining, text detection, and other tasks.

Language: Python - Size: 22.3 MB - Last synced at: 3 days ago - Pushed at: 20 days ago - Stars: 21 - Forks: 2

MultiTonic/thinking-dataset

Creating a Thinking Dataset: Leveraging Real-World Data for Strategic Business Insights and STaR Case Study Generation.

Language: Python - Size: 36.1 MB - Last synced at: 19 days ago - Pushed at: 3 months ago - Stars: 9 - Forks: 5

mirpo/datamatic

Generate synthetic datasets using local LLMs via Ollama and LMstudio with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 3, Mistral Small 3.1 and other major language models.

Language: Go - Size: 79.1 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1 - Forks: 0

tabularis-ai/be_great

A novel approach for synthesizing tabular data using pretrained large language models

Language: Python - Size: 4.29 MB - Last synced at: 12 days ago - Pushed at: about 1 month ago - Stars: 312 - Forks: 52

NVIDIA/Dataset_Synthesizer

NVIDIA Deep learning Dataset Synthesizer (NDDS)

Language: C++ - Size: 6.18 MB - Last synced at: 3 days ago - Pushed at: over 4 years ago - Stars: 582 - Forks: 132

PrasannaPulakurthi/EHAR-GAN

Enhancing Human Action Recognition with GAN-based Data Augmentation

Language: Python - Size: 19.6 MB - Last synced at: 21 days ago - Pushed at: 22 days ago - Stars: 3 - Forks: 0

clugen/pyclugen

Multidimensional cluster generation in Python

Language: Python - Size: 21 MB - Last synced at: 16 days ago - Pushed at: 9 months ago - Stars: 9 - Forks: 0

clugen/clugenr

Multidimensional cluster generation in R

Language: R - Size: 37.7 MB - Last synced at: 6 days ago - Pushed at: 11 months ago - Stars: 6 - Forks: 0

jknafou/TransCorpus

TransCorpus is a scalable toolkit for large-scale, parallel translation and preprocessing of text corpora, built for language model pretraining and research.

Language: Python - Size: 5.91 MB - Last synced at: 23 days ago - Pushed at: 24 days ago - Stars: 0 - Forks: 0

worldbank/REaLTabFormer

A suite of auto-regressive and Seq2Seq (sequence-to-sequence) transformer models for tabular and relational synthetic data generation.

Language: Jupyter Notebook - Size: 12.3 MB - Last synced at: 20 days ago - Pushed at: 24 days ago - Stars: 228 - Forks: 28

isaultirado77/optical_aberration_database

Dataset + algorithms for simulating optical aberrations using Zernike polynomials.

Language: Python - Size: 201 MB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 0 - Forks: 0

starfishdata/starfish

Synthetic data generation to fuel AI models

Language: Python - Size: 14 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 30 - Forks: 1

tempo-sim/Tempo

The Tempo Unreal Engine plugins

Language: C++ - Size: 6.76 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 15 - Forks: 5

SherAndrei/blender-gen-dataset

Generate synthetic datasets with Blender

Language: Python - Size: 2.6 MB - Last synced at: 15 days ago - Pushed at: 26 days ago - Stars: 0 - Forks: 0

firmai/mtss-gan 📦

MTSS-GAN: Multivariate Time Series Simulation with Generative Adversarial Networks (by @firmai)

Size: 3.62 MB - Last synced at: 6 days ago - Pushed at: over 4 years ago - Stars: 94 - Forks: 30

ucl-cssb/MIMIC

Modelling and Inference of MICrobiomes Project (MIMIC) is a Python package dedicated to simulating, modelling, and predicting interactions in microbial communities.

Language: Python - Size: 480 MB - Last synced at: 25 days ago - Pushed at: 26 days ago - Stars: 7 - Forks: 0

Unity-Technologies/PeopleSansPeople

Unity's privacy-preserving human-centric synthetic data generator

Language: C# - Size: 446 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 311 - Forks: 35

Clearbox-AI/clearbox-synthetic-kit

Clearbox AI's all-in-one solution for generation and evaluation of synthetic tabular and time-series data.

Language: Python - Size: 5.01 MB - Last synced at: 10 days ago - Pushed at: about 1 month ago - Stars: 43 - Forks: 1

AstraBert/diRAGnosis

Diagnose the performance of your RAG🩺

Language: Python - Size: 214 KB - Last synced at: 6 days ago - Pushed at: 3 months ago - Stars: 36 - Forks: 3

Eladlev/AutoPrompt

A framework for prompt tuning using Intent-based Prompt Calibration

Language: Python - Size: 26 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 2,520 - Forks: 213

dariant/ID-Booth

Official repository of the paper: "ID-Booth: Identity-consistent Face Generation with Diffusion Models"

Language: Jupyter Notebook - Size: 4.67 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 17 - Forks: 4

Unity-Technologies/com.unity.perception

Perception toolkit for sim2real training and validation in Unity

Language: C# - Size: 320 MB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 962 - Forks: 177

nicolas-hbt/pygraft

Configurable Generation of Synthetic Schemas and Knowledge Graphs at Your Fingertips

Language: Python - Size: 699 KB - Last synced at: about 1 month ago - Pushed at: 12 months ago - Stars: 684 - Forks: 45

Shekswess/synthgenai

SynthGenAI - Package for Generating Synthetic Datasets using LLMs.

Language: Python - Size: 1.64 MB - Last synced at: 26 days ago - Pushed at: 5 months ago - Stars: 37 - Forks: 4

YJiangcm/WebR

[ACL 2025] Instruction-Tuning Data Synthesis from Scratch via Web Reconstruction

Language: Python - Size: 4.38 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 5 - Forks: 3

BatsResearch/bonito

A lightweight library for generating synthetic instruction tuning datasets for your data without GPT.

Language: Python - Size: 796 KB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 774 - Forks: 48

intel/polite-guard

Source code for Intel's Polite Guard NLP project

Language: Python - Size: 850 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 33 - Forks: 4

dmeldrum6/LLMDatasetBuilder

LLM-Powered Dataset Creation Tool

Language: HTML - Size: 44.9 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

firmai/datagene

DataGene - Identify How Similar TS Datasets Are to One Another (by @firmai)

Language: Jupyter Notebook - Size: 1.12 MB - Last synced at: 6 days ago - Pushed at: over 3 years ago - Stars: 204 - Forks: 24

nupurkmr9/syncd

SynCD: Generating Multi-Image Synthetic Data for Text-to-Image Customization

Language: Python - Size: 25 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 132 - Forks: 13

fjxmlzn/DoppelGANger

[IMC 2020 (Best Paper Finalist)] Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions

Language: Python - Size: 67.4 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 307 - Forks: 74

covisionlab/diffusion_labeling

Official implementation of "Bounding Box-Guided Diffusion for Synthesizing Industrial Images and Segmentation Map" accepted at Synthetic Data for Computer Vision Workshop - CVPR 2025

Language: Python - Size: 71.3 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

zhenzhiwang/HumanVid

[NeurIPS D&B Track 2024] Official implementation of HumanVid

Language: Python - Size: 845 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 306 - Forks: 5

firstbatchxyz/dria-sdk

Dria SDK is for building and executing synthetic data generation pipelines on Dria Knowledge Network.

Language: Python - Size: 2.64 MB - Last synced at: 20 days ago - Pushed at: 3 months ago - Stars: 24 - Forks: 6

maxvandenhoven/blenderline

A Blender pipeline for generating synthetic images of production lines

Language: Python - Size: 195 MB - Last synced at: 5 days ago - Pushed at: almost 2 years ago - Stars: 26 - Forks: 1

motagfr/Master-s-Thesis

Repository for my Master Thesis on Gossiping Protocols and Information Propagation. Includes mathematical models, simulations, and applications to study decentralized systems and optimize information dissemination.

Language: HTML - Size: 0 Bytes - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

DubiousCactus/autonomous-drone-racing

This repository contains the code for the paper "Image Generation for Efficient Neural Network Training in Autonomous Drone Racing" of the WCCI 2020 congress.

Language: Python - Size: 198 MB - Last synced at: about 4 hours ago - Pushed at: over 2 years ago - Stars: 17 - Forks: 6

NVIDIA/Dataset_Utilities

NVIDIA Dataset Utilities (NVDU)

Language: Python - Size: 149 KB - Last synced at: 3 days ago - Pushed at: about 4 years ago - Stars: 130 - Forks: 21

Tuebel/BlenderProc.DissTimRedick Fork of rwth-irt/BlenderProc.DissTimRedick

BlenderProc setup to generate the synthetic datasets from Tim Redick's dissertation. STERI models not included since the CAD files are proprietary.

Size: 35.2 KB - Last synced at: 1 day ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

astorfi/cor-gan

🔓 COR-GAN: Correlation-Capturing Convolutional Neural Networks for Generating Synthetic Healthcare Records

Language: Python - Size: 55.9 MB - Last synced at: about 2 months ago - Pushed at: over 4 years ago - Stars: 57 - Forks: 12

privateai/deid-examples

Example scripts that showcase how to use Private AI Text to de-identify, redact, hash, tokenize, mask and synthesize PII in text.

Language: Jupyter Notebook - Size: 37.8 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 81 - Forks: 1

clugen/MOCluGen

Multidimensional cluster generation in MATLAB/Octave

Language: MATLAB - Size: 11.3 MB - Last synced at: 26 days ago - Pushed at: 11 months ago - Stars: 5 - Forks: 0

mantyni/Multi-object-detection-lego

Multi-object detection of LEGO bricks in a dataset generated using Blender.

Language: Jupyter Notebook - Size: 146 MB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 15 - Forks: 2

csiro-robotics/UPGen

The official repository for the paper: Scalable learning for bridging the species gap in image-based plant phenotyping.

Language: Python - Size: 15 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 25 - Forks: 3

ReverendBayes/Local-Differential-Privacy-Synthetic-Data-Generator

A single-file CLI that generates privacy-preserving synthetic CSVs via local differential privacy (Laplace & randomized response).

Language: Python - Size: 0 Bytes - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

MuhangTian/TimeDiff

Code to generate realistic synthetic healthcare data with diffusion models

Language: Jupyter Notebook - Size: 13.9 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 16 - Forks: 4

sej2020/Diffusion-TS-Storage Fork of ermongroup/CSDI

Metadata-conditional diffusion model for flexible time-series generation. Model + Analysis

Language: Python - Size: 36.4 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

tech-aakash/Public-Health-Disease-Surveillance

This repository highlights coursework completed during the Population Health Informatics course in Spring 2025. It forms a comprehensive part of the final project submission.

Language: Jupyter Notebook - Size: 523 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

diffix/syndiffix

Python implementation of the SynDiffix synthetic data generation mechanism.

Language: Python - Size: 610 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 8 - Forks: 2

sfragkoul/synth4bench

A framework for generating synthetic genomics data for the evaluation of tumor-only somatic variant calling algorithms.

Language: R - Size: 241 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 2 - Forks: 2

junayed-hasan/occupational-stress-ml

This repository contains code, datasets, and analysis for AI-driven occupational stress detection using machine learning, deep learning, and NLP. It includes feature selection, explainable AI, synthetic data generation, and model validation for workplace safety applications. 🚀

Language: Jupyter Notebook - Size: 17.6 MB - Last synced at: about 2 months ago - Pushed at: 3 months ago - Stars: 2 - Forks: 1

isarandi/synthetic-occlusion

Synthetic Occlusion Augmentation

Language: Python - Size: 373 KB - Last synced at: 2 months ago - Pushed at: over 5 years ago - Stars: 121 - Forks: 19

firstbatchxyz/pythonic-function-calling-data

Pythonic Function Calling Dataset Generator w/ Dria

Language: Python - Size: 24.4 KB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 5 - Forks: 1

Unity-Technologies/SynthDet 📦

SynthDet - An end-to-end object detection pipeline using synthetic data

Language: C# - Size: 2.19 MB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 373 - Forks: 55

GDelCorso/NA_DAtabase

NA_DA is open-source software written in Python that generates datasets of regular two-dimensional geometric shapes based on probabilistic distributions. NA_DA comes with an intuitive GUI (Graphical User Interface) that allows users to define shapes, colors, and distributions of features.

Language: Python - Size: 38.6 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 2 - Forks: 0

ethicalabs-ai/ouroboros

Self-Improving LLMs Through Iterative Refinement

Language: Python - Size: 429 KB - Last synced at: 12 days ago - Pushed at: 3 months ago - Stars: 3 - Forks: 0

VinAIResearch/Dataset-Diffusion

Dataset Diffusion: Diffusion-based Synthetic Data Generation for Pixel-Level Semantic Segmentation (NeurIPS2023)

Language: Jupyter Notebook - Size: 7.78 MB - Last synced at: 3 months ago - Pushed at: 10 months ago - Stars: 116 - Forks: 4

rozumden/DeFMO

[CVPR 2021] DeFMO: Deblurring and Shape Recovery of Fast Moving Objects

Language: Python - Size: 2.1 MB - Last synced at: 2 months ago - Pushed at: about 3 years ago - Stars: 172 - Forks: 24

arkanivasarkar/EEG-Data-Augmentation-using-Variational-Autoencoder

Improving performance of motor imagery classification using variational-autoencoder and synthetic EEG signals

Language: Jupyter Notebook - Size: 29.3 KB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 42 - Forks: 9

POSE-Lab/6DL-PoseGenerator

Language: Python - Size: 53.4 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

stefanrmmr/differentially_private_synthetic_data

Differentially Private Synthetic Data Generation [DP-SDG] - Experimental Setups & Knowledge Base - WORK IN PROGRESS

Language: Jupyter Notebook - Size: 5.23 MB - Last synced at: 3 months ago - Pushed at: almost 3 years ago - Stars: 12 - Forks: 2

thalesbertaglia/instasynth

Synthetic Instagram Post Generation for Social Media Research

Language: Jupyter Notebook - Size: 737 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

tirthajyoti/pydbgen

Random dataframe and database table generator

Language: Python - Size: 687 KB - Last synced at: 24 days ago - Pushed at: about 4 years ago - Stars: 309 - Forks: 58

leSullivan/unpaired_image_synthesis_with_gans

Implementation of multiple GAN architectures (CGAN, CycleGAN, TurboCycleGAN) for unpaired image-to-image translation, specifically focused on synthetic fence generation in landscape images. Built with PyTorch Lightning and includes SLURM integration for HPC training.

Language: Jupyter Notebook - Size: 27.4 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

majsylw/microbial-counting-review

A list of useful resources on microbial colony classification and detection, such as datasets, papers, and links to projects.

Size: 15.6 KB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 15 - Forks: 2

josericodata/SyntheticDataGeneratorApp

Generate and download free synthetic datasets instantly! A Streamlit app with built-in statistical validation tools like Chi-Square and Mutual Information.

Language: Python - Size: 7.41 MB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

HROlive/Unreal-Engine-for-Remote-Visualization-and-Machine-Learning

In-depth training on using Unreal Engine as a data generator and integrating it into a simple ML workflow, at one of the leading supercomputing centres.

Language: C# - Size: 874 KB - Last synced at: 3 months ago - Pushed at: almost 2 years ago - Stars: 5 - Forks: 0

ALucek/QuicKB

Optimize Document Retrieval with Fine-Tuned KnowledgeBases

Language: Python - Size: 1.63 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 107 - Forks: 21

RubixML/Colors

Demonstrating unsupervised clustering using the K-Means algorithm and synthetic color data.

Language: PHP - Size: 251 KB - Last synced at: 6 days ago - Pushed at: about 3 years ago - Stars: 18 - Forks: 3

Related Keywords
synthetic-dataset-generation (293), synthetic-data (120), python (45), deep-learning (41), machine-learning (40), computer-vision (31), llm (23), dataset (22), object-detection (16), llms (14), dataset-generation (13), generative-adversarial-network (13), synthetic-data-generation (13), data-augmentation (13), nlp (12), gan (11), generative-ai (10), pose-estimation (10), blender (10), data-generation (9), large-language-models (9), pytorch (9), gans (9), data-science (9), ai (9), diffusion-models (8), synthetic-data-generator (8), tabular-data (8), datasets (7), synthetic (7), privacy (7), benchmarking (7), fine-tuning (7), time-series (7), synthetic-images (6), classification (6), deep-neural-networks (6), domain-randomization (6), natural-language-processing (6), openai (6), huggingface (6), deeplearning (5), vae (5), unity3d (5), transfer-learning (5), privacy-enhancing-technologies (5), artificial-intelligence (5), image-processing (5), privacy-tools (5), transformers (5), unreal-engine-5 (5), research (5), evaluation-framework (5), tensorflow (5), domain-adaptation (5), instance-segmentation (4), unity (4), differential-privacy (4), data-generator (4), gpt (4), chatgpt (4), yolov5 (4), perception (4), pytorch-lightning (4), instruction-tuning (4), simulation (4), synthetic-clusters (4), numpy (4), multidimensional-data (4), jupyter-notebook (4), finance (4), clustering (4), segmentation (4), multidimensional-clusters (4), anonymization (4), ocr (4), llm-training (3), hpc (3), transformer (3), blender-python (3), tensorflow2 (3), computer-graphics (3), healthcare (3), microscopy (3), network (3), semantic-segmentation (3), dummy-data-generator (3), generative-model (3), r (3), paper (3), fake-data (3), medical-imaging (3), 3d (3), ros (3), synthetic-datasets (3), synthetic-dataset (3), supervised-finetuning (3), robotics (3), alignment (3), ctgan (3)