GitHub topics: synthetic-dataset-generation
StacklokLabs/promptwright
Generate large synthetic data using an LLM
Language: Python - Size: 13.5 MB - Last synced at: about 15 hours ago - Pushed at: 1 day ago - Stars: 427 - Forks: 32

ImJaeSung/Synthesizers
Implementations of various synthesizers with PyTorch.
Language: Python - Size: 14.7 MB - Last synced at: about 22 hours ago - Pushed at: about 24 hours ago - Stars: 1 - Forks: 0

zhenkewu/synthEHRella Fork of chenxran/synthEHRella
SynthEHRella is a benchmarking package used for evaluating synthetic Electronic Health Records (EHR) data generation methods.
Language: Python - Size: 2.31 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

mosure/bevy_zeroverse
bevy zeroverse synthetic dataset generator
Language: Rust - Size: 392 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 8 - Forks: 0

inductiva/inductiva
Large scale simulations made simple.
Language: HTML - Size: 827 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 36 - Forks: 6

DubiousCactus/hybrid-dataset-factory
A semi-synthetic dataset generation tool, specifically crafted for CNN training in drone racing.
Language: Python - Size: 17.3 MB - Last synced at: about 4 hours ago - Pushed at: over 2 years ago - Stars: 12 - Forks: 1

remyxai/VQASynth
Compose multimodal datasets 🎹
Language: Python - Size: 17.5 MB - Last synced at: 3 days ago - Pushed at: 15 days ago - Stars: 413 - Forks: 17

magpie-align/magpie
[ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Your efficient and high-quality synthetic data generation pipeline!
Language: Python - Size: 1.08 MB - Last synced at: 1 day ago - Pushed at: 3 months ago - Stars: 717 - Forks: 62

dieterich-lab/ASyH
The Anonymous Synthesizer for Health Data
Language: Python - Size: 635 KB - Last synced at: about 3 hours ago - Pushed at: 5 days ago - Stars: 5 - Forks: 1

sparkfish/augraphy
Augmentation pipeline for rendering synthetic paper printing, faxing, scanning and copy machine processes
Language: Python - Size: 245 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 427 - Forks: 51

nachoDRT/MERIT-Dataset
The MERIT Dataset is a fully synthetic, labeled dataset created for training and benchmarking LLMs on Visually Rich Document Understanding tasks. It is also designed to help detect biases and improve interpretability in LLMs, an area we are actively working on. This repository is actively maintained, and new features are continuously being added.
Language: Python - Size: 585 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 9 - Forks: 1

igor-olikh/syntetic-data-generator
A comprehensive toolkit for generating high-quality synthetic datasets using Meta's Llama Synthetic Data Kit. Supports PDFs, videos, documents & more for AI fine-tuning and testing.
Size: 393 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

allenmonkey970/ben10-synthetic-battles
This project builds on the Ben 10 Alien Universe Realistic Battle Dataset and adds a synthetic, expanded dataset for testing and analysis.
Language: Jupyter Notebook - Size: 3 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

manishanis/eye-training
Train your eyes. Read faster.
Language: Vue - Size: 199 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 1 - Forks: 0

SqueezeAILab/LLM2LLM
[ACL 2024] LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement
Language: Python - Size: 209 KB - Last synced at: 1 day ago - Pushed at: over 1 year ago - Stars: 184 - Forks: 14

synthesized-io/tdk-demo
This is a collection of TDK demo projects that use different databases and options
Language: YAML - Size: 69.4 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 17 - Forks: 4

ahmad-alismail/LLM_based_Synthetic_Data_Generation
A curated and continuously updated collection of papers, tools, and datasets on synthetic data generation using LLMs and agentic workflows.
Size: 28.3 KB - Last synced at: 7 days ago - Pushed at: 8 days ago - Stars: 0 - Forks: 0

georgeoshardo/SyMBac
Accurate segmentation of bacterial microscope images using deep learning synthetically generated image data.
Language: Jupyter Notebook - Size: 158 MB - Last synced at: about 19 hours ago - Pushed at: about 21 hours ago - Stars: 19 - Forks: 9

openlayer-ai/openlayer-python
The official Python library for Openlayer, the Continuous Model Improvement Platform for AI. 📈
Language: Python - Size: 21.6 MB - Last synced at: about 22 hours ago - Pushed at: about 24 hours ago - Stars: 12 - Forks: 1

naomibaes/LSCD_method_evaluation
Companion repository with scripts for applying LSC-Eval, a 3-stage evaluation framework to: (1) create theory-driven LLM-generated synthetic suites for LSC dimensions, (2) program experimental settings for comparative method evaluation on a synthetic change detection task, (3) choose the most suitable method for the dimension and domain of interest
Language: Jupyter Notebook - Size: 582 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

naomibaes/Synthetic-LSC_pipeline
Synthetic datasets to evaluate key dimensions of LSC (Sentiment, Intensity, Breadth), generated using LLMs and WordNet from the LSC-Eval framework.
Language: Jupyter Notebook - Size: 31.5 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

datadreamer-dev/DataDreamer
DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models. 🤖💤
Language: Python - Size: 895 KB - Last synced at: 3 days ago - Pushed at: 5 months ago - Stars: 1,026 - Forks: 54

argilla-io/distilabel
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
Language: Python - Size: 554 MB - Last synced at: 8 days ago - Pushed at: 15 days ago - Stars: 2,755 - Forks: 205

bespokelabsai/curator
Synthetic data curation for post-training and structured data extraction
Language: Python - Size: 62.6 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 1,391 - Forks: 109

RQLabsAI/SyntheticGenAgent
Generate an endless and diversified dataset using an agent system!
Language: Python - Size: 122 KB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 0 - Forks: 0

sfu-mial/DermSynth3D
Official code for "DermSynth3D: Synthesis of in-the-wild Annotated Dermatology Images". A data generation pipeline for creating photorealistic in-the-wild synthetic dermatological data with rich multi-task annotations for various skin-analysis tasks.
Language: Jupyter Notebook - Size: 287 MB - Last synced at: 7 days ago - Pushed at: 11 months ago - Stars: 27 - Forks: 4

BiodataAnalysisGroup/synth4bench Fork of sfragkoul/synth4bench
A framework for generating synthetic genomics data for the evaluation of tumor-only somatic variant calling algorithms.
Language: R - Size: 242 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 3 - Forks: 1

davanstrien/awesome-synthetic-datasets
awesome synthetic (text) datasets
Language: Jupyter Notebook - Size: 184 KB - Last synced at: 4 days ago - Pushed at: 8 months ago - Stars: 282 - Forks: 11

mcdaqc/vulnerability-intelligence-diagrammatic-reasoning
Vulnerability Intelligence with Diagrammatic Reasoning
Language: Python - Size: 1.68 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 1 - Forks: 0

KodCode-AI/kodcode
✨ A synthetic dataset generation framework that produces diverse coding questions and verifiable solutions - all in one framework
Language: Python - Size: 40.6 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 229 - Forks: 10

BBahtiri/Space-Filling-Algorithm-Data-Generation-Technique
A space-filling procedure to generate data from a constitutive model (viscoelastic-viscoplastic-damage) including moisture, strain rate, and nanoparticle volume fraction dependency.
Language: MATLAB - Size: 78.1 KB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 2 - Forks: 2

Travvy88/DocumentGenerator_DoGe
Synthetic Document Generator for Document AI. Creates document images annotated with the text and bounding box of each word. Images contain headings, tables, and paragraphs with different formatting and fonts. Can be used for OCR, document transformer pretraining, text detection, and other tasks.
Language: Python - Size: 22.3 MB - Last synced at: 3 days ago - Pushed at: 20 days ago - Stars: 21 - Forks: 2

MultiTonic/thinking-dataset
Creating a Thinking Dataset: Leveraging Real-World Data for Strategic Business Insights and STaR Case Study Generation.
Language: Python - Size: 36.1 MB - Last synced at: 19 days ago - Pushed at: 3 months ago - Stars: 9 - Forks: 5

mirpo/datamatic
Generate synthetic datasets using local LLMs via Ollama and LM Studio with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 3, Mistral Small 3.1 and other major language models.
Language: Go - Size: 79.1 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1 - Forks: 0

tabularis-ai/be_great
A novel approach for synthesizing tabular data using pretrained large language models
Language: Python - Size: 4.29 MB - Last synced at: 12 days ago - Pushed at: about 1 month ago - Stars: 312 - Forks: 52

NVIDIA/Dataset_Synthesizer
NVIDIA Deep learning Dataset Synthesizer (NDDS)
Language: C++ - Size: 6.18 MB - Last synced at: 3 days ago - Pushed at: over 4 years ago - Stars: 582 - Forks: 132

PrasannaPulakurthi/EHAR-GAN
Enhancing Human Action Recognition with GAN-based Data Augmentation
Language: Python - Size: 19.6 MB - Last synced at: 21 days ago - Pushed at: 22 days ago - Stars: 3 - Forks: 0

clugen/pyclugen
Multidimensional cluster generation in Python
Language: Python - Size: 21 MB - Last synced at: 16 days ago - Pushed at: 9 months ago - Stars: 9 - Forks: 0

clugen/clugenr
Multidimensional cluster generation in R
Language: R - Size: 37.7 MB - Last synced at: 6 days ago - Pushed at: 11 months ago - Stars: 6 - Forks: 0

jknafou/TransCorpus
TransCorpus is a scalable toolkit for large-scale, parallel translation and preprocessing of text corpora, built for language model pretraining and research.
Language: Python - Size: 5.91 MB - Last synced at: 23 days ago - Pushed at: 24 days ago - Stars: 0 - Forks: 0

worldbank/REaLTabFormer
A suite of auto-regressive and Seq2Seq (sequence-to-sequence) transformer models for tabular and relational synthetic data generation.
Language: Jupyter Notebook - Size: 12.3 MB - Last synced at: 20 days ago - Pushed at: 24 days ago - Stars: 228 - Forks: 28

isaultirado77/optical_aberration_database
Dataset + algorithms for simulating optical aberrations using Zernike polynomials.
Language: Python - Size: 201 MB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 0 - Forks: 0

starfishdata/starfish
Synthetic data generation to fuel AI models
Language: Python - Size: 14 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 30 - Forks: 1

tempo-sim/Tempo
The Tempo Unreal Engine plugins
Language: C++ - Size: 6.76 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 15 - Forks: 5

SherAndrei/blender-gen-dataset
Generate synthetic datasets with Blender
Language: Python - Size: 2.6 MB - Last synced at: 15 days ago - Pushed at: 26 days ago - Stars: 0 - Forks: 0

firmai/mtss-gan 📦
MTSS-GAN: Multivariate Time Series Simulation with Generative Adversarial Networks (by @firmai)
Size: 3.62 MB - Last synced at: 6 days ago - Pushed at: over 4 years ago - Stars: 94 - Forks: 30

ucl-cssb/MIMIC
Modelling and Inference of MICrobiomes Project (MIMIC) is a Python package dedicated to simulating, modelling, and predicting microbial community interactions
Language: Python - Size: 480 MB - Last synced at: 25 days ago - Pushed at: 26 days ago - Stars: 7 - Forks: 0

Unity-Technologies/PeopleSansPeople
Unity's privacy-preserving human-centric synthetic data generator
Language: C# - Size: 446 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 311 - Forks: 35

Clearbox-AI/clearbox-synthetic-kit
Clearbox AI's all-in-one solution for generation and evaluation of synthetic tabular and time-series data.
Language: Python - Size: 5.01 MB - Last synced at: 10 days ago - Pushed at: about 1 month ago - Stars: 43 - Forks: 1

AstraBert/diRAGnosis
Diagnose the performance of your RAG🩺
Language: Python - Size: 214 KB - Last synced at: 6 days ago - Pushed at: 3 months ago - Stars: 36 - Forks: 3

Eladlev/AutoPrompt
A framework for prompt tuning using Intent-based Prompt Calibration
Language: Python - Size: 26 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 2,520 - Forks: 213

dariant/ID-Booth
Official repository of the paper: "ID-Booth: Identity-consistent Face Generation with Diffusion Models"
Language: Jupyter Notebook - Size: 4.67 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 17 - Forks: 4

Unity-Technologies/com.unity.perception
Perception toolkit for sim2real training and validation in Unity
Language: C# - Size: 320 MB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 962 - Forks: 177

nicolas-hbt/pygraft
Configurable Generation of Synthetic Schemas and Knowledge Graphs at Your Fingertips
Language: Python - Size: 699 KB - Last synced at: about 1 month ago - Pushed at: 12 months ago - Stars: 684 - Forks: 45

Shekswess/synthgenai
SynthGenAI - Package for Generating Synthetic Datasets using LLMs.
Language: Python - Size: 1.64 MB - Last synced at: 26 days ago - Pushed at: 5 months ago - Stars: 37 - Forks: 4

YJiangcm/WebR
[ACL 2025] Instruction-Tuning Data Synthesis from Scratch via Web Reconstruction
Language: Python - Size: 4.38 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 5 - Forks: 3

BatsResearch/bonito
A lightweight library for generating synthetic instruction tuning datasets for your data without GPT.
Language: Python - Size: 796 KB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 774 - Forks: 48

intel/polite-guard
Source code for Intel's Polite Guard NLP project
Language: Python - Size: 850 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 33 - Forks: 4

dmeldrum6/LLMDatasetBuilder
LLM-Powered Dataset Creation Tool
Language: HTML - Size: 44.9 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

firmai/datagene
DataGene - Identify How Similar TS Datasets Are to One Another (by @firmai)
Language: Jupyter Notebook - Size: 1.12 MB - Last synced at: 6 days ago - Pushed at: over 3 years ago - Stars: 204 - Forks: 24

nupurkmr9/syncd
SynCD: Generating Multi-Image Synthetic Data for Text-to-Image Customization
Language: Python - Size: 25 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 132 - Forks: 13

fjxmlzn/DoppelGANger
[IMC 2020 (Best Paper Finalist)] Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions
Language: Python - Size: 67.4 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 307 - Forks: 74

covisionlab/diffusion_labeling
Official implementation of "Bounding Box-Guided Diffusion for Synthesizing Industrial Images and Segmentation Map" accepted at Synthetic Data for Computer Vision Workshop - CVPR 2025
Language: Python - Size: 71.3 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

zhenzhiwang/HumanVid
[NeurIPS D&B Track 2024] Official implementation of HumanVid
Language: Python - Size: 845 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 306 - Forks: 5

firstbatchxyz/dria-sdk
Dria SDK is for building and executing synthetic data generation pipelines on Dria Knowledge Network.
Language: Python - Size: 2.64 MB - Last synced at: 20 days ago - Pushed at: 3 months ago - Stars: 24 - Forks: 6

maxvandenhoven/blenderline
A Blender pipeline for generating synthetic images of production lines
Language: Python - Size: 195 MB - Last synced at: 5 days ago - Pushed at: almost 2 years ago - Stars: 26 - Forks: 1

motagfr/Master-s-Thesis
Repository for my Master's Thesis on Gossiping Protocols and Information Propagation. Includes mathematical models, simulations, and applications to study decentralized systems and optimize information dissemination.
Language: HTML - Size: 0 Bytes - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

DubiousCactus/autonomous-drone-racing
This repository contains the code for the paper "Image Generation for Efficient Neural Network Training in Autonomous Drone Racing" of the WCCI 2020 congress.
Language: Python - Size: 198 MB - Last synced at: about 4 hours ago - Pushed at: over 2 years ago - Stars: 17 - Forks: 6

NVIDIA/Dataset_Utilities
NVIDIA Dataset Utilities (NVDU)
Language: Python - Size: 149 KB - Last synced at: 3 days ago - Pushed at: about 4 years ago - Stars: 130 - Forks: 21

Tuebel/BlenderProc.DissTimRedick Fork of rwth-irt/BlenderProc.DissTimRedick
BlenderProc setup to generate the synthetic datasets from Tim Redick's dissertation. STERI models not included since the CAD files are proprietary.
Size: 35.2 KB - Last synced at: 1 day ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

astorfi/cor-gan
🔓 COR-GAN: Correlation-Capturing Convolutional Neural Networks for Generating Synthetic Healthcare Records
Language: Python - Size: 55.9 MB - Last synced at: about 2 months ago - Pushed at: over 4 years ago - Stars: 57 - Forks: 12

privateai/deid-examples
Example scripts that showcase how to use Private AI Text to de-identify, redact, hash, tokenize, mask and synthesize PII in text.
Language: Jupyter Notebook - Size: 37.8 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 81 - Forks: 1

clugen/MOCluGen
Multidimensional cluster generation in MATLAB/Octave
Language: MATLAB - Size: 11.3 MB - Last synced at: 26 days ago - Pushed at: 11 months ago - Stars: 5 - Forks: 0

mantyni/Multi-object-detection-lego
Multi-object detection of LEGO bricks in a dataset generated using Blender.
Language: Jupyter Notebook - Size: 146 MB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 15 - Forks: 2

csiro-robotics/UPGen
The official repository for the paper: Scalable learning for bridging the species gap in image-based plant phenotyping.
Language: Python - Size: 15 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 25 - Forks: 3

ReverendBayes/Local-Differential-Privacy-Synthetic-Data-Generator
A single-file CLI that generates privacy-preserving synthetic CSVs via local differential privacy (Laplace & randomized response).
Language: Python - Size: 0 Bytes - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

MuhangTian/TimeDiff
Code to generate realistic synthetic healthcare data with diffusion models
Language: Jupyter Notebook - Size: 13.9 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 16 - Forks: 4

sej2020/Diffusion-TS-Storage Fork of ermongroup/CSDI
Metadata-conditional diffusion model for flexible time-series generation. Model + Analysis
Language: Python - Size: 36.4 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

tech-aakash/Public-Health-Disease-Surveillance
This repository highlights coursework completed during the Population Health Informatics course in Spring 2025. It is a comprehensive part of the final project submission.
Language: Jupyter Notebook - Size: 523 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

diffix/syndiffix
Python implementation of the SynDiffix synthetic data generation mechanism.
Language: Python - Size: 610 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 8 - Forks: 2

sfragkoul/synth4bench
A framework for generating synthetic genomics data for the evaluation of tumor-only somatic variant calling algorithms.
Language: R - Size: 241 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 2 - Forks: 2

junayed-hasan/occupational-stress-ml
This repository contains code, datasets, and analysis for AI-driven occupational stress detection using machine learning, deep learning, and NLP. It includes feature selection, explainable AI, synthetic data generation, and model validation for workplace safety applications. 🚀
Language: Jupyter Notebook - Size: 17.6 MB - Last synced at: about 2 months ago - Pushed at: 3 months ago - Stars: 2 - Forks: 1

isarandi/synthetic-occlusion
Synthetic Occlusion Augmentation
Language: Python - Size: 373 KB - Last synced at: 2 months ago - Pushed at: over 5 years ago - Stars: 121 - Forks: 19

firstbatchxyz/pythonic-function-calling-data
Pythonic Function Calling Dataset Generator w/ Dria
Language: Python - Size: 24.4 KB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 5 - Forks: 1

Unity-Technologies/SynthDet 📦
SynthDet - An end-to-end object detection pipeline using synthetic data
Language: C# - Size: 2.19 MB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 373 - Forks: 55

GDelCorso/NA_DAtabase
NA_DA is open-source software written in Python that generates datasets of regular two-dimensional geometric shapes based on probabilistic distributions. NA_DA comes with an intuitive GUI (Graphical User Interface) that allows users to define shapes, colors, and distributions of features.
Language: Python - Size: 38.6 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 2 - Forks: 0

ethicalabs-ai/ouroboros
Self-Improving LLMs Through Iterative Refinement
Language: Python - Size: 429 KB - Last synced at: 12 days ago - Pushed at: 3 months ago - Stars: 3 - Forks: 0

VinAIResearch/Dataset-Diffusion
Dataset Diffusion: Diffusion-based Synthetic Data Generation for Pixel-Level Semantic Segmentation (NeurIPS2023)
Language: Jupyter Notebook - Size: 7.78 MB - Last synced at: 3 months ago - Pushed at: 10 months ago - Stars: 116 - Forks: 4

rozumden/DeFMO
[CVPR 2021] DeFMO: Deblurring and Shape Recovery of Fast Moving Objects
Language: Python - Size: 2.1 MB - Last synced at: 2 months ago - Pushed at: about 3 years ago - Stars: 172 - Forks: 24

arkanivasarkar/EEG-Data-Augmentation-using-Variational-Autoencoder
Improving performance of motor imagery classification using variational-autoencoder and synthetic EEG signals
Language: Jupyter Notebook - Size: 29.3 KB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 42 - Forks: 9

POSE-Lab/6DL-PoseGenerator
Language: Python - Size: 53.4 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

stefanrmmr/differentially_private_synthetic_data
Differentially Private Synthetic Data Generation [DP-SDG] - Experimental Setups & Knowledge Base - WORK IN PROGRESS
Language: Jupyter Notebook - Size: 5.23 MB - Last synced at: 3 months ago - Pushed at: almost 3 years ago - Stars: 12 - Forks: 2

thalesbertaglia/instasynth
Synthetic Instagram Post Generation for Social Media Research
Language: Jupyter Notebook - Size: 737 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

tirthajyoti/pydbgen
Random dataframe and database table generator
Language: Python - Size: 687 KB - Last synced at: 24 days ago - Pushed at: about 4 years ago - Stars: 309 - Forks: 58

leSullivan/unpaired_image_synthesis_with_gans
Implementation of multiple GAN architectures (CGAN, CycleGAN, TurboCycleGAN) for unpaired image-to-image translation, specifically focused on synthetic fence generation in landscape images. Built with PyTorch Lightning and includes SLURM integration for HPC training.
Language: Jupyter Notebook - Size: 27.4 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

majsylw/microbial-counting-review
A list of useful resources on microbial colony classification and detection, such as datasets, papers, and links to projects
Size: 15.6 KB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 15 - Forks: 2

josericodata/SyntheticDataGeneratorApp
Generate and download free synthetic datasets instantly! A Streamlit app with built-in statistical validation tools like Chi-Square and Mutual Information.
Language: Python - Size: 7.41 MB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

HROlive/Unreal-Engine-for-Remote-Visualization-and-Machine-Learning
In-depth training on using Unreal Engine as a data generator and integrating it into a simple ML workflow, at one of the leading supercomputing centres.
Language: C# - Size: 874 KB - Last synced at: 3 months ago - Pushed at: almost 2 years ago - Stars: 5 - Forks: 0

ALucek/QuicKB
Optimize Document Retrieval with Fine-Tuned KnowledgeBases
Language: Python - Size: 1.63 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 107 - Forks: 21

RubixML/Colors
Demonstrating unsupervised clustering using the K-Means algorithm and synthetic color data.
Language: PHP - Size: 251 KB - Last synced at: 6 days ago - Pushed at: about 3 years ago - Stars: 18 - Forks: 3
