An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: synthetic-data

Francis-Calingo/Canadian-Rental-Prices-and-Immigration-ML-Predictive-Model

Language: Jupyter Notebook - Size: 8.26 MB - Last synced at: about 13 hours ago - Pushed at: about 14 hours ago - Stars: 0 - Forks: 1

mostly-ai/mostlyai

Synthetic Data SDK ✨

Language: Python - Size: 14.1 MB - Last synced at: about 23 hours ago - Pushed at: about 23 hours ago - Stars: 569 - Forks: 45

mostly-ai/mostlyai-mock

Synthetic Data as You See Fit 🔮

Language: Python - Size: 716 KB - Last synced at: about 24 hours ago - Pushed at: 1 day ago - Stars: 6 - Forks: 2

expectedparrot/edsl

Design, conduct and analyze results of AI-powered surveys and experiments. Simulate social science and market research with large numbers of AI agents and LLMs.

Language: Python - Size: 124 MB - Last synced at: about 24 hours ago - Pushed at: 1 day ago - Stars: 252 - Forks: 25

lk-geimfari/mimesis

Mimesis is a robust data generator for Python that can produce a wide range of fake data in multiple languages.

Language: Python - Size: 33.8 MB - Last synced at: about 13 hours ago - Pushed at: about 1 month ago - Stars: 4,589 - Forks: 341

sparkfish/augraphy

Augmentation pipeline for rendering synthetic paper printing, faxing, scanning and copy machine processes

Language: Python - Size: 245 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 427 - Forks: 51

igor-olikh/syntetic-data-generator

A comprehensive toolkit for generating high-quality synthetic datasets using Meta's Llama Synthetic Data Kit. Supports PDFs, videos, documents & more for AI fine-tuning and testing.

Size: 393 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

DerwenAI/kleptosyn

Synthetic data generation for investigative graphs based on patterns of bad-actor tradecraft.

Language: Jupyter Notebook - Size: 1.88 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 6 - Forks: 0

microsoft/genalog

Genalog is an open source, cross-platform Python package allowing generation of synthetic document images with custom degradations and text alignment capabilities.

Language: Jupyter Notebook - Size: 14.6 MB - Last synced at: 1 day ago - Pushed at: over 1 year ago - Stars: 329 - Forks: 34

pgmpy/pgmpy

Python Library for Causal and Probabilistic Modeling using Bayesian Networks

Language: Python - Size: 13.1 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 2,981 - Forks: 859

sdv-dev/Copulas

A library to model multivariate data using copulas.

Language: Python - Size: 31.7 MB - Last synced at: about 19 hours ago - Pushed at: about 19 hours ago - Stars: 595 - Forks: 116

Renumics/awesome-open-data-centric-ai

Curated list of open source tooling for data-centric AI on unstructured data.

Size: 572 KB - Last synced at: about 2 hours ago - Pushed at: over 1 year ago - Stars: 718 - Forks: 36

synthetichealth/synthea

Synthetic Patient Population Simulator

Language: Java - Size: 742 MB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 2,581 - Forks: 741

allenmonkey970/ben10-synthetic-battles

This project builds on the Ben 10 Alien Universe Realistic Battle Dataset and adds a synthetic, expanded dataset for testing and analysis.

Language: Jupyter Notebook - Size: 3 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

chaudharijeel9673/linux-syslog-insights

Explore "linux-syslog-insights" to gain valuable insights into Linux server activity through a custom Splunk dashboard. 📊 Analyze trends in authentication, detect brute-force attempts, and monitor CPU anomalies to enhance your system's security.

Language: Python - Size: 1.01 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

synthesizer-project/synthesizer

Synthesizer - a code for creating synthetic astrophysical observables

Language: Python - Size: 17.4 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 30 - Forks: 13

nucleuscloud/neosync

Open Source Data Security Platform for Developers to Monitor and Detect PII, Anonymize Production Data and Sync it across environments.

Language: Go - Size: 175 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 3,875 - Forks: 156

modelscope/data-juicer

Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷

Language: Python - Size: 223 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 4,607 - Forks: 243

Kiln-AI/Kiln

The easiest tool for fine-tuning LLM models, synthetic data generation, and collaborating on datasets.

Language: Python - Size: 19.3 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 3,789 - Forks: 266

synthesized-io/tdk-demo

This is a collection of TDK demo projects that use different databases and options

Language: YAML - Size: 69.4 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 17 - Forks: 4

eggai-tech/qa-extraction-with-human-review

Question & answer extraction with human review

Language: Jupyter Notebook - Size: 7.98 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

sean-zw/SynthECG

This repository hosts advanced models for generating ECG signals using deep learning techniques. Contributions are welcome, so feel free to fork and submit your improvements! 💻

Language: Python - Size: 11.7 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1 - Forks: 0

dbt-labs/jaffle-shop-generator

๐Ÿฅช๐Ÿญ A simple CLI for generating synthetic Jaffle Shop data.

Language: Python - Size: 6.5 MB - Last synced at: 2 days ago - Pushed at: 3 months ago - Stars: 40 - Forks: 9

Vini09-cpu/agentin

AI Agents for Technology Services

Size: 1000 Bytes - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

sdv-dev/SDV

Synthetic data generation for tabular data

Language: Python - Size: 31.6 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 3,026 - Forks: 365

Data-Centric-AI-Community/awesome-python-for-data-science

A curated list of awesome resources such as books, tutorials, courses, open-source libraries, exercises, and other materials that support Pythonistas in the making, and Pythonistas migrating into Data Science! 📊

Language: Jupyter Notebook - Size: 51.8 MB - Last synced at: 1 day ago - Pushed at: about 1 year ago - Stars: 86 - Forks: 19

ahmad-alismail/LLM_based_Synthetic_Data_Generation

A curated and continuously updated collection of papers, tools, and datasets on synthetic data generation using LLMs and agentic workflows.

Size: 28.3 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

sdv-dev/CTGAN

Conditional GAN for generating synthetic tabular data.

Language: Python - Size: 1.83 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1,410 - Forks: 314

Deezpa/PyTorch-CreditScoring-ThinFile

A PyTorch-based deep learning extension to my PhD thesis on credit scoring of thin-file consumers.

Size: 7.81 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

vanderschaarlab/DECAF Fork of trentkyono/DECAF

DECAF: Generating Fair Synthetic Data Using Causally-Aware Generative Networks

Language: Python - Size: 35.2 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 20 - Forks: 10

ImJaeSung/Synthesizers

Implementations of various synthesizers with pytorch.

Language: Python - Size: 14.7 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1 - Forks: 0

naomibaes/Synthetic-LSC_pipeline

Synthetic datasets to evaluate key dimensions of LSC (Sentiment, Intensity, Breadth), generated using LLMs and WordNet from the LSC-Eval framework.

Language: Jupyter Notebook - Size: 31.5 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

Project-AgML/AgML

AgML is a centralized framework for agricultural machine learning. AgML provides access to public agricultural datasets for common agricultural deep learning tasks, with standard benchmarks and pretrained models, as well as the ability to generate synthetic data and annotations.

Language: Python - Size: 212 MB - Last synced at: 6 days ago - Pushed at: 2 months ago - Stars: 228 - Forks: 33

gada17/synthetic-to-viewbinding-migrator

Convert Kotlin Android Fragments from synthetic imports to ViewBinding. Automate the migration of old fragment code to modern, type-safe view binding in Android projects.

Language: Python - Size: 10.7 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

privateai/pai-thin-client

A Python client for interacting with Private AI's API

Language: Python - Size: 736 KB - Last synced at: 4 days ago - Pushed at: 3 months ago - Stars: 22 - Forks: 3

argilla-io/distilabel

Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.

Language: Python - Size: 554 MB - Last synced at: 4 days ago - Pushed at: 12 days ago - Stars: 2,755 - Forks: 205

Data-Centric-AI-Community/awesome-data-centric-ai

Open-Source Software, Tutorials, and Research on Data-Centric AI 🤖

Language: Jupyter Notebook - Size: 6.73 MB - Last synced at: about 20 hours ago - Pushed at: over 1 year ago - Stars: 337 - Forks: 46

DLR-RM/BlenderProc

A procedural Blender pipeline for photorealistic training image generation

Language: Python - Size: 96 MB - Last synced at: 5 days ago - Pushed at: 2 months ago - Stars: 3,110 - Forks: 466

tanaos/synthex-python

Generate high-quality, large-scale synthetic datasets 📊🧪

Language: Python - Size: 270 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 3 - Forks: 1

vanderschaarlab/synthcity

A library for generating and evaluating synthetic tabular data for privacy, fairness and data augmentation.

Language: Python - Size: 6.77 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 555 - Forks: 76

sdv-dev/SDMetrics

Metrics to evaluate quality and efficacy of synthetic datasets.

Language: Python - Size: 2.75 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 236 - Forks: 49

tdspora/syngen

Open-source version of the TDspora synthetic data generation algorithm.

Language: Jupyter Notebook - Size: 18.2 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 17 - Forks: 9

bespokelabsai/curator

Synthetic data curation for post-training and structured data extraction

Language: Python - Size: 62.6 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 1,391 - Forks: 109

mostly-ai/mostlyai-engine

Synthetic Data Engine 💎

Language: Python - Size: 2.5 MB - Last synced at: 7 days ago - Pushed at: 8 days ago - Stars: 62 - Forks: 5

mostly-ai/mostlyai-qa

Synthetic Data Quality Assurance 🔎

Language: HTML - Size: 131 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 57 - Forks: 5

gretelai/gretel-python-client

The Gretel Python Client allows you to interact with the Gretel REST API.

Language: Python - Size: 31.1 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 56 - Forks: 19

GreenmaskIO/greenmask

PostgreSQL database anonymization and synthetic data generation tool

Language: Go - Size: 32.3 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 1,437 - Forks: 35

sdv-dev/SDGym

Benchmarking synthetic data generation methods.

Language: Python - Size: 3.06 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 274 - Forks: 63

SchweizerischeBundesbahnen/SynPopToolbox

SynPopToolbox is a Python framework designed for analysis, visualization and manipulation of a synthetic population produced by the land-use simulation software FaLC (https://github.com/falc-sim-org/FaLC) and related subproducts.

Language: Python - Size: 70.4 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 6 - Forks: 0

magpie-align/magpie

[ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Your efficient and high-quality synthetic data generation pipeline!

Language: Python - Size: 1.08 MB - Last synced at: 9 days ago - Pushed at: 3 months ago - Stars: 712 - Forks: 62

shuttle-hq/synth

The Declarative Data Generator

Language: Rust - Size: 32.3 MB - Last synced at: 8 days ago - Pushed at: 9 months ago - Stars: 1,418 - Forks: 108

openxrlab/xrfeitoria

OpenXRLab Synthetic Data Rendering Toolbox

Language: Python - Size: 1.28 MB - Last synced at: 6 days ago - Pushed at: 3 months ago - Stars: 281 - Forks: 20

vincentkoc/tiny_qa_benchmark_pp

Tiny QA Benchmark++: a micro-benchmark suite (52-item gold + on-demand multilingual synthetic packs), generator CLI, and CI-ready eval harness for ultra-fast LLM smoke-testing and regression-catching.

Language: Python - Size: 306 KB - Last synced at: about 17 hours ago - Pushed at: about 1 month ago - Stars: 23 - Forks: 0

KI-AIM/Cinnamon

Cinnamon is a modular application designed to offer robust functionalities for data anonymization, synthetization, and evaluation.

Language: Java - Size: 40.5 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 21 - Forks: 1

gretelai/awesome-synthetic-data

📖 A curated list of resources dedicated to synthetic data

Size: 40 KB - Last synced at: 10 days ago - Pushed at: almost 3 years ago - Stars: 131 - Forks: 10

SigVarGen/SigVarGen

SigVarGen is a Python framework for time-series signal generation, data augmentation, and anomaly simulation. It creates diverse 1D signal variants under controlled conditions, including idle-state, perturbed, and noisy signals.

Language: Jupyter Notebook - Size: 84.3 MB - Last synced at: 5 days ago - Pushed at: about 1 month ago - Stars: 5 - Forks: 0

RichardObi/frd-score

Official implementation of the Fréchet Radiomics Distance | pip install frd-score

Language: Python - Size: 2.07 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 3 - Forks: 1

OllieBoyne/BlenderSynth

Synthetic Blender Dataset Production

Language: Python - Size: 34.9 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 76 - Forks: 7

harveybc/feature-extractor

Application for training an autoencoder whose encoder can be used as a feature extractor for dimensionality and noise reduction, while the decoder can be used for synthetic data generation. Supports dynamic plugin integration, allowing users to extend its capabilities by adding custom encoder and decoder models.

Language: Python - Size: 184 MB - Last synced at: about 8 hours ago - Pushed at: about 9 hours ago - Stars: 5 - Forks: 0

yashmaurya01/Awesome-ML-Privacy-Mitigations

A curated collection of privacy-preserving machine learning techniques, tools, and practical evaluations. Focuses on differential privacy, federated learning, secure computation, and synthetic data generation for implementing privacy in ML workflows.

Size: 146 KB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 2 - Forks: 0

tanaos/tanaos-docs

Documentation for our synthetic data generation SDKs and APIs 📖

Language: TypeScript - Size: 2.46 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 0 - Forks: 0

pedrodevog/SynthECG

This repository provides the first systematic evaluation framework for synthetic 10-second 12-lead ECGs from diagnostic class-conditioned generative models.

Language: Python - Size: 12.7 KB - Last synced at: 7 days ago - Pushed at: 13 days ago - Stars: 1 - Forks: 0

davanstrien/awesome-synthetic-datasets

awesome synthetic (text) datasets

Language: Jupyter Notebook - Size: 184 KB - Last synced at: about 17 hours ago - Pushed at: 8 months ago - Stars: 282 - Forks: 11

data-catering/data-caterer Fork of pflooky/data-caterer

Test data management tool for any data source, batch or real-time. Generate, validate and clean up data all in one tool.

Language: Scala - Size: 2.8 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 57 - Forks: 8

KodCode-AI/kodcode

✨ A synthetic dataset generation framework that produces diverse coding questions and verifiable solutions, all in one framework

Language: Python - Size: 40.6 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 229 - Forks: 10

gszfwsb/NCFM

Official PyTorch implementation of the paper "Dataset Distillation with Neural Characteristic Function: A Minmax Perspective" (NCFM) in CVPR 2025 (Highlight).

Language: Python - Size: 1.11 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 364 - Forks: 27

aimclub/BAMT

Repository of a data modeling and analysis tool based on Bayesian networks

Language: Python - Size: 106 MB - Last synced at: 5 days ago - Pushed at: about 1 month ago - Stars: 129 - Forks: 20

ndiwawan/qa-generator-with-human-review

QA Generator with Human Review: this repository allows you to generate QA pairs from documents, incorporating a human review process through Label Studio. 🛠️ Track sources, filter quality, and export in multiple formats for effective dataset creation. 🌟

Language: Python - Size: 46.9 KB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

ajaykr2712/ML_DS

Daily Curated Open Source Learnings of ML 🤖

Language: Jupyter Notebook - Size: 73.9 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

javi22020/CharacterGen

Tool to generate identity-consistent LoRA training data.

Language: Python - Size: 381 KB - Last synced at: 4 days ago - Pushed at: 10 days ago - Stars: 1 - Forks: 0

Shuyu-XJTU/SVTA

The official repo of "Towards Scalable Video Anomaly Retrieval: A Synthetic Video-Text Benchmark"

Size: 1000 Bytes - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 1 - Forks: 0

SCAI-BIO/syndat

Synthetic data quality evaluation & visualization

Language: Python - Size: 188 KB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 2 - Forks: 0

mirpo/datamatic

Generate synthetic datasets using local LLMs via Ollama and LMstudio with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 3, Mistral Small 3.1 and other major language models.

Language: Go - Size: 79.1 KB - Last synced at: about 17 hours ago - Pushed at: about 18 hours ago - Stars: 1 - Forks: 0

plaitpy/plaitpy

plait.py - a fake data modeler

Language: Python - Size: 1 MB - Last synced at: 16 days ago - Pushed at: over 6 years ago - Stars: 435 - Forks: 22

zjunlp/Knowledge2Data

Spatial Knowledge Graph-Guided Synthesis for Multimodal LLMs

Language: Python - Size: 1.51 MB - Last synced at: 8 days ago - Pushed at: 18 days ago - Stars: 2 - Forks: 0

sergio-sanz-rodriguez/Synthetic-To-Real-Object-Detection-Edition-2

Training object-detection deep learning models using 100% synthetic data.

Language: Python - Size: 22.4 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 0 - Forks: 0

sassoftware/dpmm

dpmm: a library for synthetic tabular data generation with rich functionality and end-to-end Differential Privacy guarantees

Language: Python - Size: 661 KB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 3 - Forks: 0

tabularis-ai/be_great

A novel approach for synthesizing tabular data using pretrained large language models

Language: Python - Size: 4.29 MB - Last synced at: 9 days ago - Pushed at: about 1 month ago - Stars: 312 - Forks: 52

SciPhi-AI/synthesizer 📦

A multi-purpose LLM framework for RAG and data creation.

Language: Python - Size: 31.5 MB - Last synced at: 6 days ago - Pushed at: over 1 year ago - Stars: 626 - Forks: 53

unrealcv/unrealcv

UnrealCV: Connecting Computer Vision to Unreal Engine

Language: C++ - Size: 18.1 MB - Last synced at: 17 days ago - Pushed at: 2 months ago - Stars: 2,008 - Forks: 444

microsoft/DPSDA

Private Evolution: Generating DP Synthetic Data without Training [ICLR 2024, ICML 2024 Spotlight]

Language: Python - Size: 8.64 MB - Last synced at: 1 day ago - Pushed at: 23 days ago - Stars: 97 - Forks: 13

Goodbyefrog/synthetic-ping-data-generator

Modular Java application to generate synthetic user, device, and event data for data engineering pipelines and software testing.

Language: Java - Size: 29.3 KB - Last synced at: 18 days ago - Pushed at: 19 days ago - Stars: 0 - Forks: 0

databrickslabs/dbldatagen

Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) may be used to generate large simulated / synthetic data sets for test, POCs, and other uses in Databricks environments including in Delta Live Tables pipelines

Language: Python - Size: 11.1 MB - Last synced at: 12 days ago - Pushed at: about 1 month ago - Stars: 407 - Forks: 74

IDanK0/Deepseek-Dataset-Generator

Deepseek-Dataset-Generator creates conversational datasets for LLM fine-tuning via the DeepSeek API. It supports various formats (ChatML, ShareGPT, Alpaca, JSON, CSV), simple configuration via YAML, and detailed logging. Ideal for quickly generating realistic, customized data.

Language: Python - Size: 165 KB - Last synced at: 19 days ago - Pushed at: 20 days ago - Stars: 0 - Forks: 0

srivathsan96/Splunk-Admin-Monitoring-Dashboard

Splunk project analyzing simulated Apache web logs to detect failing endpoints, access trends, slow APIs, suspicious patterns, and usage by device/browser. Includes complex SPL queries and visual storytelling.

Language: Python - Size: 997 KB - Last synced at: 19 days ago - Pushed at: 20 days ago - Stars: 0 - Forks: 0

jknafou/TransCorpus

TransCorpus is a scalable toolkit for large-scale, parallel translation and preprocessing of text corpora, built for language model pretraining and research.

Language: Python - Size: 5.91 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 0 - Forks: 0

worldbank/REaLTabFormer

A suite of auto-regressive and Seq2Seq (sequence-to-sequence) transformer models for tabular and relational synthetic data generation.

Language: Jupyter Notebook - Size: 12.3 MB - Last synced at: 16 days ago - Pushed at: 21 days ago - Stars: 228 - Forks: 28

ThomasRochefortB/open-agentinstruct

An open-source recreation of the AgentInstruct agentic workflow for synthetic data generation

Language: Python - Size: 372 KB - Last synced at: 3 days ago - Pushed at: 2 months ago - Stars: 16 - Forks: 0

sdv-dev/TGAN

Generative adversarial training for generating synthetic tabular data.

Language: Python - Size: 7.84 MB - Last synced at: 16 days ago - Pushed at: over 2 years ago - Stars: 290 - Forks: 91

starfishdata/starfish

Synthetic data generation to fuel AI models

Language: Python - Size: 14 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 30 - Forks: 1

RiccardoSenica/synthetic-consumer-data

Generate synthetic consumers and their weekly purchase history using AI. Create synthetic data with detailed profiles, shopping habits, and consistent spending patterns.

Language: TypeScript - Size: 350 KB - Last synced at: 21 days ago - Pushed at: 22 days ago - Stars: 0 - Forks: 0

tempo-sim/Tempo

The Tempo Unreal Engine plugins

Language: C++ - Size: 6.76 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 15 - Forks: 5

agr78/PRLx-GAN

Generative modeling and latent projection label denoising approach to create synthetic rim lesions on QSM

Language: Shell - Size: 7.12 MB - Last synced at: 3 days ago - Pushed at: 22 days ago - Stars: 0 - Forks: 0

martinkuhn94/PALSYN

PALSYN is a tool that generates privacy-preserving, process-oriented synthetic data using Autoregressive Sequence Models and differential privacy techniques.

Language: Python - Size: 11.8 MB - Last synced at: 15 days ago - Pushed at: 22 days ago - Stars: 1 - Forks: 2

intervene-EU-H2020/synthetic_data

Software program for generating synthetic datasets for genotypes and phenotypes

Language: Jupyter Notebook - Size: 82.2 MB - Last synced at: 14 days ago - Pushed at: about 2 years ago - Stars: 15 - Forks: 3

Baukebrenninkmeijer/table-evaluator

Evaluate real and synthetic datasets against each other

Language: Jupyter Notebook - Size: 7.21 MB - Last synced at: 17 days ago - Pushed at: 29 days ago - Stars: 89 - Forks: 28

SherAndrei/blender-gen-dataset

Generate synthetic datasets with Blender

Language: Python - Size: 2.6 MB - Last synced at: 11 days ago - Pushed at: 22 days ago - Stars: 0 - Forks: 0

rapiddweller/datamimic

🧠 Model-Driven test data generation platform enabling developers to create realistic, scalable, and privacy-compliant test data. Features model-driven data generation, GDPR compliance, and seamless Python integration.

Language: Python - Size: 14.3 MB - Last synced at: 22 days ago - Pushed at: 22 days ago - Stars: 25 - Forks: 2

roboflow/magic-scissors

Synthetic data for object detection and segmentation

Language: Python - Size: 877 KB - Last synced at: 2 days ago - Pushed at: over 1 year ago - Stars: 12 - Forks: 3

firmai/mtss-gan 📦

MTSS-GAN: Multivariate Time Series Simulation with Generative Adversarial Networks (by @firmai)

Size: 3.62 MB - Last synced at: 2 days ago - Pushed at: over 4 years ago - Stars: 94 - Forks: 30

Related Keywords
synthetic-data (706), machine-learning (140), synthetic-dataset-generation (120), deep-learning (111), python (91), computer-vision (62), dataset (45), llm (44), data-generation (38), pytorch (35), gan (35), object-detection (35), generative-adversarial-network (32), ai (31), generative-ai (31), data-science (31), synthetic-data-generation (30), privacy (29), tabular-data (27), nlp (26), data-augmentation (25), generative-model (23), simulation (23), blender (23), time-series (23), data (21), dataset-generation (18), differential-privacy (18), large-language-models (18), llms (16), diffusion-models (16), datasets (16), synthetic (15), gans (15), domain-adaptation (15), tensorflow (14), data-generator (14), anonymization (14), openai (14), artificial-intelligence (14), evaluation (14), fine-tuning (13), reinforcement-learning (11), classification (11), transfer-learning (11), instance-segmentation (10), generator (10), faker (10), synthetic-data-generator (10), segmentation (10), 3d (9), clustering (9), semantic-segmentation (9), fake-data (9), pose-estimation (9), image-processing (9), augmentation (9), transformers (9), detection (9), open-source (8), fraud-detection (8), privacy-enhancing-technologies (8), docker (8), ocr (8), test-data-generator (8), data-analysis (8), face-recognition (8), deep-neural-networks (8), benchmark (8), neural-network (7), natural-language-processing (7), ros (7), r (7), huggingface (7), data-visualization (7), awesome-list (7), database (7), gdpr (7), deeplearning (7), medical-imaging (7), robotics (7), rendering (7), generative-models (7), finance (6), explainable-ai (6), metadata (6), opencv (6), keras (6), evaluation-framework (6), synthea (6), ctgan (6), finetuning (6), unity (6), testing (6), instruction-tuning (6), imbalanced-data (6), pandas (6), agent (6), data-quality (6), grade (6)