An open API service providing repository metadata for many open source software ecosystems.

Topic: "synthetic-data"

stefan-jansen/machine-learning-for-trading

Code for Machine Learning for Algorithmic Trading, 2nd edition.

Language: Jupyter Notebook - Size: 652 MB - Last synced at: 4 months ago - Pushed at: about 1 year ago - Stars: 14,872 - Forks: 4,601

modelscope/data-juicer

Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷

Language: Python - Size: 453 MB - Last synced at: about 9 hours ago - Pushed at: 3 days ago - Stars: 5,142 - Forks: 267

lk-geimfari/mimesis

Mimesis is a robust data generator for Python that can produce a wide range of fake data in multiple languages.

Language: Python - Size: 33.8 MB - Last synced at: 8 days ago - Pushed at: about 1 month ago - Stars: 4,612 - Forks: 342

Kiln-AI/Kiln

The easiest tool for fine-tuning LLM models, synthetic data generation, and collaborating on datasets.

Language: Python - Size: 23 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 4,096 - Forks: 296

nucleuscloud/neosync

Open Source Data Security Platform for Developers to Monitor and Detect PII, Anonymize Production Data and Sync it across environments.

Language: Go - Size: 184 MB - Last synced at: 23 days ago - Pushed at: 23 days ago - Stars: 3,910 - Forks: 167

DLR-RM/BlenderProc

A procedural Blender pipeline for photorealistic training image generation

Language: Python - Size: 91.6 MB - Last synced at: 4 days ago - Pushed at: 26 days ago - Stars: 3,200 - Forks: 482

sdv-dev/SDV

Synthetic data generation for tabular data

Language: Python - Size: 31.8 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 3,150 - Forks: 380

pgmpy/pgmpy

Python library for causal inference. Supports causal discovery, identification, effect estimation, prediction, and simulation with a scikit-learn style API.

Language: Python - Size: 13.4 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 3,032 - Forks: 867

argilla-io/distilabel

Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.

Language: Python - Size: 554 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 2,862 - Forks: 216

synthetichealth/synthea

Synthetic Patient Population Simulator

Language: Java - Size: 742 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 2,702 - Forks: 766

hitsz-ids/synthetic-data-generator

SDG is a specialized framework designed to generate high-quality structured tabular data.

Language: Python - Size: 4.19 MB - Last synced at: 27 days ago - Pushed at: 28 days ago - Stars: 2,375 - Forks: 384

unrealcv/unrealcv

UnrealCV: Connecting Computer Vision to Unreal Engine

Language: C++ - Size: 18.1 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 2,048 - Forks: 451

ydataai/ydata-synthetic

Synthetic data generators for tabular and time-series data

Language: Jupyter Notebook - Size: 16.3 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 1,570 - Forks: 252

GreenmaskIO/greenmask

PostgreSQL database anonymization and synthetic data generation tool

Language: Go - Size: 32.7 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 1,521 - Forks: 42

bespokelabsai/curator

Synthetic data curation for post-training and structured data extraction

Language: Python - Size: 62.6 MB - Last synced at: 9 days ago - Pushed at: about 1 month ago - Stars: 1,487 - Forks: 120

shuttle-hq/synth

The Declarative Data Generator

Language: Rust - Size: 32.3 MB - Last synced at: 11 days ago - Pushed at: 12 months ago - Stars: 1,443 - Forks: 109

sdv-dev/CTGAN

Conditional GAN for generating synthetic tabular data.

Language: Python - Size: 1.84 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 1,442 - Forks: 320

plurai-ai/intellagent

A framework for comprehensive diagnosis and optimization of agents using simulated, realistic synthetic interactions

Language: Python - Size: 14.2 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1,071 - Forks: 133

datadreamer-dev/DataDreamer

DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models. 🤖💤

Language: Python - Size: 895 KB - Last synced at: 12 days ago - Pushed at: 7 months ago - Stars: 1,052 - Forks: 53

huggingface/aisheets

Build, enrich, and transform datasets using AI models with no code

Language: TypeScript - Size: 1.7 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 968 - Forks: 93

BatsResearch/bonito

A lightweight library for generating synthetic instruction tuning datasets for your data without GPT.

Language: Python - Size: 796 KB - Last synced at: 22 days ago - Pushed at: about 2 months ago - Stars: 787 - Forks: 49

magpie-align/magpie

[ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Your efficient and high-quality synthetic data generation pipeline!

Language: Python - Size: 1.08 MB - Last synced at: about 9 hours ago - Pushed at: 6 months ago - Stars: 765 - Forks: 69

Renumics/awesome-open-data-centric-ai

Curated list of open source tooling for data-centric AI on unstructured data.

Size: 572 KB - Last synced at: 1 day ago - Pushed at: almost 2 years ago - Stars: 724 - Forks: 36

nicolas-hbt/pygraft

Configurable Generation of Synthetic Schemas and Knowledge Graphs at Your Fingertips

Language: Python - Size: 699 KB - Last synced at: 20 days ago - Pushed at: about 1 year ago - Stars: 688 - Forks: 45

jofpin/synthBTC

A tool that uses advanced Monte Carlo simulations and Turbit parallel processing to create possible Bitcoin prediction scenarios.

Language: JavaScript - Size: 6.46 MB - Last synced at: 4 months ago - Pushed at: about 1 year ago - Stars: 669 - Forks: 403

gretelai/gretel-synthetics

Synthetic data generators for structured and unstructured text, featuring differentially private learning.

Language: Python - Size: 2.35 MB - Last synced at: 3 months ago - Pushed at: 6 months ago - Stars: 642 - Forks: 91

mostly-ai/mostlyai

Synthetic Data SDK ✨

Language: Python - Size: 14.4 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 633 - Forks: 55

SciPhi-AI/synthesizer 📦

A multi-purpose LLM framework for RAG and data creation.

Language: Python - Size: 31.5 MB - Last synced at: 7 days ago - Pushed at: over 1 year ago - Stars: 628 - Forks: 53

sdv-dev/Copulas

A library to model multivariate data using copulas.

Language: Python - Size: 30.5 MB - Last synced at: 5 days ago - Pushed at: 12 days ago - Stars: 608 - Forks: 117

paulbricman/thisrepositorydoesnotexist

A curated list of awesome projects which use Machine Learning to generate synthetic content.

Size: 34.2 KB - Last synced at: 6 days ago - Pushed at: over 2 years ago - Stars: 584 - Forks: 39

vanderschaarlab/synthcity

A library for generating and evaluating synthetic tabular data for privacy, fairness and data augmentation.

Language: Python - Size: 6.8 MB - Last synced at: about 2 months ago - Pushed at: 3 months ago - Stars: 571 - Forks: 76

sparkfish/augraphy

Augmentation pipeline for rendering synthetic paper printing, faxing, scanning and copy machine processes

Language: Python - Size: 254 MB - Last synced at: 3 days ago - Pushed at: about 2 months ago - Stars: 455 - Forks: 56

lukehinds/promptwright

Generate large synthetic data

Language: Python - Size: 13.9 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 440 - Forks: 32

plaitpy/plaitpy

plait.py - a fake data modeler

Language: Python - Size: 1 MB - Last synced at: 19 days ago - Pushed at: over 6 years ago - Stars: 436 - Forks: 22

yandex-research/tab-ddpm

[ICML 2023] The official implementation of the paper "TabDDPM: Modelling Tabular Data with Diffusion Models"

Language: Python - Size: 183 KB - Last synced at: 6 months ago - Pushed at: about 1 year ago - Stars: 426 - Forks: 97

databrickslabs/dbldatagen

Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) may be used to generate large simulated / synthetic data sets for test, POCs, and other uses in Databricks environments including in Delta Live Tables pipelines

Language: Python - Size: 11.1 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 425 - Forks: 79

GeorgeCazenavette/mtt-distillation

Official code for our CVPR '22 paper "Dataset Distillation by Matching Training Trajectories"

Language: Python - Size: 38.6 MB - Last synced at: 4 months ago - Pushed at: about 1 year ago - Stars: 420 - Forks: 58

wenbowen123/iros20-6d-pose-tracking

[IROS 2020] se(3)-TrackNet: Data-driven 6D Pose Tracking by Calibrating Image Residuals in Synthetic Domains

Language: Python - Size: 84.8 MB - Last synced at: 4 months ago - Pushed at: about 2 years ago - Stars: 407 - Forks: 67

Unity-Technologies/SynthDet 📦

SynthDet - An end-to-end object detection pipeline using synthetic data

Language: C# - Size: 2.19 MB - Last synced at: 25 days ago - Pushed at: 9 months ago - Stars: 385 - Forks: 56

gszfwsb/NCFM

Official PyTorch implementation of the paper "Dataset Distillation with Neural Characteristic Function: A Minmax Perspective" (NCFM) in CVPR 2025 (Highlight).

Language: Python - Size: 7.68 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 379 - Forks: 29

Data-Centric-AI-Community/awesome-data-centric-ai

Open-Source Software, Tutorials, and Research on Data-Centric AI 🤖

Language: Jupyter Notebook - Size: 6.73 MB - Last synced at: 8 days ago - Pushed at: over 1 year ago - Stars: 338 - Forks: 47

microsoft/genalog

Genalog is an open source, cross-platform python package allowing generation of synthetic document images with custom degradations and text alignment capabilities.

Language: Jupyter Notebook - Size: 14.6 MB - Last synced at: 6 days ago - Pushed at: over 1 year ago - Stars: 332 - Forks: 32

BMW-InnovationLab/BMW-Labeltool-Lite

This repository provides you with an easy-to-use labeling tool for State-of-the-art Deep Learning training purposes. It supports Auto-Labeling.

Language: C# - Size: 478 MB - Last synced at: 2 months ago - Pushed at: about 1 year ago - Stars: 323 - Forks: 47

Nicholasli1995/EvoSkeleton

Official project website for the CVPR 2020 paper (Oral Presentation) "Cascaded deep monocular 3D human pose estimation with evolutionary training data"

Language: Python - Size: 17.1 MB - Last synced at: over 1 year ago - Pushed at: about 4 years ago - Stars: 323 - Forks: 43

tabularis-ai/be_great

A novel approach for synthesizing tabular data using pretrained large language models

Language: Python - Size: 4.29 MB - Last synced at: 1 day ago - Pushed at: 2 months ago - Stars: 321 - Forks: 52

Unity-Technologies/Robotics-Object-Pose-Estimation

A complete end-to-end demonstration in which we collect training data in Unity and use that data to train a deep neural network to predict the pose of a cube. This model is then deployed in a simulated robotic pick-and-place task.

Language: Python - Size: 38.6 MB - Last synced at: 4 months ago - Pushed at: over 3 years ago - Stars: 321 - Forks: 78

Unity-Technologies/PeopleSansPeople

Unity's privacy-preserving human-centric synthetic data generator

Language: C# - Size: 446 MB - Last synced at: 3 days ago - Pushed at: over 1 year ago - Stars: 314 - Forks: 35

ZumoLabs/zpy

Synthetic data for computer vision. An open source toolkit using Blender and Python.

Language: Python - Size: 29.3 MB - Last synced at: 10 days ago - Pushed at: almost 4 years ago - Stars: 313 - Forks: 34

milaan9/Clustering-Datasets

This repository contains the collection of UCI (real-life) datasets and Synthetic (artificial) datasets (with cluster labels and MATLAB files) ready to use with clustering algorithms.

Size: 99.2 MB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 312 - Forks: 236

tirthajyoti/pydbgen

Random dataframe and database table generator

Language: Python - Size: 687 KB - Last synced at: 3 months ago - Pushed at: about 4 years ago - Stars: 309 - Forks: 58

nickkunz/smogn

Synthetic Minority Over-Sampling Technique for Regression

Language: Python - Size: 730 KB - Last synced at: 12 months ago - Pushed at: over 1 year ago - Stars: 308 - Forks: 76

fjxmlzn/DoppelGANger

[IMC 2020 (Best Paper Finalist)] Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions

Language: Python - Size: 67.4 KB - Last synced at: 4 months ago - Pushed at: almost 2 years ago - Stars: 307 - Forks: 74

LinkedAi/flip

Synthetic Image generation with Flip. Generate thousands of new 2D images from a small batch of objects and backgrounds.

Language: Python - Size: 80.1 MB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 307 - Forks: 35

davanstrien/awesome-synthetic-datasets

awesome synthetic (text) datasets

Language: Jupyter Notebook - Size: 188 KB - Last synced at: 12 days ago - Pushed at: 2 months ago - Stars: 295 - Forks: 12

sdv-dev/TGAN

Generative adversarial training for generating synthetic tabular data.

Language: Python - Size: 7.84 MB - Last synced at: 2 months ago - Pushed at: almost 3 years ago - Stars: 289 - Forks: 91

openxrlab/xrfeitoria

OpenXRLab Synthetic Data Rendering Toolbox

Language: Python - Size: 1.29 MB - Last synced at: 3 days ago - Pushed at: 7 days ago - Stars: 287 - Forks: 21

debidatta/syndata-generation

Code used to generate synthetic scenes and bounding box annotations for object detection. This was used to generate data used in the Cut, Paste and Learn paper

Language: Python - Size: 6.44 MB - Last synced at: almost 2 years ago - Pushed at: almost 5 years ago - Stars: 280 - Forks: 72

sdv-dev/SDGym

Benchmarking synthetic data generation methods.

Language: Python - Size: 3.15 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 277 - Forks: 63

expectedparrot/edsl

Design, conduct and analyze results of AI-powered surveys and experiments. Simulate social science and market research with large numbers of AI agents and LLMs.

Language: Python - Size: 128 MB - Last synced at: about 9 hours ago - Pushed at: about 11 hours ago - Stars: 271 - Forks: 26

kevinlin311tw/CDCL-human-part-segmentation

Repository for Paper: Cross-Domain Complementary Learning Using Pose for Multi-Person Part Segmentation (TCSVT20)

Language: Python - Size: 5.67 MB - Last synced at: almost 2 years ago - Pushed at: about 5 years ago - Stars: 255 - Forks: 43

sdv-dev/SDMetrics

Metrics to evaluate quality and efficacy of synthetic datasets.

Language: Python - Size: 3.06 MB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 243 - Forks: 50

KodCode-AI/kodcode

✨ A synthetic dataset generation framework that produces diverse coding questions and verifiable solutions - all in one framework

Language: Python - Size: 40.6 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 243 - Forks: 13

Project-AgML/AgML

AgML is a centralized framework for agricultural machine learning. AgML provides access to public agricultural datasets for common agricultural deep learning tasks, with standard benchmarks and pretrained models, as well as the ability to generate synthetic data and annotations.

Language: Python - Size: 212 MB - Last synced at: 6 days ago - Pushed at: 5 months ago - Stars: 235 - Forks: 34

worldbank/REaLTabFormer

A suite of auto-regressive and Seq2Seq (sequence-to-sequence) transformer models for tabular and relational synthetic data generation.

Language: Jupyter Notebook - Size: 13.4 MB - Last synced at: 22 days ago - Pushed at: about 2 months ago - Stars: 234 - Forks: 29

jrieke/shape-detection

🟣 Object detection of abstract shapes with neural networks

Language: Jupyter Notebook - Size: 1.12 MB - Last synced at: 7 days ago - Pushed at: almost 5 years ago - Stars: 221 - Forks: 126

ndrplz/surround_vehicles_awareness

Learn to map surrounding vehicles onto a bird's eye view of the scene.

Language: Python - Size: 6.12 MB - Last synced at: about 2 months ago - Pushed at: over 5 years ago - Stars: 210 - Forks: 71

firmai/datagene

DataGene - Identify How Similar TS Datasets Are to One Another (by @firmai)

Language: Jupyter Notebook - Size: 1.12 MB - Last synced at: about 22 hours ago - Pushed at: over 3 years ago - Stars: 205 - Forks: 24

statice/awesome-synthetic-data

A curated list of awesome synthetic data tools (open source and commercial).

Size: 8.79 KB - Last synced at: 13 days ago - Pushed at: over 1 year ago - Stars: 199 - Forks: 28

TonicAI/masquerade

A Postgres Proxy to Mask Data in Realtime

Language: C# - Size: 84 KB - Last synced at: 15 days ago - Pushed at: over 1 year ago - Stars: 197 - Forks: 15

RichardObi/medigan

medigan - A Python Library of Pretrained Generative Models for Medical Image Synthesis

Language: Python - Size: 106 MB - Last synced at: 3 days ago - Pushed at: about 1 year ago - Stars: 180 - Forks: 21

rungalileo/agent-leaderboard

Ranking LLMs on agentic tasks

Language: Jupyter Notebook - Size: 16.6 MB - Last synced at: 26 days ago - Pushed at: 26 days ago - Stars: 178 - Forks: 18

AlexanderVNikitin/tsgm

Generation and evaluation of synthetic time series datasets (also, augmentations, visualizations, a collection of popular datasets) NeurIPS'24

Language: Python - Size: 8.63 MB - Last synced at: 14 days ago - Pushed at: about 2 months ago - Stars: 177 - Forks: 19

ku21fan/STR-Fewer-Labels

Scene Text Recognition (STR) methods trained with fewer real labels (CVPR 2021)

Language: Jupyter Notebook - Size: 1.61 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 166 - Forks: 26

zjrwtx/SFT-data-builder

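Use free large-model APIs together with your private-domain data to generate SFT training data (entirely free of charge). Supports the training data formats of tools such as llamafactory. Synthetic data.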

Language: JavaScript - Size: 502 KB - Last synced at: 4 months ago - Pushed at: 10 months ago - Stars: 161 - Forks: 17

Shuyu-XJTU/APTM

The official code of "Towards Unified Text-based Person Retrieval: A Large-scale Multi-Attribute and Language Search Benchmark"

Language: Python - Size: 2.32 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 160 - Forks: 14

rapiddweller/rapiddweller-benerator-ce

BENERATOR is a leading software solution to generate, obfuscate, pseudonymize and migrate data for development, testing, and training purposes with a model-driven approach.

Language: Java - Size: 35.3 MB - Last synced at: about 1 month ago - Pushed at: 2 months ago - Stars: 150 - Forks: 26

MhLiao/SynthText3D

Project page of SynthText3D

Language: C++ - Size: 1.44 MB - Last synced at: 4 months ago - Pushed at: over 5 years ago - Stars: 145 - Forks: 23

DataformerAI/dataformer

Solving data for LLMs - Create quality synthetic datasets!

Language: Python - Size: 278 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 143 - Forks: 12

anton-jeran/FAST-RIR

This is the official implementation of our neural-network-based fast diffuse room impulse response generator (FAST-RIR) for generating room impulse responses (RIRs) for a given acoustic environment.

Language: Python - Size: 4.47 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 143 - Forks: 26

gist-ailab/uoais

Code for the paper "Unseen Object Amodal Instance Segmentation via Hierarchical Occlusion Modeling", ICRA 2022

Language: Python - Size: 14.9 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 142 - Forks: 28

atapour/monocularDepth-Inference

Inference pipeline for the CVPR paper entitled "Real-Time Monocular Depth Estimation using Synthetic Data with Domain Adaptation via Image Style Transfer" (http://atapour.co.uk/papers/atapour18monocular.pdf).

Language: Python - Size: 6.9 MB - Last synced at: almost 2 years ago - Pushed at: about 6 years ago - Stars: 141 - Forks: 37

gretelai/awesome-synthetic-data

📖 A curated list of resources dedicated to synthetic data

Size: 40 KB - Last synced at: 11 days ago - Pushed at: about 3 years ago - Stars: 133 - Forks: 10

aimclub/BAMT

Repository of a data modeling and analysis tool based on Bayesian networks

Language: Python - Size: 106 MB - Last synced at: 10 days ago - Pushed at: 4 months ago - Stars: 132 - Forks: 21

allenai/pixmo-docs

ACL 2025: Synthetic data generation pipelines for text-rich images.

Language: Python - Size: 6.43 MB - Last synced at: 26 days ago - Pushed at: 6 months ago - Stars: 132 - Forks: 20

khawar-islam/diffuseMix

Official PyTorch implementation of DiffuseMix : Label-Preserving Data Augmentation with Diffusion Models (CVPR'2024)

Language: Python - Size: 1.74 MB - Last synced at: 24 days ago - Pushed at: 6 months ago - Stars: 121 - Forks: 8

sdv-dev/DeepEcho

Synthetic Data Generation for mixed-type, multivariate time series.

Language: Python - Size: 767 KB - Last synced at: 20 days ago - Pushed at: 28 days ago - Stars: 116 - Forks: 16

stefan-jansen/synthetic-data-for-finance

Material for QuantUniversity talk on Synthetic Data Generation for Finance.

Language: Jupyter Notebook - Size: 757 KB - Last synced at: 5 months ago - Pushed at: almost 5 years ago - Stars: 110 - Forks: 45

LiheYoung/FreeMask

[NeurIPS 2023] FreeMask: Synthetic Images with Dense Annotations Make Stronger Segmentation Models

Language: Python - Size: 13 MB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 107 - Forks: 1

kirill-vish/Beyond-INet

Code for experiments for "ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy"

Language: Python - Size: 130 MB - Last synced at: 5 months ago - Pushed at: 12 months ago - Stars: 101 - Forks: 6

neurallambda/awesome-reasoning

a curated list of data for reasoning ai

Size: 89.8 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 101 - Forks: 5

microsoft/DPSDA

Private Evolution: Generating DP Synthetic Data without Training [ICLR 2024, ICML 2024 Spotlight]

Language: Python - Size: 9.54 MB - Last synced at: 6 days ago - Pushed at: about 2 months ago - Stars: 100 - Forks: 14

barseghyanartur/faker-file

Create files with fake data. In many formats. With no effort.

Language: Python - Size: 2.57 MB - Last synced at: 2 months ago - Pushed at: 3 months ago - Stars: 94 - Forks: 6

firmai/mtss-gan 📦

MTSS-GAN: Multivariate Time Series Simulation with Generative Adversarial Networks (by @firmai)

Size: 3.62 MB - Last synced at: about 22 hours ago - Pushed at: almost 5 years ago - Stars: 94 - Forks: 30

Baukebrenninkmeijer/table-evaluator

Evaluate real and synthetic datasets against each other

Language: Python - Size: 9.51 MB - Last synced at: 3 days ago - Pushed at: about 1 month ago - Stars: 92 - Forks: 28

sunchang0124/dp_cgans

A library to generate synthetic tabular or RDF data using Conditional Generative Adversary Networks (GANs) combined with Differential Privacy techniques.

Language: Python - Size: 266 KB - Last synced at: 3 days ago - Pushed at: 6 months ago - Stars: 92 - Forks: 27

ruirangerfan/Three-Filters-to-Normal

Three-Filters-to-Normal: An Accurate and Ultrafast Surface Normal Estimator (RAL+ICRA'21)

Language: C++ - Size: 85.3 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 91 - Forks: 14

Data-Centric-AI-Community/awesome-python-for-data-science

A curated list of awesome resources such as books, tutorials, courses, open-source libraries, exercises, and other materials that support Pythonistas in the making, and Pythonistas migrating into Data Science! 📊

Language: Jupyter Notebook - Size: 51.8 MB - Last synced at: 9 days ago - Pushed at: over 1 year ago - Stars: 89 - Forks: 19

justchenhao/IAug_CDNet

Official Pytorch Implementation of Adversarial Instance Augmentation for Building Change Detection in Remote Sensing Images.

Language: Python - Size: 16.9 MB - Last synced at: 4 months ago - Pushed at: over 2 years ago - Stars: 88 - Forks: 19

statice/anonymeter

A Unified Framework for Quantifying Privacy Risk in Synthetic Data according to the GDPR

Language: Jupyter Notebook - Size: 1.77 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 87 - Forks: 22

privateai/deid-examples

Example scripts that showcase how to use Private AI Text to de-identify, redact, hash, tokenize, mask and synthesize PII in text.

Language: Jupyter Notebook - Size: 37.8 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 81 - Forks: 1

Related Topics
machine-learning (154), synthetic-dataset-generation (131), deep-learning (118), python (108), computer-vision (66), llm (48), dataset (46), data-generation (43), data-science (41), pytorch (39), gan (38), ai (38), generative-ai (37), object-detection (36), synthetic-data-generation (34), generative-adversarial-network (32), time-series (28), privacy (28), tabular-data (28), nlp (27), data (27), data-augmentation (26), simulation (25), blender (24), generative-model (23), dataset-generation (18), differential-privacy (18), large-language-models (18), llms (18), domain-adaptation (17), data-generator (16), synthetic (16), gans (16), datasets (16), diffusion-models (16), evaluation (15), tensorflow (15), artificial-intelligence (14), fine-tuning (14), anonymization (14), openai (13), data-analysis (13), reinforcement-learning (12), generator (11), synthetic-data-generator (11), transfer-learning (11), data-visualization (11), classification (11), healthcare (11), open-source (10), segmentation (10), instance-segmentation (10), faker (10), transformers (10), clustering (10), test-data-generator (9), augmentation (9), fake-data (9), natural-language-processing (9), detection (9), semantic-segmentation (9), fraud-detection (9), 3d (9), benchmark (9), huggingface (9), pose-estimation (9), medical-imaging (9), docker (9), image-processing (9), privacy-enhancing-technologies (8), gdpr (8), generative-models (8), ocr (8), r (8), deep-neural-networks (8), finetuning (8), neural-network (8), deeplearning (8), face-recognition (8), unity (7), database (7), finance (7), ctgan (7), robotics (7), rendering (7), keras (7), awesome-list (7), jupyter-notebook (7), ollama (7), image-generation (7), research (7), ros (7), streamlit (7), testing (7), unsupervised-learning (6), opencv (6), sdv (6), data-quality (6), statistical-analysis (6), trading (6)