An open API service providing repository metadata for many open source software ecosystems.

Topic: "synthetic-data"

stefan-jansen/machine-learning-for-trading

Code for Machine Learning for Algorithmic Trading, 2nd edition.

Language: Jupyter Notebook - Size: 652 MB - Last synced at: 17 days ago - Pushed at: 9 months ago - Stars: 14,707 - Forks: 4,559

lk-geimfari/mimesis

Mimesis is a robust data generator for Python that can produce a wide range of fake data in multiple languages.

Language: Python - Size: 33.8 MB - Last synced at: 5 days ago - Pushed at: 6 days ago - Stars: 4,545 - Forks: 338

modelscope/data-juicer

Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷

Language: Python - Size: 169 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 4,344 - Forks: 231

nucleuscloud/neosync

Open Source Data Security Platform for Developers to Monitor and Detect PII, Anonymize Production Data and Sync it across environments.

Language: Go - Size: 168 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 3,853 - Forks: 155

Kiln-AI/Kiln

The easiest tool for fine-tuning LLM models, synthetic data generation, and collaborating on datasets.

Language: Python - Size: 14.5 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 3,442 - Forks: 237

DLR-RM/BlenderProc

A procedural Blender pipeline for photorealistic training image generation

Language: Python - Size: 96 MB - Last synced at: 8 days ago - Pushed at: 25 days ago - Stars: 3,043 - Forks: 464

pgmpy/pgmpy

Python Library for Causal and Probabilistic Modeling using Bayesian Networks

Language: Python - Size: 13.1 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 2,894 - Forks: 753

argilla-io/distilabel

Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.

Language: Python - Size: 543 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 2,671 - Forks: 198

sdv-dev/SDV

Synthetic data generation for tabular data

Language: Python - Size: 31.6 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 2,624 - Forks: 330

synthetichealth/synthea

Synthetic Patient Population Simulator

Language: Java - Size: 741 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 2,519 - Forks: 729

hitsz-ids/synthetic-data-generator

SDG is a specialized framework designed to generate high-quality structured tabular data.

Language: Python - Size: 4.19 MB - Last synced at: 13 days ago - Pushed at: 2 months ago - Stars: 2,350 - Forks: 379

unrealcv/unrealcv

UnrealCV: Connecting Computer Vision to Unreal Engine

Language: C++ - Size: 18.1 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 1,987 - Forks: 443

ydataai/ydata-synthetic

Synthetic data generators for tabular and time-series data

Language: Jupyter Notebook - Size: 16.4 MB - Last synced at: 13 days ago - Pushed at: 2 months ago - Stars: 1,534 - Forks: 250

shuttle-hq/synth

The Declarative Data Generator

Language: Rust - Size: 32.3 MB - Last synced at: 7 days ago - Pushed at: 8 months ago - Stars: 1,416 - Forks: 108

sdv-dev/CTGAN

Conditional GAN for generating synthetic tabular data.

Language: Python - Size: 1.82 MB - Last synced at: 9 days ago - Pushed at: 13 days ago - Stars: 1,380 - Forks: 308

GreenmaskIO/greenmask

PostgreSQL database anonymization and synthetic data generation tool

Language: Go - Size: 31.7 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1,297 - Forks: 31

bespokelabsai/curator

Synthetic data curation for post-training and structured data extraction

Language: Python - Size: 62.2 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1,292 - Forks: 100

datadreamer-dev/DataDreamer

DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models. 🤖💤

Language: Python - Size: 895 KB - Last synced at: 21 days ago - Pushed at: 3 months ago - Stars: 1,010 - Forks: 53

plurai-ai/intellagent

A framework for comprehensive diagnosis and optimization of agents using simulated, realistic synthetic interactions

Language: Python - Size: 14.3 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1,006 - Forks: 129

BatsResearch/bonito

A lightweight library for generating synthetic instruction tuning datasets for your data without GPT.

Language: Python - Size: 796 KB - Last synced at: 20 days ago - Pushed at: 2 months ago - Stars: 767 - Forks: 49

Renumics/awesome-open-data-centric-ai

Curated list of open source tooling for data-centric AI on unstructured data.

Size: 572 KB - Last synced at: 3 days ago - Pushed at: over 1 year ago - Stars: 719 - Forks: 35

magpie-align/magpie

[ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Your efficient and high-quality synthetic data generation pipeline!

Language: Python - Size: 1.08 MB - Last synced at: about 12 hours ago - Pushed at: about 2 months ago - Stars: 695 - Forks: 61

jofpin/synthBTC

A tool that uses advanced Monte Carlo simulations and Turbit parallel processing to create possible Bitcoin prediction scenarios.

Language: JavaScript - Size: 6.46 MB - Last synced at: 29 days ago - Pushed at: 9 months ago - Stars: 684 - Forks: 414

nicolas-hbt/pygraft

Configurable Generation of Synthetic Schemas and Knowledge Graphs at Your Fingertips

Language: Python - Size: 699 KB - Last synced at: 21 days ago - Pushed at: 10 months ago - Stars: 682 - Forks: 45

gretelai/gretel-synthetics

Synthetic data generators for structured and unstructured text, featuring differentially private learning.

Language: Python - Size: 2.35 MB - Last synced at: about 6 hours ago - Pushed at: about 2 months ago - Stars: 637 - Forks: 91

SciPhi-AI/synthesizer 📦

A multi-purpose LLM framework for RAG and data creation.

Language: Python - Size: 31.5 MB - Last synced at: 9 days ago - Pushed at: over 1 year ago - Stars: 624 - Forks: 54

paulbricman/thisrepositorydoesnotexist

A curated list of awesome projects which use Machine Learning to generate synthetic content.

Size: 34.2 KB - Last synced at: 4 days ago - Pushed at: about 2 years ago - Stars: 585 - Forks: 40

sdv-dev/Copulas

A library to model multivariate data using copulas.

Language: Python - Size: 27.5 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 584 - Forks: 116

vanderschaarlab/synthcity

A library for generating and evaluating synthetic tabular data for privacy, fairness and data augmentation.

Language: Python - Size: 6.76 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 545 - Forks: 74

mostly-ai/mostlyai

Synthetic Data SDK ✨

Language: Python - Size: 13.9 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 472 - Forks: 34

plaitpy/plaitpy

plait.py - a fake data modeler

Language: Python - Size: 1 MB - Last synced at: 7 days ago - Pushed at: over 6 years ago - Stars: 434 - Forks: 22

yandex-research/tab-ddpm

[ICML 2023] The official implementation of the paper "TabDDPM: Modelling Tabular Data with Diffusion Models"

Language: Python - Size: 183 KB - Last synced at: about 2 months ago - Pushed at: 10 months ago - Stars: 426 - Forks: 97

GeorgeCazenavette/mtt-distillation

Official code for our CVPR '22 paper "Dataset Distillation by Matching Training Trajectories"

Language: Python - Size: 38.6 MB - Last synced at: 3 days ago - Pushed at: 10 months ago - Stars: 420 - Forks: 58

StacklokLabs/promptwright

Generate large synthetic data using an LLM

Language: Python - Size: 13.9 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 412 - Forks: 32

wenbowen123/iros20-6d-pose-tracking

[IROS 2020] se(3)-TrackNet: Data-driven 6D Pose Tracking by Calibrating Image Residuals in Synthetic Domains

Language: Python - Size: 84.8 MB - Last synced at: 4 days ago - Pushed at: over 1 year ago - Stars: 407 - Forks: 67

sparkfish/augraphy

Augmentation pipeline for rendering synthetic paper printing, faxing, scanning and copy machine processes

Language: Python - Size: 245 MB - Last synced at: 29 days ago - Pushed at: about 1 month ago - Stars: 404 - Forks: 48

databrickslabs/dbldatagen

Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) may be used to generate large simulated / synthetic data sets for test, POCs, and other uses in Databricks environments including in Delta Live Tables pipelines

Language: Python - Size: 11.1 MB - Last synced at: 15 days ago - Pushed at: 2 months ago - Stars: 401 - Forks: 72

Unity-Technologies/SynthDet 📦

SynthDet - An end-to-end object detection pipeline using synthetic data

Language: C# - Size: 2.19 MB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 373 - Forks: 55

Data-Centric-AI-Community/awesome-data-centric-ai

Open-Source Software, Tutorials, and Research on Data-Centric AI 🤖

Language: Jupyter Notebook - Size: 6.73 MB - Last synced at: 4 days ago - Pushed at: over 1 year ago - Stars: 334 - Forks: 46

gszfwsb/NCFM

Official PyTorch implementation of the paper "Dataset Distillation with Neural Characteristic Function: A Minmax Perspective" (NCFM) in CVPR 2025 (Highlight).

Language: Python - Size: 1.17 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 325 - Forks: 18

microsoft/genalog

Genalog is an open source, cross-platform python package allowing generation of synthetic document images with custom degradations and text alignment capabilities.

Language: Jupyter Notebook - Size: 14.6 MB - Last synced at: 4 days ago - Pushed at: over 1 year ago - Stars: 324 - Forks: 34

Nicholasli1995/EvoSkeleton

Official project website for the CVPR 2020 paper (Oral Presentation) "Cascaded deep monocular 3D human pose estimation with evolutionary training data"

Language: Python - Size: 17.1 MB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 323 - Forks: 43

BMW-InnovationLab/BMW-Labeltool-Lite

This repository provides you with an easy-to-use labeling tool for State-of-the-art Deep Learning training purposes. It supports Auto-Labeling.

Language: C# - Size: 478 MB - Last synced at: 4 days ago - Pushed at: 10 months ago - Stars: 322 - Forks: 47

Unity-Technologies/Robotics-Object-Pose-Estimation

A complete end-to-end demonstration in which we collect training data in Unity and use that data to train a deep neural network to predict the pose of a cube. This model is then deployed in a simulated robotic pick-and-place task.

Language: Python - Size: 38.6 MB - Last synced at: about 1 month ago - Pushed at: about 3 years ago - Stars: 315 - Forks: 77

Unity-Technologies/PeopleSansPeople

Unity's privacy-preserving human-centric synthetic data generator

Language: C# - Size: 446 MB - Last synced at: 17 days ago - Pushed at: about 1 year ago - Stars: 309 - Forks: 35

ZumoLabs/zpy

Synthetic data for computer vision. An open source toolkit using Blender and Python.

Language: Python - Size: 29.3 MB - Last synced at: about 8 hours ago - Pushed at: over 3 years ago - Stars: 309 - Forks: 34

tirthajyoti/pydbgen

Random dataframe and database table generator

Language: Python - Size: 687 KB - Last synced at: about 1 month ago - Pushed at: almost 4 years ago - Stars: 309 - Forks: 58

nickkunz/smogn

Synthetic Minority Over-Sampling Technique for Regression

Language: Python - Size: 730 KB - Last synced at: 8 months ago - Pushed at: over 1 year ago - Stars: 308 - Forks: 76

LinkedAi/flip

Synthetic Image generation with Flip. Generate thousands of new 2D images from a small batch of objects and backgrounds.

Language: Python - Size: 80.1 MB - Last synced at: 13 days ago - Pushed at: over 2 years ago - Stars: 306 - Forks: 35

milaan9/Clustering-Datasets

This repository contains the collection of UCI (real-life) datasets and Synthetic (artificial) datasets (with cluster labels and MATLAB files) ready to use with clustering algorithms.

Size: 99.2 MB - Last synced at: about 2 months ago - Pushed at: over 2 years ago - Stars: 302 - Forks: 223

fjxmlzn/DoppelGANger

[IMC 2020 (Best Paper Finalist)] Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions

Language: Python - Size: 67.4 KB - Last synced at: 6 months ago - Pushed at: over 1 year ago - Stars: 300 - Forks: 75

sdv-dev/TGAN

Generative adversarial training for generating synthetic tabular data.

Language: Python - Size: 7.84 MB - Last synced at: 9 days ago - Pushed at: over 2 years ago - Stars: 288 - Forks: 91

debidatta/syndata-generation

Code used to generate synthetic scenes and bounding box annotations for object detection. This was used to generate data used in the Cut, Paste and Learn paper

Language: Python - Size: 6.44 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 280 - Forks: 72

davanstrien/awesome-synthetic-datasets

awesome synthetic (text) datasets

Language: Jupyter Notebook - Size: 184 KB - Last synced at: 8 days ago - Pushed at: 6 months ago - Stars: 278 - Forks: 11

sdv-dev/SDGym

Benchmarking synthetic data generation methods.

Language: Python - Size: 3.05 MB - Last synced at: 9 days ago - Pushed at: 13 days ago - Stars: 273 - Forks: 63

openxrlab/xrfeitoria

OpenXRLab Synthetic Data Rendering Toolbox

Language: Python - Size: 1.28 MB - Last synced at: 29 days ago - Pushed at: about 1 month ago - Stars: 273 - Forks: 20

kevinlin311tw/CDCL-human-part-segmentation

Repository for Paper: Cross-Domain Complementary Learning Using Pose for Multi-Person Part Segmentation (TCSVT20)

Language: Python - Size: 5.67 MB - Last synced at: over 1 year ago - Pushed at: almost 5 years ago - Stars: 255 - Forks: 43

expectedparrot/edsl

Design, conduct and analyze results of AI-powered surveys and experiments. Simulate social science and market research with large numbers of AI agents and LLMs.

Language: Python - Size: 58.7 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 238 - Forks: 24

sdv-dev/SDMetrics

Metrics to evaluate quality and efficacy of synthetic datasets.

Language: Python - Size: 2.72 MB - Last synced at: 9 days ago - Pushed at: 24 days ago - Stars: 231 - Forks: 48

worldbank/REaLTabFormer

A suite of auto-regressive and Seq2Seq (sequence-to-sequence) transformer models for tabular and relational synthetic data generation.

Language: Jupyter Notebook - Size: 12.2 MB - Last synced at: 19 days ago - Pushed at: 2 months ago - Stars: 225 - Forks: 26

jrieke/shape-detection

🟣 Object detection of abstract shapes with neural networks

Language: Jupyter Notebook - Size: 1.12 MB - Last synced at: 5 days ago - Pushed at: over 4 years ago - Stars: 219 - Forks: 129

Project-AgML/AgML

AgML is a centralized framework for agricultural machine learning. AgML provides access to public agricultural datasets for common agricultural deep learning tasks, with standard benchmarks and pretrained models, as well as the ability to generate synthetic data and annotations.

Language: Python - Size: 212 MB - Last synced at: 1 day ago - Pushed at: 20 days ago - Stars: 212 - Forks: 32

ndrplz/surround_vehicles_awareness

Learn to map surrounding vehicles onto a bird's eye view of the scene.

Language: Python - Size: 6.12 MB - Last synced at: about 1 month ago - Pushed at: about 5 years ago - Stars: 209 - Forks: 71

firmai/datagene

DataGene - Identify How Similar TS Datasets Are to One Another (by @firmai)

Language: Jupyter Notebook - Size: 1.12 MB - Last synced at: 5 days ago - Pushed at: over 3 years ago - Stars: 205 - Forks: 24

TonicAI/masquerade

A Postgres Proxy to Mask Data in Realtime

Language: C# - Size: 84 KB - Last synced at: 5 months ago - Pushed at: about 1 year ago - Stars: 193 - Forks: 16

KodCode-AI/kodcode

✨ A synthetic dataset generation framework that produces diverse coding questions and verifiable solutions - all in one framework

Language: Python - Size: 40.6 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 180 - Forks: 10

statice/awesome-synthetic-data

A curated list of awesome synthetic data tools (open source and commercial).

Size: 8.79 KB - Last synced at: 2 days ago - Pushed at: over 1 year ago - Stars: 178 - Forks: 23

ku21fan/STR-Fewer-Labels

Scene Text Recognition (STR) methods trained with fewer real labels (CVPR 2021)

Language: Jupyter Notebook - Size: 1.61 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 166 - Forks: 26

AlexanderVNikitin/tsgm

Generation and evaluation of synthetic time series datasets (also, augmentations, visualizations, a collection of popular datasets) NeurIPS'24

Language: Python - Size: 9.81 MB - Last synced at: 3 days ago - Pushed at: 9 months ago - Stars: 165 - Forks: 17

zjrwtx/SFT-data-builder

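Use free large-model APIs together with your private-domain data to generate SFT training data (entirely free of charge). Supports the training data formats of tools such as llamafactory. Synthetic data.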

Language: JavaScript - Size: 502 KB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 154 - Forks: 15

RichardObi/medigan

medigan - A Python Library of Pretrained Generative Models for Medical Image Synthesis

Language: Python - Size: 106 MB - Last synced at: 11 days ago - Pushed at: 10 months ago - Stars: 154 - Forks: 19

MhLiao/SynthText3D

Project page of SynthText3D

Language: C++ - Size: 1.44 MB - Last synced at: 8 days ago - Pushed at: over 5 years ago - Stars: 145 - Forks: 23

DataformerAI/dataformer

Solving data for LLMs - Create quality synthetic datasets!

Language: Python - Size: 278 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 143 - Forks: 12

anton-jeran/FAST-RIR

This is the official implementation of our neural-network-based fast diffuse room impulse response generator (FAST-RIR) for generating room impulse responses (RIRs) for a given acoustic environment.

Language: Python - Size: 4.47 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 143 - Forks: 26

atapour/monocularDepth-Inference

Inference pipeline for the CVPR paper entitled "Real-Time Monocular Depth Estimation using Synthetic Data with Domain Adaptation via Image Style Transfer" (http://atapour.co.uk/papers/atapour18monocular.pdf).

Language: Python - Size: 6.9 MB - Last synced at: over 1 year ago - Pushed at: almost 6 years ago - Stars: 141 - Forks: 37

Shuyu-XJTU/APTM

The official code of "Towards Unified Text-based Person Retrieval: A Large-scale Multi-Attribute and Language Search Benchmark"

Language: Python - Size: 2.3 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 130 - Forks: 12

rapiddweller/rapiddweller-benerator-ce

BENERATOR is a leading software solution to generate, obfuscate, pseudonymize and migrate data for development, testing, and training purposes with a model-driven approach.

Language: Java - Size: 35.3 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 128 - Forks: 24

aimclub/BAMT

Repository of a data modeling and analysis tool based on Bayesian networks

Language: Python - Size: 106 MB - Last synced at: 9 days ago - Pushed at: 23 days ago - Stars: 126 - Forks: 20

fiddlecube/fiddlecube-sdk

Generate ideal question-answers for testing RAG

Language: Python - Size: 8.97 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 126 - Forks: 3

gretelai/awesome-synthetic-data

📖 A curated list of resources dedicated to synthetic data

Size: 40 KB - Last synced at: 6 days ago - Pushed at: almost 3 years ago - Stars: 126 - Forks: 10

sdv-dev/DeepEcho

Synthetic Data Generation for mixed-type, multivariate time series.

Language: Python - Size: 756 KB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 112 - Forks: 16

stefan-jansen/synthetic-data-for-finance

Material for QuantUniversity talk on Synthetic Data Generation for Finance.

Language: Jupyter Notebook - Size: 757 KB - Last synced at: 23 days ago - Pushed at: over 4 years ago - Stars: 110 - Forks: 45

khawar-islam/diffuseMix

Official PyTorch implementation of DiffuseMix: Label-Preserving Data Augmentation with Diffusion Models (CVPR'2024)

Language: Python - Size: 1.75 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 107 - Forks: 7

LiheYoung/FreeMask

[NeurIPS 2023] FreeMask: Synthetic Images with Dense Annotations Make Stronger Segmentation Models

Language: Python - Size: 13 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 107 - Forks: 1

kirill-vish/Beyond-INet

Code for experiments for "ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy"

Language: Python - Size: 130 MB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 101 - Forks: 6

neurallambda/awesome-reasoning

a curated list of data for reasoning ai

Size: 89.8 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 101 - Forks: 5

microsoft/DPSDA

Private Evolution: Generating DP Synthetic Data without Training [ICLR 2024, ICML 2024 Spotlight]

Language: Python - Size: 8.44 MB - Last synced at: 4 days ago - Pushed at: 3 months ago - Stars: 94 - Forks: 11

gist-ailab/uoais

Code for the paper "Unseen Object Amodal Instance Segmentation via Hierarchical Occlusion Modeling", ICRA 2022

Language: Python - Size: 15 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 93 - Forks: 20

firmai/mtss-gan 📦

MTSS-GAN: Multivariate Time Series Simulation with Generative Adversarial Networks (by @firmai)

Size: 3.62 MB - Last synced at: 9 days ago - Pushed at: over 4 years ago - Stars: 93 - Forks: 31

barseghyanartur/faker-file

Create files with fake data. In many formats. With no effort.

Language: Python - Size: 1.61 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 92 - Forks: 6

ruirangerfan/Three-Filters-to-Normal

Three-Filters-to-Normal: An Accurate and Ultrafast Surface Normal Estimator (RAL+ICRA'21)

Language: C++ - Size: 85.3 MB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 91 - Forks: 14

Baukebrenninkmeijer/table-evaluator

Evaluate real and synthetic datasets against each other

Language: Jupyter Notebook - Size: 7.07 MB - Last synced at: 9 days ago - Pushed at: 4 months ago - Stars: 87 - Forks: 28

justchenhao/IAug_CDNet

Official PyTorch Implementation of Adversarial Instance Augmentation for Building Change Detection in Remote Sensing Images.

Language: Python - Size: 16.9 MB - Last synced at: 6 months ago - Pushed at: about 2 years ago - Stars: 85 - Forks: 19

Data-Centric-AI-Community/awesome-python-for-data-science

A curated list of awesome resources such as books, tutorials, courses, open-source libraries, exercises, and other materials that support Pythonistas in the making, and Pythonistas migrating into Data Science! 📊

Language: Jupyter Notebook - Size: 51.8 MB - Last synced at: 4 days ago - Pushed at: 11 months ago - Stars: 84 - Forks: 19

privateai/deid-examples

Example scripts that showcase how to use Private AI Text to de-identify, redact, hash, tokenize, mask and synthesize PII in text.

Language: Jupyter Notebook - Size: 37.8 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 81 - Forks: 1

VincentGranville/Main

Main folder. Material related to my books on synthetic data and generative AI. Also contains documents blending components from several folders, or covering topics spanning across multiple folders.

Language: Python - Size: 42.3 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 79 - Forks: 17

eXascaleInfolab/LFR-Benchmark_UndirWeightOvp

Extended version of the Lancichinetti-Fortunato-Radicchi Benchmark for Undirected Weighted Overlapping networks to evaluate clustering algorithms using generated ground-truth communities

Language: C++ - Size: 48.8 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 76 - Forks: 14

BMW-InnovationLab/SORDI-AI-Evaluation-GUI

This repository allows you to evaluate a trained computer vision model and get general information and evaluation metrics with little configuration.

Language: Python - Size: 41.5 MB - Last synced at: 9 months ago - Pushed at: over 1 year ago - Stars: 75 - Forks: 3

hassony2/obman_render

[cvpr19] Code to generate images from the ObMan dataset, synthetic renderings of hands holding objects (or hands in isolation)

Language: Python - Size: 5.69 MB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 75 - Forks: 9

OllieBoyne/BlenderSynth

Synthetic Blender Dataset Production

Language: Python - Size: 34.9 MB - Last synced at: 9 days ago - Pushed at: 2 months ago - Stars: 74 - Forks: 7

Related Topics
machine-learning (128), synthetic-dataset-generation (111), deep-learning (104), python (85), computer-vision (60), dataset (42), llm (41), object-detection (34), gan (33), data-generation (33), generative-adversarial-network (31), data-science (30), ai (29), pytorch (29), generative-ai (29), synthetic-data-generation (28), privacy (28), tabular-data (25), blender (23), simulation (22), data-augmentation (22), generative-model (21), time-series (21), nlp (21), data (19), dataset-generation (17), large-language-models (17), datasets (16), gans (16), differential-privacy (16), domain-adaptation (15), synthetic (15), anonymization (14), llms (14), artificial-intelligence (14), data-generator (14), fine-tuning (13), diffusion-models (13), tensorflow (13), openai (12), evaluation (12), transfer-learning (11), reinforcement-learning (11), classification (11), synthetic-data-generator (10), generator (10), segmentation (10), faker (10), instance-segmentation (9), detection (9), image-processing (9), clustering (9), 3d (9), pose-estimation (9), docker (8), privacy-enhancing-technologies (8), semantic-segmentation (8), augmentation (8), fraud-detection (8), face-recognition (8), fake-data (8), deep-neural-networks (8), transformers (8), ocr (8), rendering (7), r (7), huggingface (7), generative-models (7), data-visualization (7), neural-network (7), data-analysis (7), ros (7), medical-imaging (7), robotics (7), gdpr (7), open-source (7), database (7), test-data-generator (7), ctgan (6), grade (6), keras (6), awesome-list (6), natural-language-processing (6), sdv (6), agent (6), metadata (6), numpy (6), pandas (6), opencv (6), testing (6), finetuning (6), explainable-ai (6), deeplearning (6), privacy-tools (6), unsupervised-learning (6), finance (6), benchmark (6), synthea (6), unity (6), isaac-sim (5)