Topic: "synthetic-data"
stefan-jansen/machine-learning-for-trading
Code for Machine Learning for Algorithmic Trading, 2nd edition.
Language: Jupyter Notebook - Size: 652 MB - Last synced at: 17 days ago - Pushed at: 9 months ago - Stars: 14,707 - Forks: 4,559

lk-geimfari/mimesis
Mimesis is a robust data generator for Python that can produce a wide range of fake data in multiple languages.
Language: Python - Size: 33.8 MB - Last synced at: 5 days ago - Pushed at: 6 days ago - Stars: 4,545 - Forks: 338

modelscope/data-juicer
Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
Language: Python - Size: 169 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 4,344 - Forks: 231

nucleuscloud/neosync
Open Source Data Security Platform for Developers to Monitor and Detect PII, Anonymize Production Data and Sync it across environments.
Language: Go - Size: 168 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 3,853 - Forks: 155

Kiln-AI/Kiln
The easiest tool for fine-tuning LLM models, synthetic data generation, and collaborating on datasets.
Language: Python - Size: 14.5 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 3,442 - Forks: 237

DLR-RM/BlenderProc
A procedural Blender pipeline for photorealistic training image generation
Language: Python - Size: 96 MB - Last synced at: 8 days ago - Pushed at: 25 days ago - Stars: 3,043 - Forks: 464

pgmpy/pgmpy
Python Library for Causal and Probabilistic Modeling using Bayesian Networks
Language: Python - Size: 13.1 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 2,894 - Forks: 753

argilla-io/distilabel
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
Language: Python - Size: 543 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 2,671 - Forks: 198

sdv-dev/SDV
Synthetic data generation for tabular data
Language: Python - Size: 31.6 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 2,624 - Forks: 330

synthetichealth/synthea
Synthetic Patient Population Simulator
Language: Java - Size: 741 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 2,519 - Forks: 729

hitsz-ids/synthetic-data-generator
SDG is a specialized framework designed to generate high-quality structured tabular data.
Language: Python - Size: 4.19 MB - Last synced at: 13 days ago - Pushed at: 2 months ago - Stars: 2,350 - Forks: 379

unrealcv/unrealcv
UnrealCV: Connecting Computer Vision to Unreal Engine
Language: C++ - Size: 18.1 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 1,987 - Forks: 443

ydataai/ydata-synthetic
Synthetic data generators for tabular and time-series data
Language: Jupyter Notebook - Size: 16.4 MB - Last synced at: 13 days ago - Pushed at: 2 months ago - Stars: 1,534 - Forks: 250

shuttle-hq/synth
The Declarative Data Generator
Language: Rust - Size: 32.3 MB - Last synced at: 7 days ago - Pushed at: 8 months ago - Stars: 1,416 - Forks: 108

sdv-dev/CTGAN
Conditional GAN for generating synthetic tabular data.
Language: Python - Size: 1.82 MB - Last synced at: 9 days ago - Pushed at: 13 days ago - Stars: 1,380 - Forks: 308

GreenmaskIO/greenmask
PostgreSQL database anonymization and synthetic data generation tool
Language: Go - Size: 31.7 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1,297 - Forks: 31

bespokelabsai/curator
Synthetic data curation for post-training and structured data extraction
Language: Python - Size: 62.2 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1,292 - Forks: 100

datadreamer-dev/DataDreamer
DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models. 🤖💤
Language: Python - Size: 895 KB - Last synced at: 21 days ago - Pushed at: 3 months ago - Stars: 1,010 - Forks: 53

plurai-ai/intellagent
A framework for comprehensive diagnosis and optimization of agents using simulated, realistic synthetic interactions
Language: Python - Size: 14.3 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1,006 - Forks: 129

BatsResearch/bonito
A lightweight library for generating synthetic instruction tuning datasets for your data without GPT.
Language: Python - Size: 796 KB - Last synced at: 20 days ago - Pushed at: 2 months ago - Stars: 767 - Forks: 49

Renumics/awesome-open-data-centric-ai
Curated list of open source tooling for data-centric AI on unstructured data.
Size: 572 KB - Last synced at: 3 days ago - Pushed at: over 1 year ago - Stars: 719 - Forks: 35

magpie-align/magpie
[ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Your efficient and high-quality synthetic data generation pipeline!
Language: Python - Size: 1.08 MB - Last synced at: about 12 hours ago - Pushed at: about 2 months ago - Stars: 695 - Forks: 61

jofpin/synthBTC
A tool that uses advanced Monte Carlo simulations and Turbit parallel processing to create possible Bitcoin prediction scenarios.
Language: JavaScript - Size: 6.46 MB - Last synced at: 29 days ago - Pushed at: 9 months ago - Stars: 684 - Forks: 414

nicolas-hbt/pygraft
Configurable Generation of Synthetic Schemas and Knowledge Graphs at Your Fingertips
Language: Python - Size: 699 KB - Last synced at: 21 days ago - Pushed at: 10 months ago - Stars: 682 - Forks: 45

gretelai/gretel-synthetics
Synthetic data generators for structured and unstructured text, featuring differentially private learning.
Language: Python - Size: 2.35 MB - Last synced at: about 6 hours ago - Pushed at: about 2 months ago - Stars: 637 - Forks: 91

SciPhi-AI/synthesizer 📦
A multi-purpose LLM framework for RAG and data creation.
Language: Python - Size: 31.5 MB - Last synced at: 9 days ago - Pushed at: over 1 year ago - Stars: 624 - Forks: 54

paulbricman/thisrepositorydoesnotexist
A curated list of awesome projects which use Machine Learning to generate synthetic content.
Size: 34.2 KB - Last synced at: 4 days ago - Pushed at: about 2 years ago - Stars: 585 - Forks: 40

sdv-dev/Copulas
A library to model multivariate data using copulas.
Language: Python - Size: 27.5 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 584 - Forks: 116

vanderschaarlab/synthcity
A library for generating and evaluating synthetic tabular data for privacy, fairness and data augmentation.
Language: Python - Size: 6.76 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 545 - Forks: 74

mostly-ai/mostlyai
Synthetic Data SDK ✨
Language: Python - Size: 13.9 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 472 - Forks: 34

plaitpy/plaitpy
plait.py - a fake data modeler
Language: Python - Size: 1 MB - Last synced at: 7 days ago - Pushed at: over 6 years ago - Stars: 434 - Forks: 22

yandex-research/tab-ddpm
[ICML 2023] The official implementation of the paper "TabDDPM: Modelling Tabular Data with Diffusion Models"
Language: Python - Size: 183 KB - Last synced at: about 2 months ago - Pushed at: 10 months ago - Stars: 426 - Forks: 97

GeorgeCazenavette/mtt-distillation
Official code for our CVPR '22 paper "Dataset Distillation by Matching Training Trajectories"
Language: Python - Size: 38.6 MB - Last synced at: 3 days ago - Pushed at: 10 months ago - Stars: 420 - Forks: 58

StacklokLabs/promptwright
Generate large synthetic data using an LLM
Language: Python - Size: 13.9 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 412 - Forks: 32

wenbowen123/iros20-6d-pose-tracking
[IROS 2020] se(3)-TrackNet: Data-driven 6D Pose Tracking by Calibrating Image Residuals in Synthetic Domains
Language: Python - Size: 84.8 MB - Last synced at: 4 days ago - Pushed at: over 1 year ago - Stars: 407 - Forks: 67

sparkfish/augraphy
Augmentation pipeline for rendering synthetic paper printing, faxing, scanning and copy machine processes
Language: Python - Size: 245 MB - Last synced at: 29 days ago - Pushed at: about 1 month ago - Stars: 404 - Forks: 48

databrickslabs/dbldatagen
Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) may be used to generate large simulated / synthetic data sets for test, POCs, and other uses in Databricks environments including in Delta Live Tables pipelines
Language: Python - Size: 11.1 MB - Last synced at: 15 days ago - Pushed at: 2 months ago - Stars: 401 - Forks: 72

Unity-Technologies/SynthDet 📦
SynthDet - An end-to-end object detection pipeline using synthetic data
Language: C# - Size: 2.19 MB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 373 - Forks: 55

Data-Centric-AI-Community/awesome-data-centric-ai
Open-Source Software, Tutorials, and Research on Data-Centric AI 🤖
Language: Jupyter Notebook - Size: 6.73 MB - Last synced at: 4 days ago - Pushed at: over 1 year ago - Stars: 334 - Forks: 46

gszfwsb/NCFM
Official PyTorch implementation of the paper "Dataset Distillation with Neural Characteristic Function: A Minmax Perspective" (NCFM) in CVPR 2025 (Highlight).
Language: Python - Size: 1.17 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 325 - Forks: 18

microsoft/genalog
Genalog is an open source, cross-platform python package allowing generation of synthetic document images with custom degradations and text alignment capabilities.
Language: Jupyter Notebook - Size: 14.6 MB - Last synced at: 4 days ago - Pushed at: over 1 year ago - Stars: 324 - Forks: 34

Nicholasli1995/EvoSkeleton
Official project website for the CVPR 2020 paper (Oral Presentation) "Cascaded deep monocular 3D human pose estimation with evolutionary training data"
Language: Python - Size: 17.1 MB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 323 - Forks: 43

BMW-InnovationLab/BMW-Labeltool-Lite
This repository provides you with an easy-to-use labeling tool for State-of-the-art Deep Learning training purposes. It supports Auto-Labeling.
Language: C# - Size: 478 MB - Last synced at: 4 days ago - Pushed at: 10 months ago - Stars: 322 - Forks: 47

Unity-Technologies/Robotics-Object-Pose-Estimation
A complete end-to-end demonstration in which we collect training data in Unity and use that data to train a deep neural network to predict the pose of a cube. This model is then deployed in a simulated robotic pick-and-place task.
Language: Python - Size: 38.6 MB - Last synced at: about 1 month ago - Pushed at: about 3 years ago - Stars: 315 - Forks: 77

Unity-Technologies/PeopleSansPeople
Unity's privacy-preserving human-centric synthetic data generator
Language: C# - Size: 446 MB - Last synced at: 17 days ago - Pushed at: about 1 year ago - Stars: 309 - Forks: 35

ZumoLabs/zpy
Synthetic data for computer vision. An open source toolkit using Blender and Python.
Language: Python - Size: 29.3 MB - Last synced at: about 8 hours ago - Pushed at: over 3 years ago - Stars: 309 - Forks: 34

tirthajyoti/pydbgen
Random dataframe and database table generator
Language: Python - Size: 687 KB - Last synced at: about 1 month ago - Pushed at: almost 4 years ago - Stars: 309 - Forks: 58

nickkunz/smogn
Synthetic Minority Over-Sampling Technique for Regression
Language: Python - Size: 730 KB - Last synced at: 8 months ago - Pushed at: over 1 year ago - Stars: 308 - Forks: 76

LinkedAi/flip
Synthetic Image generation with Flip. Generate thousands of new 2D images from a small batch of objects and backgrounds.
Language: Python - Size: 80.1 MB - Last synced at: 13 days ago - Pushed at: over 2 years ago - Stars: 306 - Forks: 35

milaan9/Clustering-Datasets
This repository contains the collection of UCI (real-life) datasets and Synthetic (artificial) datasets (with cluster labels and MATLAB files) ready to use with clustering algorithms.
Size: 99.2 MB - Last synced at: about 2 months ago - Pushed at: over 2 years ago - Stars: 302 - Forks: 223

fjxmlzn/DoppelGANger
[IMC 2020 (Best Paper Finalist)] Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions
Language: Python - Size: 67.4 KB - Last synced at: 6 months ago - Pushed at: over 1 year ago - Stars: 300 - Forks: 75

sdv-dev/TGAN
Generative adversarial training for generating synthetic tabular data.
Language: Python - Size: 7.84 MB - Last synced at: 9 days ago - Pushed at: over 2 years ago - Stars: 288 - Forks: 91

debidatta/syndata-generation
Code used to generate synthetic scenes and bounding box annotations for object detection. This was used to generate data used in the Cut, Paste and Learn paper
Language: Python - Size: 6.44 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 280 - Forks: 72

davanstrien/awesome-synthetic-datasets
awesome synthetic (text) datasets
Language: Jupyter Notebook - Size: 184 KB - Last synced at: 8 days ago - Pushed at: 6 months ago - Stars: 278 - Forks: 11

sdv-dev/SDGym
Benchmarking synthetic data generation methods.
Language: Python - Size: 3.05 MB - Last synced at: 9 days ago - Pushed at: 13 days ago - Stars: 273 - Forks: 63

openxrlab/xrfeitoria
OpenXRLab Synthetic Data Rendering Toolbox
Language: Python - Size: 1.28 MB - Last synced at: 29 days ago - Pushed at: about 1 month ago - Stars: 273 - Forks: 20

kevinlin311tw/CDCL-human-part-segmentation
Repository for Paper: Cross-Domain Complementary Learning Using Pose for Multi-Person Part Segmentation (TCSVT20)
Language: Python - Size: 5.67 MB - Last synced at: over 1 year ago - Pushed at: almost 5 years ago - Stars: 255 - Forks: 43

expectedparrot/edsl
Design, conduct and analyze results of AI-powered surveys and experiments. Simulate social science and market research with large numbers of AI agents and LLMs.
Language: Python - Size: 58.7 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 238 - Forks: 24

sdv-dev/SDMetrics
Metrics to evaluate quality and efficacy of synthetic datasets.
Language: Python - Size: 2.72 MB - Last synced at: 9 days ago - Pushed at: 24 days ago - Stars: 231 - Forks: 48

worldbank/REaLTabFormer
A suite of auto-regressive and Seq2Seq (sequence-to-sequence) transformer models for tabular and relational synthetic data generation.
Language: Jupyter Notebook - Size: 12.2 MB - Last synced at: 19 days ago - Pushed at: 2 months ago - Stars: 225 - Forks: 26

jrieke/shape-detection
🟣 Object detection of abstract shapes with neural networks
Language: Jupyter Notebook - Size: 1.12 MB - Last synced at: 5 days ago - Pushed at: over 4 years ago - Stars: 219 - Forks: 129

Project-AgML/AgML
AgML is a centralized framework for agricultural machine learning. AgML provides access to public agricultural datasets for common agricultural deep learning tasks, with standard benchmarks and pretrained models, as well as the ability to generate synthetic data and annotations.
Language: Python - Size: 212 MB - Last synced at: 1 day ago - Pushed at: 20 days ago - Stars: 212 - Forks: 32

ndrplz/surround_vehicles_awareness
Learn to map surrounding vehicles onto a bird's eye view of the scene.
Language: Python - Size: 6.12 MB - Last synced at: about 1 month ago - Pushed at: about 5 years ago - Stars: 209 - Forks: 71

firmai/datagene
DataGene - Identify How Similar TS Datasets Are to One Another (by @firmai)
Language: Jupyter Notebook - Size: 1.12 MB - Last synced at: 5 days ago - Pushed at: over 3 years ago - Stars: 205 - Forks: 24

TonicAI/masquerade
A Postgres Proxy to Mask Data in Realtime
Language: C# - Size: 84 KB - Last synced at: 5 months ago - Pushed at: about 1 year ago - Stars: 193 - Forks: 16

KodCode-AI/kodcode
✨ A synthetic dataset generation framework that produces diverse coding questions and verifiable solutions - all in one framework
Language: Python - Size: 40.6 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 180 - Forks: 10

statice/awesome-synthetic-data
A curated list of awesome synthetic data tools (open source and commercial).
Size: 8.79 KB - Last synced at: 2 days ago - Pushed at: over 1 year ago - Stars: 178 - Forks: 23

ku21fan/STR-Fewer-Labels
Scene Text Recognition (STR) methods trained with fewer real labels (CVPR 2021)
Language: Jupyter Notebook - Size: 1.61 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 166 - Forks: 26

AlexanderVNikitin/tsgm
Generation and evaluation of synthetic time series datasets (also, augmentations, visualizations, a collection of popular datasets) NeurIPS'24
Language: Python - Size: 9.81 MB - Last synced at: 3 days ago - Pushed at: 9 months ago - Stars: 165 - Forks: 17

zjrwtx/SFT-data-builder
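Use free large-model APIs together with your private-domain data to generate SFT training data (entirely free of charge). Supports the training data formats of tools such as llamafactory. Synthetic data.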
Language: JavaScript - Size: 502 KB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 154 - Forks: 15

RichardObi/medigan
medigan - A Python Library of Pretrained Generative Models for Medical Image Synthesis
Language: Python - Size: 106 MB - Last synced at: 11 days ago - Pushed at: 10 months ago - Stars: 154 - Forks: 19

MhLiao/SynthText3D
Project page of SynthText3D
Language: C++ - Size: 1.44 MB - Last synced at: 8 days ago - Pushed at: over 5 years ago - Stars: 145 - Forks: 23

DataformerAI/dataformer
Solving data for LLMs - Create quality synthetic datasets!
Language: Python - Size: 278 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 143 - Forks: 12

anton-jeran/FAST-RIR
This is the official implementation of our neural-network-based fast diffuse room impulse response generator (FAST-RIR) for generating room impulse responses (RIRs) for a given acoustic environment.
Language: Python - Size: 4.47 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 143 - Forks: 26

atapour/monocularDepth-Inference
Inference pipeline for the CVPR paper entitled "Real-Time Monocular Depth Estimation using Synthetic Data with Domain Adaptation via Image Style Transfer" (http://atapour.co.uk/papers/atapour18monocular.pdf).
Language: Python - Size: 6.9 MB - Last synced at: over 1 year ago - Pushed at: almost 6 years ago - Stars: 141 - Forks: 37

Shuyu-XJTU/APTM
The official code of "Towards Unified Text-based Person Retrieval: A Large-scale Multi-Attribute and Language Search Benchmark"
Language: Python - Size: 2.3 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 130 - Forks: 12

rapiddweller/rapiddweller-benerator-ce
BENERATOR is a leading software solution to generate, obfuscate, pseudonymize and migrate data for development, testing, and training purposes with a model-driven approach.
Language: Java - Size: 35.3 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 128 - Forks: 24

aimclub/BAMT
Repository of a data modeling and analysis tool based on Bayesian networks
Language: Python - Size: 106 MB - Last synced at: 9 days ago - Pushed at: 23 days ago - Stars: 126 - Forks: 20

fiddlecube/fiddlecube-sdk
Generate ideal question-answers for testing RAG
Language: Python - Size: 8.97 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 126 - Forks: 3

gretelai/awesome-synthetic-data
📖 A curated list of resources dedicated to synthetic data
Size: 40 KB - Last synced at: 6 days ago - Pushed at: almost 3 years ago - Stars: 126 - Forks: 10

sdv-dev/DeepEcho
Synthetic Data Generation for mixed-type, multivariate time series.
Language: Python - Size: 756 KB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 112 - Forks: 16

stefan-jansen/synthetic-data-for-finance
Material for QuantUniversity talk on Synthetic Data Generation for Finance.
Language: Jupyter Notebook - Size: 757 KB - Last synced at: 23 days ago - Pushed at: over 4 years ago - Stars: 110 - Forks: 45

khawar-islam/diffuseMix
Official PyTorch implementation of DiffuseMix : Label-Preserving Data Augmentation with Diffusion Models (CVPR'2024)
Language: Python - Size: 1.75 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 107 - Forks: 7

LiheYoung/FreeMask
[NeurIPS 2023] FreeMask: Synthetic Images with Dense Annotations Make Stronger Segmentation Models
Language: Python - Size: 13 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 107 - Forks: 1

kirill-vish/Beyond-INet
Code for experiments for "ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy"
Language: Python - Size: 130 MB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 101 - Forks: 6

neurallambda/awesome-reasoning
a curated list of data for reasoning ai
Size: 89.8 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 101 - Forks: 5

microsoft/DPSDA
Private Evolution: Generating DP Synthetic Data without Training [ICLR 2024, ICML 2024 Spotlight]
Language: Python - Size: 8.44 MB - Last synced at: 4 days ago - Pushed at: 3 months ago - Stars: 94 - Forks: 11

gist-ailab/uoais
Codes of paper "Unseen Object Amodal Instance Segmentation via Hierarchical Occlusion Modeling", ICRA 2022
Language: Python - Size: 15 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 93 - Forks: 20

firmai/mtss-gan 📦
MTSS-GAN: Multivariate Time Series Simulation with Generative Adversarial Networks (by @firmai)
Size: 3.62 MB - Last synced at: 9 days ago - Pushed at: over 4 years ago - Stars: 93 - Forks: 31

barseghyanartur/faker-file
Create files with fake data. In many formats. With no effort.
Language: Python - Size: 1.61 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 92 - Forks: 6

ruirangerfan/Three-Filters-to-Normal
Three-Filters-to-Normal: An Accurate and Ultrafast Surface Normal Estimator (RAL+ICRA'21)
Language: C++ - Size: 85.3 MB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 91 - Forks: 14

Baukebrenninkmeijer/table-evaluator
Evaluate real and synthetic datasets against each other
Language: Jupyter Notebook - Size: 7.07 MB - Last synced at: 9 days ago - Pushed at: 4 months ago - Stars: 87 - Forks: 28

justchenhao/IAug_CDNet
Official PyTorch Implementation of Adversarial Instance Augmentation for Building Change Detection in Remote Sensing Images.
Language: Python - Size: 16.9 MB - Last synced at: 6 months ago - Pushed at: about 2 years ago - Stars: 85 - Forks: 19

Data-Centric-AI-Community/awesome-python-for-data-science
A curated list of awesome resources such as books, tutorials, courses, open-source libraries, exercises, and other materials that support Pythonistas in the making, and Pythonistas migrating into Data Science! 📊
Language: Jupyter Notebook - Size: 51.8 MB - Last synced at: 4 days ago - Pushed at: 11 months ago - Stars: 84 - Forks: 19

privateai/deid-examples
Example scripts that showcase how to use Private AI Text to de-identify, redact, hash, tokenize, mask and synthesize PII in text.
Language: Jupyter Notebook - Size: 37.8 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 81 - Forks: 1

VincentGranville/Main
Main folder. Material related to my books on synthetic data and generative AI. Also contains documents blending components from several folders, or covering topics spanning across multiple folders.
Language: Python - Size: 42.3 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 79 - Forks: 17

eXascaleInfolab/LFR-Benchmark_UndirWeightOvp
Extended version of the Lancichinetti-Fortunato-Radicchi Benchmark for Undirected Weighted Overlapping networks to evaluate clustering algorithms using generated ground-truth communities
Language: C++ - Size: 48.8 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 76 - Forks: 14

BMW-InnovationLab/SORDI-AI-Evaluation-GUI
This repository allows you to evaluate a trained computer vision model and get general information and evaluation metrics with little configuration.
Language: Python - Size: 41.5 MB - Last synced at: 9 months ago - Pushed at: over 1 year ago - Stars: 75 - Forks: 3

hassony2/obman_render
[cvpr19] Code to generate images from the ObMan dataset, synthetic renderings of hands holding objects (or hands in isolation)
Language: Python - Size: 5.69 MB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 75 - Forks: 9

OllieBoyne/BlenderSynth
Synthetic Blender Dataset Production
Language: Python - Size: 34.9 MB - Last synced at: 9 days ago - Pushed at: 2 months ago - Stars: 74 - Forks: 7
