An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: synthetic-data

ritesh-modi/fine-tuning-embeddings-template

This repo is a template for fine-tuning embedding models using sentencetransformers based on different configurations

Language: Python - Size: 118 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

Kiln-AI/Kiln

The easiest tool for fine-tuning LLM models, synthetic data generation, and collaborating on datasets.

Language: Python - Size: 14.3 MB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 3,391 - Forks: 235

synthesizer-project/synthesizer

Synthesizer - a code for creating synthetic astrophysical observables

Language: Python - Size: 14.7 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 25 - Forks: 11

datadreamer-dev/DataDreamer

DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models. 🤖💤

Language: Python - Size: 895 KB - Last synced at: about 12 hours ago - Pushed at: 3 months ago - Stars: 1,010 - Forks: 53

shuttle-hq/synth

The Declarative Data Generator

Language: Rust - Size: 32.3 MB - Last synced at: 1 day ago - Pushed at: 7 months ago - Stars: 1,414 - Forks: 109

argilla-io/distilabel

Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.

Language: Python - Size: 543 MB - Last synced at: 2 days ago - Pushed at: 6 days ago - Stars: 2,640 - Forks: 193

here4learning/synthetic-to-viewbinding-migrator

Convert Kotlin Android Fragments from synthetic imports to ViewBinding. Automate the migration of old fragment code to modern, type-safe view binding in Android projects.

Language: Python - Size: 9.77 KB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

PrinceV-hub/GAN-Generation-of-Synthetic-Data-

Generate and evaluate synthetic tabular data using GANs with visual comparisons.

Language: Python - Size: 1.1 MB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

sdv-dev/DeepEcho

Synthetic Data Generation for mixed-type, multivariate time series.

Language: Python - Size: 755 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 111 - Forks: 15

modelscope/data-juicer

Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷

Language: Python - Size: 169 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 4,222 - Forks: 227

lk-geimfari/mimesis

Mimesis is a robust data generator for Python that can produce a wide range of fake data in multiple languages.

Language: Python - Size: 23.2 MB - Last synced at: 1 day ago - Pushed at: 26 days ago - Stars: 4,534 - Forks: 338

synthetichealth/synthea

Synthetic Patient Population Simulator

Language: Java - Size: 738 MB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 2,466 - Forks: 721

mostly-ai/mostlyai-qa

Synthetic Data Quality Assurance 🔎

Language: HTML - Size: 128 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 31 - Forks: 5

mostly-ai/mostlyai-engine

Synthetic Data Engine 💎

Language: Python - Size: 2.01 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 53 - Forks: 2

pr0mila/MediBeng-Whisper-Tiny

MediBeng Whisper Tiny improves doctor-patient transcription by training the Whisper Tiny model to translate mixed Bengali-English speech into English, making it easier for analysis, record-keeping, and using AI in healthcare.

Language: Python - Size: 692 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 3 - Forks: 0

starfishdata/starfish

Synthetic data generation to fuel AI models

Language: Python - Size: 683 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 3 - Forks: 0

GBR-RL/SmartSort-CAM

End-to-end industrial part classification system using Blender-generated synthetic data, ConvNeXt, Grad-CAM, and FastAPI, fully containerized with Docker for real-time inference.

Language: Jupyter Notebook - Size: 39.4 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

sdv-dev/SDV

Synthetic data generation for tabular data

Language: Python - Size: 31 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 2,597 - Forks: 331

tanaos/tanaos-docs

Documentation for our synthetic data generation SDKs and APIs 📖

Language: CSS - Size: 1.69 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

sdv-dev/CTGAN

Conditional GAN for generating synthetic tabular data.

Language: Python - Size: 1.82 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1,372 - Forks: 307

nicolas-hbt/pygraft

Configurable Generation of Synthetic Schemas and Knowledge Graphs at Your Fingertips

Language: Python - Size: 699 KB - Last synced at: about 17 hours ago - Pushed at: 9 months ago - Stars: 682 - Forks: 45

Clearbox-AI/clearbox-synthetic-kit

Clearbox AI's all-in-one solution for generation and evaluation of synthetic tabular and time-series data.

Language: Jupyter Notebook - Size: 4.68 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 42 - Forks: 1

gretelai/gretel-python-client

The Gretel Python Client allows you to interact with the Gretel REST API.

Language: Python - Size: 31 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 54 - Forks: 19

diffix/syndiffix

Python implementation of the SynDiffix synthetic data generation mechanism.

Language: Python - Size: 610 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 8 - Forks: 2

TyMill/SynthPred

A Julia package for synthetic data analysis, advanced imputation (ARIMA, RNN), AutoML, and ensemble modeling.

Language: Julia - Size: 308 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1 - Forks: 1

synthesized-io/tdk-demo

This is a collection of TDK demo projects that use different databases and options

Language: YAML - Size: 69.4 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 17 - Forks: 4

microsoft/DPSDA

Private Evolution: Generating DP Synthetic Data without Training [ICLR 2024, ICML 2024 Spotlight]

Language: Python - Size: 8.44 MB - Last synced at: about 13 hours ago - Pushed at: about 2 months ago - Stars: 95 - Forks: 11

microsoft/genalog

Genalog is an open source, cross-platform python package allowing generation of synthetic document images with custom degradations and text alignment capabilities.

Language: Jupyter Notebook - Size: 14.6 MB - Last synced at: about 13 hours ago - Pushed at: over 1 year ago - Stars: 322 - Forks: 32

Project-AgML/AgML

AgML is a centralized framework for agricultural machine learning. AgML provides access to public agricultural datasets for common agricultural deep learning tasks, with standard benchmarks and pretrained models, as well as the ability to generate synthetic data and annotations.

Language: Python - Size: 213 MB - Last synced at: 3 days ago - Pushed at: 7 days ago - Stars: 209 - Forks: 32

MichelBMachado/PIVML

PIVML is a repository containing machine learning models to predict the velocity field of sequential PIV images through optical flow estimation.

Language: Jupyter Notebook - Size: 722 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

Vini09-cpu/agentin

AI Agents for Technology Services

Size: 1000 Bytes - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

stacklok/promptwright

Generate large synthetic data using an LLM

Language: Python - Size: 14 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 406 - Forks: 33

DLR-RM/BlenderProc

A procedural Blender pipeline for photorealistic training image generation

Language: Python - Size: 96 MB - Last synced at: 6 days ago - Pushed at: 9 days ago - Stars: 3,019 - Forks: 462

Ronit26Mehta/Reddit-Sentiment-Analysis-and-ETL-Pipeline

This project takes synthetic Reddit data, transforms it using Spark and Hive, and stores it in S3. Sentiment analysis is then performed on the data.

Language: Python - Size: 2.34 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 1 - Forks: 0

firmai/datagene

DataGene - Identify How Similar TS Datasets Are to One Another (by @firmai)

Language: Jupyter Notebook - Size: 1.12 MB - Last synced at: 4 days ago - Pushed at: about 3 years ago - Stars: 205 - Forks: 24

expectedparrot/edsl

Design, conduct and analyze results of AI-powered surveys and experiments. Simulate social science and market research with large numbers of AI agents and LLMs.

Language: Python - Size: 58.2 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 233 - Forks: 25

aliciusschroeder/spintax-editor

A modern, visual editor for spintax (spinning syntax) with tree-based editing, live preview, and YAML export. An ideal lightweight alternative for generating AI training data. Built with Next.js, TailwindCSS, and TypeScript.

Language: TypeScript - Size: 337 KB - Last synced at: 4 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

martinjurkovic/syntherela

A package for benchmarking synthetic relational data generation methods

Language: Python - Size: 964 KB - Last synced at: 5 days ago - Pushed at: about 1 month ago - Stars: 35 - Forks: 1

bespokelabsai/curator

Synthetic data curation for post-training and structured data extraction

Language: Python - Size: 64.6 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 1,198 - Forks: 90

zealscott/SynMeter

A principled library for tuning, training and evaluating tabular data synthesis on fidelity, privacy and utility.

Language: Python - Size: 2.82 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 19 - Forks: 1

worldbank/REaLTabFormer

A suite of auto-regressive and Seq2Seq (sequence-to-sequence) transformer models for tabular and relational synthetic data generation.

Language: Jupyter Notebook - Size: 12.2 MB - Last synced at: 8 days ago - Pushed at: about 1 month ago - Stars: 225 - Forks: 26

Francis-Calingo/Canadian-Rental-Prices-and-Immigration-ML-Predictive-Model

Language: Jupyter Notebook - Size: 8.24 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

ThomasRochefortB/open-agentinstruct

An open-source recreation of the AgentInstruct agentic workflow for synthetic data generation

Language: Python - Size: 246 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 14 - Forks: 0

usnistgov/SDNist

SDNist: Benchmark data and evaluation tools for data synthesizers.

Language: Python - Size: 42.2 MB - Last synced at: 8 days ago - Pushed at: 9 days ago - Stars: 35 - Forks: 15

sdv-dev/SDMetrics

Metrics to evaluate quality and efficacy of synthetic datasets.

Language: Python - Size: 2.69 MB - Last synced at: 7 days ago - Pushed at: 9 days ago - Stars: 229 - Forks: 47

aimclub/BAMT

Repository of a data modeling and analysis tool based on Bayesian networks

Language: Python - Size: 106 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 126 - Forks: 19

nucleuscloud/neosync

Open Source Data Security Platform for Developers to Monitor and Detect PII, Anonymize Production Data and Sync it across environments.

Language: Go - Size: 164 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 3,841 - Forks: 152

1kastner/conflowgen

A generator for synthetic container flows at maritime container terminals with a focus on yard operations

Language: Python - Size: 2.47 MB - Last synced at: 9 days ago - Pushed at: 2 months ago - Stars: 13 - Forks: 7

mostly-ai/mostlyai

Synthetic Data SDK ✨

Language: Python - Size: 13.6 MB - Last synced at: 9 days ago - Pushed at: 10 days ago - Stars: 397 - Forks: 32

KI-AIM/Cinnamon

Cinnamon is a modular application designed to offer robust functionalities for data anonymization, synthetization, and evaluation.

Language: Java - Size: 40.5 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 21 - Forks: 1

magpie-align/magpie

[ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Your efficient and high-quality synthetic data generation pipeline!

Language: Python - Size: 1.08 MB - Last synced at: 6 days ago - Pushed at: about 1 month ago - Stars: 673 - Forks: 60

statice/awesome-synthetic-data

A curated list of awesome synthetic data tools (open source and commercial).

Size: 8.79 KB - Last synced at: 7 days ago - Pushed at: over 1 year ago - Stars: 171 - Forks: 23

DerwenAI/kleptosyn

Synthetic data generation for investigative graphs based on patterns of bad-actor tradecraft.

Language: Jupyter Notebook - Size: 1.88 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 5 - Forks: 0

GreenmaskIO/greenmask

PostgreSQL database anonymization and synthetic data generation tool

Language: Go - Size: 31.7 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 1,297 - Forks: 31

JonnoB/scrambledtext_analysis

Can synthetic corrupted data be used to train LLMs to correct OCR text?

Language: Python - Size: 203 KB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 1 - Forks: 0

vanderschaarlab/synthcity

A library for generating and evaluating synthetic tabular data for privacy, fairness and data augmentation.

Language: Python - Size: 6.76 MB - Last synced at: 9 days ago - Pushed at: 3 months ago - Stars: 532 - Forks: 71

Data-Centric-AI-Community/awesome-data-centric-ai

Open-Source Software, Tutorials, and Research on Data-Centric AI 🤖

Language: Jupyter Notebook - Size: 6.73 MB - Last synced at: 9 days ago - Pushed at: over 1 year ago - Stars: 332 - Forks: 46

gretelai/gretel-synthetics

Synthetic data generators for structured and unstructured text, featuring differentially private learning.

Language: Python - Size: 2.35 MB - Last synced at: 10 days ago - Pushed at: about 1 month ago - Stars: 629 - Forks: 90

sdv-dev/SDGym

Benchmarking synthetic data generation methods.

Language: Python - Size: 3.05 MB - Last synced at: 6 days ago - Pushed at: 7 days ago - Stars: 272 - Forks: 63

sdv-dev/Copulas

A library to model multivariate data using copulas.

Language: Python - Size: 27.5 MB - Last synced at: 8 days ago - Pushed at: 17 days ago - Stars: 585 - Forks: 116

unrealcv/unrealcv

UnrealCV: Connecting Computer Vision to Unreal Engine

Language: C++ - Size: 18.2 MB - Last synced at: 9 days ago - Pushed at: about 1 month ago - Stars: 1,983 - Forks: 442

jofpin/synthBTC

A tool that uses advanced Monte Carlo simulations and Turbit parallel processing to create possible Bitcoin prediction scenarios.

Language: JavaScript - Size: 6.46 MB - Last synced at: 8 days ago - Pushed at: 9 months ago - Stars: 684 - Forks: 414

stefan-jansen/synthetic-data-for-finance

Material for QuantUniversity talk on Synthetic Data Generation for Finance.

Language: Jupyter Notebook - Size: 757 KB - Last synced at: 2 days ago - Pushed at: over 4 years ago - Stars: 110 - Forks: 45

gszfwsb/NCFM

Official PyTorch implementation of the paper "Dataset Distillation with Neural Characteristic Function: A Minmax Perspective" (NCFM) in CVPR 2025 (Highlight).

Language: Python - Size: 1.17 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 325 - Forks: 18

sparkfish/augraphy

Augmentation pipeline for rendering synthetic paper printing, faxing, scanning and copy machine processes

Language: Python - Size: 245 MB - Last synced at: 8 days ago - Pushed at: 20 days ago - Stars: 404 - Forks: 48

SciPhi-AI/synthesizer 📦

A multi-purpose LLM framework for RAG and data creation.

Language: Python - Size: 31.5 MB - Last synced at: 7 days ago - Pushed at: over 1 year ago - Stars: 621 - Forks: 54

zahramh99/Synthetic-Data-Generation-with-Generative-AI

Language: Python - Size: 0 Bytes - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 0 - Forks: 0

Infinitode/PWLDS

A public dataset of over 10 million passwords, with assigned strength levels.

Size: 124 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 2 - Forks: 1

privateai/deid-examples

Example scripts that showcase how to use Private AI Text to de-identify, redact, hash, tokenize, mask and synthesize PII in text.

Language: Jupyter Notebook - Size: 37.8 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 80 - Forks: 1

evertorres/maternal-health-synthetic-data-omop

Synthetic data generation for maternal health in Colombia

Language: Jupyter Notebook - Size: 18.3 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 0 - Forks: 0

oeg-upm/TINTO

TINTO: Software to convert Tidy Data into Image for Classification with 2-Dimensional Convolutional Neural Networks

Language: Python - Size: 112 MB - Last synced at: 9 days ago - Pushed at: about 1 year ago - Stars: 5 - Forks: 3

stefan-jansen/machine-learning-for-trading

Code for Machine Learning for Algorithmic Trading, 2nd edition.

Language: Jupyter Notebook - Size: 652 MB - Last synced at: 12 days ago - Pushed at: 8 months ago - Stars: 14,599 - Forks: 4,530

ydataai/ydata-synthetic

Synthetic data generators for tabular and time-series data

Language: Jupyter Notebook - Size: 16.4 MB - Last synced at: 11 days ago - Pushed at: about 1 month ago - Stars: 1,524 - Forks: 249

tanaos/synthex-python

A Python library for high-quality, large-scale synthetic dataset generation 📊🧪

Language: Python - Size: 75.2 KB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 0 - Forks: 0

Unity-Technologies/SynthDet 📦

SynthDet - An end-to-end object detection pipeline using synthetic data

Language: C# - Size: 2.19 MB - Last synced at: 12 days ago - Pushed at: 5 months ago - Stars: 373 - Forks: 55

sodascience/metasyn

Transparent and privacy-friendly synthetic data generation

Language: Python - Size: 7.65 MB - Last synced at: 6 days ago - Pushed at: 10 days ago - Stars: 41 - Forks: 9

GDelCorso/NA_DAtabase

NA_DA is an open-source software written in Python that generates datasets of regular two-dimensional geometric shapes based on probabilistic distributions. NA_DA comes with an intuitive GUI (Graphical User Interface) that allows users to define shapes, colors, and distributions of features.

Language: Python - Size: 38.6 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 2 - Forks: 0

dmey/synthia

📈 🐍 Multidimensional synthetic data generation with Copula and fPCA models in Python

Language: Python - Size: 19.7 MB - Last synced at: 6 days ago - Pushed at: over 1 year ago - Stars: 61 - Forks: 9

davanstrien/awesome-synthetic-datasets

awesome synthetic (text) datasets

Language: Jupyter Notebook - Size: 184 KB - Last synced at: 8 days ago - Pushed at: 6 months ago - Stars: 267 - Forks: 11

Renumics/awesome-open-data-centric-ai

Curated list of open source tooling for data-centric AI on unstructured data.

Size: 572 KB - Last synced at: 8 days ago - Pushed at: over 1 year ago - Stars: 716 - Forks: 35

HowieHwong/UniGen

[ICLR'25] DataGen: Unified Synthetic Dataset Generation via Large Language Models

Language: Python - Size: 14.5 MB - Last synced at: 11 days ago - Pushed at: about 1 month ago - Stars: 46 - Forks: 1

Unity-Technologies/Robotics-Object-Pose-Estimation

A complete end-to-end demonstration in which we collect training data in Unity and use that data to train a deep neural network to predict the pose of a cube. This model is then deployed in a simulated robotic pick-and-place task.

Language: Python - Size: 38.6 MB - Last synced at: 15 days ago - Pushed at: about 3 years ago - Stars: 315 - Forks: 77

ajaykr2712/ML_DS

Daily Curated Open Source Learnings of ML 🤖

Language: Jupyter Notebook - Size: 35.4 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

Shekswess/synthgenai

SynthGenAI - Package for Generating Synthetic Datasets using LLMs.

Language: Python - Size: 1.64 MB - Last synced at: 7 days ago - Pushed at: 3 months ago - Stars: 31 - Forks: 3

hitsz-ids/synthetic-data-generator

SDG is a specialized framework designed to generate high-quality structured tabular data.

Language: Python - Size: 4.19 MB - Last synced at: 12 days ago - Pushed at: about 2 months ago - Stars: 2,341 - Forks: 379

plurai-ai/intellagent

A framework for comprehensive diagnosis and optimization of agents using simulated, realistic synthetic interactions

Language: Python - Size: 14.3 MB - Last synced at: 16 days ago - Pushed at: 17 days ago - Stars: 1,006 - Forks: 129

tdspora/syngen

Open-source version of the TDspora synthetic data generation algorithm.

Language: Jupyter Notebook - Size: 18.3 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 17 - Forks: 8

dbt-labs/jaffle-shop-generator

🥪🏭 A simple CLI for generating synthetic Jaffle Shop data.

Language: Python - Size: 6.5 MB - Last synced at: 2 days ago - Pushed at: about 1 month ago - Stars: 33 - Forks: 8

maxvandenhoven/blenderline

A Blender pipeline for generating synthetic images of production lines

Language: Python - Size: 195 MB - Last synced at: 7 days ago - Pushed at: over 1 year ago - Stars: 27 - Forks: 1

OmarSamirz/ImageFromTextGenerator

IFTG (ImageFromTextGenerator) is a Python package that simplifies creating robust datasets for OCR models. Generate images from text, apply over 10 built-in noise effects, and customize fonts and layouts. IFTG supports all languages and offers endless noise combinations, including custom noise creation.

Language: Python - Size: 15.2 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 14 - Forks: 1

sdv-dev/TGAN

Generative adversarial training for generating synthetic tabular data.

Language: Python - Size: 7.84 MB - Last synced at: 14 days ago - Pushed at: over 2 years ago - Stars: 286 - Forks: 91

firstbatchxyz/dria-sdk

Dria SDK is for building and executing synthetic data generation pipelines on Dria Knowledge Network.

Language: Python - Size: 2.62 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 22 - Forks: 5

Travvy88/DocumentGenerator_DoGe

Synthetic Document Generator for Document AI. Creates document images annotated with text and bounding boxes of each word. Images contain headings, tables, and paragraphs with different formatting and fonts. Can be used for OCR, document transformer pretraining, text detection, and other tasks.

Language: Python - Size: 22.3 MB - Last synced at: 18 days ago - Pushed at: 19 days ago - Stars: 19 - Forks: 0

KodCode-AI/kodcode

✨ A synthetic dataset generation framework that produces diverse coding questions and verifiable solutions - all in one framework

Language: Python - Size: 40.6 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 180 - Forks: 10

AlexanderVNikitin/tsgm

Generation and evaluation of synthetic time series datasets (also, augmentations, visualizations, a collection of popular datasets) NeurIPS'24

Language: Python - Size: 9.81 MB - Last synced at: 15 days ago - Pushed at: 8 months ago - Stars: 159 - Forks: 17

BatsResearch/bonito

A lightweight library for generating synthetic instruction tuning datasets for your data without GPT.

Language: Python - Size: 796 KB - Last synced at: 20 days ago - Pushed at: about 2 months ago - Stars: 760 - Forks: 49

Sreyan88/Synthio

Code for ICLR 2025 Paper: Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data

Language: Python - Size: 2.29 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 2 - Forks: 0

aim-rsf/cprd-data-wrangle

Introduction to CPRD using synthetic datasets

Language: Jupyter Notebook - Size: 25.7 MB - Last synced at: 9 days ago - Pushed at: about 2 months ago - Stars: 6 - Forks: 0

camalab-ai/sofa-flow

This is the official implementation of "Streamed optical flow adaptation from synthetic to real dental scenes"

Language: Python - Size: 6.91 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 0 - Forks: 1

ethicalabs-ai/ouroboros

Self-Improving LLMs Through Iterative Refinement

Language: Python - Size: 429 KB - Last synced at: 23 days ago - Pushed at: 23 days ago - Stars: 3 - Forks: 0

Related Keywords
synthetic-data (647), machine-learning (126), synthetic-dataset-generation (107), deep-learning (103), python (80), computer-vision (59), dataset (40), llm (38), object-detection (33), gan (33), data-generation (31), generative-adversarial-network (31), data-science (30), pytorch (28), ai (28), generative-ai (27), synthetic-data-generation (27), privacy (27), tabular-data (23), blender (22), data-augmentation (21), simulation (20), time-series (20), generative-model (20), nlp (19), dataset-generation (17), data (17), datasets (16), gans (16), large-language-models (16), differential-privacy (15), domain-adaptation (15), synthetic (15), artificial-intelligence (14), data-generator (14), anonymization (14), diffusion-models (13), tensorflow (13), evaluation (12), llms (12), classification (11), openai (11), fine-tuning (11), transfer-learning (11), reinforcement-learning (10), segmentation (10), generator (10), faker (10), synthetic-data-generator (10), 3d (9), instance-segmentation (9), clustering (9), detection (9), pose-estimation (9), image-processing (9), docker (8), neural-network (8), face-recognition (8), transformers (8), fake-data (8), fraud-detection (8), deep-neural-networks (8), semantic-segmentation (8), privacy-enhancing-technologies (8), medical-imaging (7), r (7), database (7), augmentation (7), gdpr (7), ocr (7), generative-models (7), huggingface (7), open-source (7), test-data-generator (7), rendering (7), unity (6), robotics (6), deeplearning (6), grade (6), synthea (6), testing (6), explainable-ai (6), metadata (6), ros (6), finance (6), benchmark (6), natural-language-processing (6), opencv (6), finetuning (6), agent (6), awesome-list (6), keras (6), sdv (6), ctgan (6), unsupervised-learning (6), data-visualization (6), data-analysis (6), convolutional-neural-networks (5), datageneration (5), ml (5)