GitHub topics: synthetic-data
ritesh-modi/fine-tuning-embeddings-template
This repo is a template to fine-tune embedding models using sentence-transformers based on different configurations
Language: Python - Size: 118 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

Kiln-AI/Kiln
The easiest tool for fine-tuning LLM models, synthetic data generation, and collaborating on datasets.
Language: Python - Size: 14.3 MB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 3,391 - Forks: 235

synthesizer-project/synthesizer
Synthesizer - a code for creating synthetic astrophysical observables
Language: Python - Size: 14.7 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 25 - Forks: 11

datadreamer-dev/DataDreamer
DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models.
Language: Python - Size: 895 KB - Last synced at: about 12 hours ago - Pushed at: 3 months ago - Stars: 1,010 - Forks: 53

shuttle-hq/synth
The Declarative Data Generator
Language: Rust - Size: 32.3 MB - Last synced at: 1 day ago - Pushed at: 7 months ago - Stars: 1,414 - Forks: 109

argilla-io/distilabel
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
Language: Python - Size: 543 MB - Last synced at: 2 days ago - Pushed at: 6 days ago - Stars: 2,640 - Forks: 193

here4learning/synthetic-to-viewbinding-migrator
Convert Kotlin Android Fragments from synthetic imports to ViewBinding. Automate the migration of old fragment code to modern, type-safe view binding in Android projects.
Language: Python - Size: 9.77 KB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

PrinceV-hub/GAN-Generation-of-Synthetic-Data-
Generate and evaluate synthetic tabular data using GANs with visual comparisons.
Language: Python - Size: 1.1 MB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

sdv-dev/DeepEcho
Synthetic Data Generation for mixed-type, multivariate time series.
Language: Python - Size: 755 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 111 - Forks: 15

modelscope/data-juicer
Data processing for and with foundation models!
Language: Python - Size: 169 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 4,222 - Forks: 227

lk-geimfari/mimesis
Mimesis is a robust data generator for Python that can produce a wide range of fake data in multiple languages.
Language: Python - Size: 23.2 MB - Last synced at: 1 day ago - Pushed at: 26 days ago - Stars: 4,534 - Forks: 338

synthetichealth/synthea
Synthetic Patient Population Simulator
Language: Java - Size: 738 MB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 2,466 - Forks: 721

mostly-ai/mostlyai-qa
Synthetic Data Quality Assurance
Language: HTML - Size: 128 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 31 - Forks: 5

mostly-ai/mostlyai-engine
Synthetic Data Engine
Language: Python - Size: 2.01 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 53 - Forks: 2

pr0mila/MediBeng-Whisper-Tiny
MediBeng Whisper Tiny improves doctor-patient transcription by training the Whisper Tiny model to translate mixed Bengali-English speech into English, making it easier for analysis, record-keeping, and using AI in healthcare.
Language: Python - Size: 692 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 3 - Forks: 0

starfishdata/starfish
Synthetic data generation to fuel AI models
Language: Python - Size: 683 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 3 - Forks: 0

GBR-RL/SmartSort-CAM
End-to-end industrial part classification system using Blender-generated synthetic data, ConvNeXt, Grad-CAM, and FastAPI, fully containerized with Docker for real-time inference.
Language: Jupyter Notebook - Size: 39.4 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

sdv-dev/SDV
Synthetic data generation for tabular data
Language: Python - Size: 31 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 2,597 - Forks: 331

tanaos/tanaos-docs
Documentation for our synthetic data generation SDKs and APIs
Language: CSS - Size: 1.69 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

sdv-dev/CTGAN
Conditional GAN for generating synthetic tabular data.
Language: Python - Size: 1.82 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1,372 - Forks: 307

nicolas-hbt/pygraft
Configurable Generation of Synthetic Schemas and Knowledge Graphs at Your Fingertips
Language: Python - Size: 699 KB - Last synced at: about 17 hours ago - Pushed at: 9 months ago - Stars: 682 - Forks: 45

Clearbox-AI/clearbox-synthetic-kit
Clearbox AI's all-in-one solution for generation and evaluation of synthetic tabular and time-series data.
Language: Jupyter Notebook - Size: 4.68 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 42 - Forks: 1

gretelai/gretel-python-client
The Gretel Python Client allows you to interact with the Gretel REST API.
Language: Python - Size: 31 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 54 - Forks: 19

diffix/syndiffix
Python implementation of the SynDiffix synthetic data generation mechanism.
Language: Python - Size: 610 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 8 - Forks: 2

TyMill/SynthPred
A Julia package for synthetic data analysis, advanced imputation (ARIMA, RNN), AutoML, and ensemble modeling.
Language: Julia - Size: 308 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1 - Forks: 1

synthesized-io/tdk-demo
This is a collection of TDK demo projects that use different databases and options
Language: YAML - Size: 69.4 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 17 - Forks: 4

microsoft/DPSDA
Private Evolution: Generating DP Synthetic Data without Training [ICLR 2024, ICML 2024 Spotlight]
Language: Python - Size: 8.44 MB - Last synced at: about 13 hours ago - Pushed at: about 2 months ago - Stars: 95 - Forks: 11

microsoft/genalog
Genalog is an open source, cross-platform python package allowing generation of synthetic document images with custom degradations and text alignment capabilities.
Language: Jupyter Notebook - Size: 14.6 MB - Last synced at: about 13 hours ago - Pushed at: over 1 year ago - Stars: 322 - Forks: 32

Project-AgML/AgML
AgML is a centralized framework for agricultural machine learning. AgML provides access to public agricultural datasets for common agricultural deep learning tasks, with standard benchmarks and pretrained models, as well as the ability to generate synthetic data and annotations.
Language: Python - Size: 213 MB - Last synced at: 3 days ago - Pushed at: 7 days ago - Stars: 209 - Forks: 32

MichelBMachado/PIVML
PIVML is a repository containing machine learning models to predict the velocity field of sequential PIV images through optical flow estimation.
Language: Jupyter Notebook - Size: 722 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

Vini09-cpu/agentin
AI Agents for Technology Services
Size: 1000 Bytes - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

stacklok/promptwright
Generate large synthetic data using an LLM
Language: Python - Size: 14 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 406 - Forks: 33

DLR-RM/BlenderProc
A procedural Blender pipeline for photorealistic training image generation
Language: Python - Size: 96 MB - Last synced at: 6 days ago - Pushed at: 9 days ago - Stars: 3,019 - Forks: 462

Ronit26Mehta/Reddit-Sentiment-Analysis-and-ETL-Pipeline
A project that uses synthetic Reddit data, transforms it with Spark and Hive, stores it in S3, and then performs sentiment analysis on the data.
Language: Python - Size: 2.34 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 1 - Forks: 0

firmai/datagene
DataGene - Identify How Similar TS Datasets Are to One Another (by @firmai)
Language: Jupyter Notebook - Size: 1.12 MB - Last synced at: 4 days ago - Pushed at: about 3 years ago - Stars: 205 - Forks: 24

expectedparrot/edsl
Design, conduct and analyze results of AI-powered surveys and experiments. Simulate social science and market research with large numbers of AI agents and LLMs.
Language: Python - Size: 58.2 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 233 - Forks: 25

aliciusschroeder/spintax-editor
A modern, visual editor for spintax (spinning syntax) with tree-based editing, live preview, and YAML export. An ideal lightweight alternative for AI training data generation. Built with Next.js, TailwindCSS, and TypeScript.
Language: TypeScript - Size: 337 KB - Last synced at: 4 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

martinjurkovic/syntherela
A package for benchmarking synthetic relational data generation methods
Language: Python - Size: 964 KB - Last synced at: 5 days ago - Pushed at: about 1 month ago - Stars: 35 - Forks: 1

bespokelabsai/curator
Synthetic data curation for post-training and structured data extraction
Language: Python - Size: 64.6 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 1,198 - Forks: 90

zealscott/SynMeter
A principled library for tuning, training and evaluating tabular data synthesis on fidelity, privacy and utility.
Language: Python - Size: 2.82 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 19 - Forks: 1

worldbank/REaLTabFormer
A suite of auto-regressive and Seq2Seq (sequence-to-sequence) transformer models for tabular and relational synthetic data generation.
Language: Jupyter Notebook - Size: 12.2 MB - Last synced at: 8 days ago - Pushed at: about 1 month ago - Stars: 225 - Forks: 26

Francis-Calingo/Canadian-Rental-Prices-and-Immigration-ML-Predictive-Model
Language: Jupyter Notebook - Size: 8.24 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

ThomasRochefortB/open-agentinstruct
An open-source recreation of the AgentInstruct agentic workflow for synthetic data generation
Language: Python - Size: 246 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 14 - Forks: 0

usnistgov/SDNist
SDNist: Benchmark data and evaluation tools for data synthesizers.
Language: Python - Size: 42.2 MB - Last synced at: 8 days ago - Pushed at: 9 days ago - Stars: 35 - Forks: 15

sdv-dev/SDMetrics
Metrics to evaluate quality and efficacy of synthetic datasets.
Language: Python - Size: 2.69 MB - Last synced at: 7 days ago - Pushed at: 9 days ago - Stars: 229 - Forks: 47

aimclub/BAMT
Repository of a data modeling and analysis tool based on Bayesian networks
Language: Python - Size: 106 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 126 - Forks: 19

nucleuscloud/neosync
Open Source Data Security Platform for Developers to Monitor and Detect PII, Anonymize Production Data and Sync it across environments.
Language: Go - Size: 164 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 3,841 - Forks: 152

1kastner/conflowgen
A generator for synthetic container flows at maritime container terminals with a focus on yard operations
Language: Python - Size: 2.47 MB - Last synced at: 9 days ago - Pushed at: 2 months ago - Stars: 13 - Forks: 7

mostly-ai/mostlyai
Synthetic Data SDK
Language: Python - Size: 13.6 MB - Last synced at: 9 days ago - Pushed at: 10 days ago - Stars: 397 - Forks: 32

KI-AIM/Cinnamon
Cinnamon is a modular application designed to offer robust functionalities for data anonymization, synthetization, and evaluation.
Language: Java - Size: 40.5 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 21 - Forks: 1

magpie-align/magpie
[ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Your efficient and high-quality synthetic data generation pipeline!
Language: Python - Size: 1.08 MB - Last synced at: 6 days ago - Pushed at: about 1 month ago - Stars: 673 - Forks: 60

statice/awesome-synthetic-data
A curated list of awesome synthetic data tools (open source and commercial).
Size: 8.79 KB - Last synced at: 7 days ago - Pushed at: over 1 year ago - Stars: 171 - Forks: 23

DerwenAI/kleptosyn
Synthetic data generation for investigative graphs based on patterns of bad-actor tradecraft.
Language: Jupyter Notebook - Size: 1.88 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 5 - Forks: 0

GreenmaskIO/greenmask
PostgreSQL database anonymization and synthetic data generation tool
Language: Go - Size: 31.7 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 1,297 - Forks: 31

JonnoB/scrambledtext_analysis
Can synthetic corrupted data be used to train LLMs to correct OCR text?
Language: Python - Size: 203 KB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 1 - Forks: 0

vanderschaarlab/synthcity
A library for generating and evaluating synthetic tabular data for privacy, fairness and data augmentation.
Language: Python - Size: 6.76 MB - Last synced at: 9 days ago - Pushed at: 3 months ago - Stars: 532 - Forks: 71

Data-Centric-AI-Community/awesome-data-centric-ai
Open-Source Software, Tutorials, and Research on Data-Centric AI
Language: Jupyter Notebook - Size: 6.73 MB - Last synced at: 9 days ago - Pushed at: over 1 year ago - Stars: 332 - Forks: 46

gretelai/gretel-synthetics
Synthetic data generators for structured and unstructured text, featuring differentially private learning.
Language: Python - Size: 2.35 MB - Last synced at: 10 days ago - Pushed at: about 1 month ago - Stars: 629 - Forks: 90

sdv-dev/SDGym
Benchmarking synthetic data generation methods.
Language: Python - Size: 3.05 MB - Last synced at: 6 days ago - Pushed at: 7 days ago - Stars: 272 - Forks: 63

sdv-dev/Copulas
A library to model multivariate data using copulas.
Language: Python - Size: 27.5 MB - Last synced at: 8 days ago - Pushed at: 17 days ago - Stars: 585 - Forks: 116

unrealcv/unrealcv
UnrealCV: Connecting Computer Vision to Unreal Engine
Language: C++ - Size: 18.2 MB - Last synced at: 9 days ago - Pushed at: about 1 month ago - Stars: 1,983 - Forks: 442

jofpin/synthBTC
A tool that uses advanced Monte Carlo simulations and Turbit parallel processing to create possible Bitcoin prediction scenarios.
Language: JavaScript - Size: 6.46 MB - Last synced at: 8 days ago - Pushed at: 9 months ago - Stars: 684 - Forks: 414

stefan-jansen/synthetic-data-for-finance
Material for QuantUniversity talk on Synthetic Data Generation for Finance.
Language: Jupyter Notebook - Size: 757 KB - Last synced at: 2 days ago - Pushed at: over 4 years ago - Stars: 110 - Forks: 45

gszfwsb/NCFM
Official PyTorch implementation of the paper "Dataset Distillation with Neural Characteristic Function: A Minmax Perspective" (NCFM) in CVPR 2025 (Highlight).
Language: Python - Size: 1.17 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 325 - Forks: 18

sparkfish/augraphy
Augmentation pipeline for rendering synthetic paper printing, faxing, scanning and copy machine processes
Language: Python - Size: 245 MB - Last synced at: 8 days ago - Pushed at: 20 days ago - Stars: 404 - Forks: 48

SciPhi-AI/synthesizer
A multi-purpose LLM framework for RAG and data creation.
Language: Python - Size: 31.5 MB - Last synced at: 7 days ago - Pushed at: over 1 year ago - Stars: 621 - Forks: 54

zahramh99/Synthetic-Data-Generation-with-Generative-AI
Language: Python - Size: 0 Bytes - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 0 - Forks: 0

Infinitode/PWLDS
A public dataset of over 10 million passwords, with assigned strength levels.
Size: 124 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 2 - Forks: 1

privateai/deid-examples
Example scripts that showcase how to use Private AI Text to de-identify, redact, hash, tokenize, mask and synthesize PII in text.
Language: Jupyter Notebook - Size: 37.8 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 80 - Forks: 1

evertorres/maternal-health-synthetic-data-omop
Synthetic data generation for maternal health in Colombia
Language: Jupyter Notebook - Size: 18.3 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 0 - Forks: 0

oeg-upm/TINTO
TINTO: Software to convert Tidy Data into Image for Classification with 2-Dimensional Convolutional Neural Networks
Language: Python - Size: 112 MB - Last synced at: 9 days ago - Pushed at: about 1 year ago - Stars: 5 - Forks: 3

stefan-jansen/machine-learning-for-trading
Code for Machine Learning for Algorithmic Trading, 2nd edition.
Language: Jupyter Notebook - Size: 652 MB - Last synced at: 12 days ago - Pushed at: 8 months ago - Stars: 14,599 - Forks: 4,530

ydataai/ydata-synthetic
Synthetic data generators for tabular and time-series data
Language: Jupyter Notebook - Size: 16.4 MB - Last synced at: 11 days ago - Pushed at: about 1 month ago - Stars: 1,524 - Forks: 249

tanaos/synthex-python
A Python library for high-quality, large-scale synthetic dataset generation
Language: Python - Size: 75.2 KB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 0 - Forks: 0

Unity-Technologies/SynthDet
SynthDet - An end-to-end object detection pipeline using synthetic data
Language: C# - Size: 2.19 MB - Last synced at: 12 days ago - Pushed at: 5 months ago - Stars: 373 - Forks: 55

sodascience/metasyn
Transparent and privacy-friendly synthetic data generation
Language: Python - Size: 7.65 MB - Last synced at: 6 days ago - Pushed at: 10 days ago - Stars: 41 - Forks: 9

GDelCorso/NA_DAtabase
NA_DA is an open-source software written in Python that generates datasets of regular two-dimensional geometric shapes based on probabilistic distributions. NA_DA comes with an intuitive GUI (Graphical User Interface) that allows users to define shapes, colors, and distributions of features.
Language: Python - Size: 38.6 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 2 - Forks: 0

dmey/synthia
Multidimensional synthetic data generation with Copula and fPCA models in Python
Language: Python - Size: 19.7 MB - Last synced at: 6 days ago - Pushed at: over 1 year ago - Stars: 61 - Forks: 9

davanstrien/awesome-synthetic-datasets
awesome synthetic (text) datasets
Language: Jupyter Notebook - Size: 184 KB - Last synced at: 8 days ago - Pushed at: 6 months ago - Stars: 267 - Forks: 11

Renumics/awesome-open-data-centric-ai
Curated list of open source tooling for data-centric AI on unstructured data.
Size: 572 KB - Last synced at: 8 days ago - Pushed at: over 1 year ago - Stars: 716 - Forks: 35

HowieHwong/UniGen
[ICLR'25] DataGen: Unified Synthetic Dataset Generation via Large Language Models
Language: Python - Size: 14.5 MB - Last synced at: 11 days ago - Pushed at: about 1 month ago - Stars: 46 - Forks: 1

Unity-Technologies/Robotics-Object-Pose-Estimation
A complete end-to-end demonstration in which we collect training data in Unity and use that data to train a deep neural network to predict the pose of a cube. This model is then deployed in a simulated robotic pick-and-place task.
Language: Python - Size: 38.6 MB - Last synced at: 15 days ago - Pushed at: about 3 years ago - Stars: 315 - Forks: 77

ajaykr2712/ML_DS
Daily Curated Open Source Learnings of ML
Language: Jupyter Notebook - Size: 35.4 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

Shekswess/synthgenai
SynthGenAI - Package for Generating Synthetic Datasets using LLMs.
Language: Python - Size: 1.64 MB - Last synced at: 7 days ago - Pushed at: 3 months ago - Stars: 31 - Forks: 3

hitsz-ids/synthetic-data-generator
SDG is a specialized framework designed to generate high-quality structured tabular data.
Language: Python - Size: 4.19 MB - Last synced at: 12 days ago - Pushed at: about 2 months ago - Stars: 2,341 - Forks: 379

plurai-ai/intellagent
A framework for comprehensive diagnosis and optimization of agents using simulated, realistic synthetic interactions
Language: Python - Size: 14.3 MB - Last synced at: 16 days ago - Pushed at: 17 days ago - Stars: 1,006 - Forks: 129

tdspora/syngen
Open-source version of the TDspora synthetic data generation algorithm.
Language: Jupyter Notebook - Size: 18.3 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 17 - Forks: 8

dbt-labs/jaffle-shop-generator
A simple CLI for generating synthetic Jaffle Shop data.
Language: Python - Size: 6.5 MB - Last synced at: 2 days ago - Pushed at: about 1 month ago - Stars: 33 - Forks: 8

maxvandenhoven/blenderline
A Blender pipeline for generating synthetic images of production lines
Language: Python - Size: 195 MB - Last synced at: 7 days ago - Pushed at: over 1 year ago - Stars: 27 - Forks: 1

OmarSamirz/ImageFromTextGenerator
IFTG (ImageFromTextGenerator) is a Python package that simplifies creating robust datasets for OCR models. Generate images from text, apply over 10 built-in noise effects, and customize fonts and layouts. IFTG supports all languages and offers endless noise combinations, including custom noise creation.
Language: Python - Size: 15.2 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 14 - Forks: 1

sdv-dev/TGAN
Generative adversarial training for generating synthetic tabular data.
Language: Python - Size: 7.84 MB - Last synced at: 14 days ago - Pushed at: over 2 years ago - Stars: 286 - Forks: 91

firstbatchxyz/dria-sdk
Dria SDK is for building and executing synthetic data generation pipelines on Dria Knowledge Network.
Language: Python - Size: 2.62 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 22 - Forks: 5

Travvy88/DocumentGenerator_DoGe
Synthetic Document Generator for Document AI. Creates document images annotated with text and bounding boxes of each word. Images contain headings, tables, and paragraphs with different formatting and fonts. Can be used for OCR, document transformer pretraining, text detection, and other tasks.
Language: Python - Size: 22.3 MB - Last synced at: 18 days ago - Pushed at: 19 days ago - Stars: 19 - Forks: 0

KodCode-AI/kodcode
A synthetic dataset generation framework that produces diverse coding questions and verifiable solutions, all in one framework
Language: Python - Size: 40.6 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 180 - Forks: 10

AlexanderVNikitin/tsgm
Generation and evaluation of synthetic time series datasets (also, augmentations, visualizations, a collection of popular datasets) NeurIPS'24
Language: Python - Size: 9.81 MB - Last synced at: 15 days ago - Pushed at: 8 months ago - Stars: 159 - Forks: 17

BatsResearch/bonito
A lightweight library for generating synthetic instruction tuning datasets for your data without GPT.
Language: Python - Size: 796 KB - Last synced at: 20 days ago - Pushed at: about 2 months ago - Stars: 760 - Forks: 49

Sreyan88/Synthio
Code for ICLR 2025 Paper: Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data
Language: Python - Size: 2.29 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 2 - Forks: 0

aim-rsf/cprd-data-wrangle
Introduction to CPRD using synthetic datasets
Language: Jupyter Notebook - Size: 25.7 MB - Last synced at: 9 days ago - Pushed at: about 2 months ago - Stars: 6 - Forks: 0

camalab-ai/sofa-flow
This is the official implementation of "Streamed optical flow adaptation from synthetic to real dental scenes"
Language: Python - Size: 6.91 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 0 - Forks: 1

ethicalabs-ai/ouroboros
Self-Improving LLMs Through Iterative Refinement
Language: Python - Size: 429 KB - Last synced at: 23 days ago - Pushed at: 23 days ago - Stars: 3 - Forks: 0
