Topic: "data-synthesis"
AgaMiko/data-augmentation-review
List of useful data augmentation resources. Here you will find some uncommon techniques, libraries, links to GitHub repos, papers, and more.
Size: 3.59 MB - Last synced at: 10 days ago - Pushed at: 8 months ago - Stars: 1,632 - Forks: 207

Tebmer/Awesome-Knowledge-Distillation-of-LLMs
This repository collects papers for "A Survey on Knowledge Distillation of Large Language Models". We break down KD into Knowledge Elicitation and Distillation Algorithms, and explore the Skill & Vertical Distillation of LLMs.
Size: 18.6 MB - Last synced at: 10 days ago - Pushed at: about 1 month ago - Stars: 979 - Forks: 56

swz30/CycleISP
[CVPR 2020--Oral] CycleISP: Real Image Restoration via Improved Data Synthesis
Language: Python - Size: 12.1 MB - Last synced at: 14 days ago - Pushed at: 7 months ago - Stars: 528 - Forks: 75

DIYer22/bpycv
Computer vision utils for Blender (generate instance annotation, depth, and 6D pose with one line of code)
Language: Python - Size: 356 KB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 483 - Forks: 60

Flame-Code-VLM/Flame-Code-VLM
Flame is an open-source multimodal AI system designed to translate UI design mockups into high-quality React code. It leverages vision-language modeling, automated data synthesis, and structured training workflows to bridge the gap between design and front-end development.
Language: Python - Size: 7.24 MB - Last synced at: 29 days ago - Pushed at: about 2 months ago - Stars: 470 - Forks: 28

MrGiovanni/SyntheticTumors
[CVPR 2023] Label-Free Liver Tumor Segmentation
Language: Python - Size: 75.2 MB - Last synced at: 14 days ago - Pushed at: 3 months ago - Stars: 345 - Forks: 27

OS-Copilot/OS-Genesis
Code and data for OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis
Language: Jupyter Notebook - Size: 4.68 MB - Last synced at: 26 days ago - Pushed at: 26 days ago - Stars: 117 - Forks: 8

Xiaohao-Xu/SLAM-under-Perturbation
[ICLR 2025] Scalable Benchmarking and Robust Learning for Noise-Free Ego-Motion and 3D Reconstruction from Noisy Video
Language: C++ - Size: 405 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 53 - Forks: 2

hewei2001/ReachQA
Code & Dataset for Paper: "Distill Visual Chart Reasoning Ability from LLMs to MLLMs"
Language: Python - Size: 9.82 MB - Last synced at: 17 days ago - Pushed at: 6 months ago - Stars: 51 - Forks: 0

Baukebrenninkmeijer/On-the-Generation-and-Evaluation-of-Synthetic-Tabular-Data-using-GANs
Repository for the results of my master's thesis on the generation and evaluation of synthetic data using GANs
Language: Jupyter Notebook - Size: 48.1 MB - Last synced at: 5 months ago - Pushed at: almost 2 years ago - Stars: 42 - Forks: 4

cxcscmu/Montessori-Instruct
Official repository for Montessori-Instruct: Generate Influential Training Data Tailored for Student Learning [ICLR 2025]
Language: Python - Size: 27.9 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 40 - Forks: 3

Eleanor-H/MUSTARD
Code & data for ICLR 2024 spotlight paper: 🍯MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data
Language: C++ - Size: 21 MB - Last synced at: 3 months ago - Pushed at: 11 months ago - Stars: 38 - Forks: 1

Gariscat/loopy
A data framework for music information retrieval focusing on electronic music.
Language: Python - Size: 1000 KB - Last synced at: 5 months ago - Pushed at: about 1 year ago - Stars: 24 - Forks: 4

vkit-x/vkit
Boosting Document Intelligence
Language: Python - Size: 780 KB - Last synced at: 25 days ago - Pushed at: 27 days ago - Stars: 22 - Forks: 1

MatthewCYM/GenSE
Official implementation of EMNLP 2022 paper "Generate, Discriminate, and Contrast: A Semi-Supervised Sentence Representation Learning Framework"
Language: Python - Size: 975 KB - Last synced at: 17 days ago - Pushed at: over 2 years ago - Stars: 22 - Forks: 1

jianqingzheng/def_diff_rec
[Preprint] Deformation-Recovery Diffusion Model (DRDM): Instance Deformation for Image Manipulation and Synthesis
Language: Python - Size: 307 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 20 - Forks: 1

phrocker/nifi-datasynthesizer
Apache NiFi Data Synthesizer
Language: Java - Size: 2.72 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 14 - Forks: 3

laanlabs/FootRenderer
A data synthesizer for creating datasets of feet from a first-person perspective.
Language: Swift - Size: 165 MB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 14 - Forks: 2

zealscott/LDPTrace
Source code for LDPTrace: Locally Differentially Private Trajectory Synthesis. VLDB 2023.
Language: Python - Size: 261 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 12 - Forks: 1

sushant1827/Trigger-Word-Detection
Coursera - RNN Programming Assignment: In this project, we will construct a speech dataset and implement an algorithm for trigger word detection (sometimes also called keyword detection, or wake word detection).
Language: Jupyter Notebook - Size: 29.7 MB - Last synced at: 16 days ago - Pushed at: over 3 years ago - Stars: 9 - Forks: 6

ArenaGrenade/bpycv3d
Blender Python package for extracting internal data from Blender scenes for 3D-related data generation purposes.
Language: Python - Size: 430 KB - Last synced at: 7 days ago - Pushed at: over 2 years ago - Stars: 6 - Forks: 0

Smithsonian/CCN-Data-Library
The Coastal Carbon Network Data Library: An open-source database featuring carbon data from tidal wetlands around the world
Language: HTML - Size: 440 MB - Last synced at: 5 days ago - Pushed at: 11 days ago - Stars: 5 - Forks: 3

xuguodong1999/pen-simulator
data synthesis for simulation of pen-based interaction
Language: C++ - Size: 439 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 4 - Forks: 0

sebhaan/TabPFGen
TabPFGen: Synthetic Tabular Data Generation with TabPFN
Language: Python - Size: 65.4 KB - Last synced at: about 8 hours ago - Pushed at: about 9 hours ago - Stars: 3 - Forks: 0

EtienneChollet/oct_vesselseg
A Label-Free and Data-Free Synthesis Engine and Training Framework for Vascular Segmentation of sOCT Data with PyTorch.
Language: Python - Size: 35.2 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 2 - Forks: 0

PD-Mera/object-detection-data-synthesis
Synthesize data in YOLO format given background and object images
Language: Python - Size: 485 KB - Last synced at: about 2 months ago - Pushed at: 12 months ago - Stars: 2 - Forks: 0

KelestZ/ICW-GANs
fMRI data augmentation via synthesis. IEEE International Symposium on Biomedical Imaging (ISBI'19)
Language: Python - Size: 5.82 MB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 2 - Forks: 0

EtienneChollet/SynthShapes
Generate Synthetic Shapes in 3D for Biomedical Image Augmentation and Synthesis.
Language: HTML - Size: 7.43 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

michelhilg/data-synthesis
This GitHub repository showcases my bachelor's thesis, which explores and compares various deep generative models for synthetic image augmentation in the manufacturing domain.
Language: Jupyter Notebook - Size: 20.8 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 1 - Forks: 0

hruffieux/echoseq
echoseq R package - Synthetic-data generator: replication and simulation of molecular and clinical data
Language: R - Size: 2.78 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

H-IAAC/sumo_data_synthesis
Repo for trying out SUMO
Language: Jupyter Notebook - Size: 90.3 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 0 - Forks: 0

mkleinegger/data-synthesizer-evaluation
Evaluation of different data synthesizers
Language: Jupyter Notebook - Size: 22.2 MB - Last synced at: 4 days ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

iWudao/Synthesizing-Realistic-Data-for-Table-Recognition
Releases for 「Synthesizing Realistic Data for Table Recognition」
Size: 5.86 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

ready4-dev/ready4web
Website of the ready4 suite of tools for data synthesis and modelling in mental health
Language: HTML - Size: 45.5 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

alexisfischer/ifcb-data-science
Build machine learning image classifiers and summarize large image datasets from the Imaging FlowCytobot (IFCB)
Language: MATLAB - Size: 322 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

alexisfischer/bloom-baby-bloom
Synthesize, analyze, and visualize biological oceanography data
Language: MATLAB - Size: 2.54 GB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 1

avishagnevo/VaccineMatchAnalysis
Comprehensive reproduction of the paper "BNT162b2 mRNA Covid-19 Vaccine in a Nationwide Mass Vaccination Setting" by Noa Dagan, MD, et al., assisted by Professor Yair Goldberg. This statistical project explores vaccination's multifaceted impact on infection rates, employing synthetic data, advanced matching, and sophisticated statistical analysis.
Language: Python - Size: 3.41 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

homayounsrp/Sentiment-Classification-using-IMDB-reviews
For this project, I aimed to perform sentiment analysis on IMDB movie reviews. My dataset consisted of over 36,000 reviews, each accompanied by movie ratings ranging from 0 to 10. The primary objective was to construct a machine learning model capable of categorizing reviews into three sentiment classes: negative, neutral, and positive.
Language: Jupyter Notebook - Size: 865 KB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

pamudu123/BEE_counting
Counting Bees
Language: Jupyter Notebook - Size: 4.15 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

wbuchanan/sdpConvening2023
Repository for Slide Deck and Code Examples for talk at SDP Convening 2023
Language: HTML - Size: 2.52 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

lone17/DECAF
Data Utility Improvement Experiment for DECAF
Language: Python - Size: 330 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

JohannesWiesner/nisynth
A repository for synthesizing and simulating MRI images
Language: Python - Size: 1000 Bytes - Last synced at: 5 days ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0
