An open API service providing repository metadata for many open source software ecosystems.

Topic: "data-synthesis"

AgaMiko/data-augmentation-review

A list of useful data augmentation resources, including some uncommon techniques, libraries, links to GitHub repos, papers, and more.

Size: 3.59 MB - Last synced at: 10 days ago - Pushed at: 8 months ago - Stars: 1,632 - Forks: 207

Tebmer/Awesome-Knowledge-Distillation-of-LLMs

This repository collects papers for "A Survey on Knowledge Distillation of Large Language Models". We break down KD into Knowledge Elicitation and Distillation Algorithms, and explore the Skill & Vertical Distillation of LLMs.

Size: 18.6 MB - Last synced at: 10 days ago - Pushed at: about 1 month ago - Stars: 979 - Forks: 56

swz30/CycleISP

[CVPR 2020--Oral] CycleISP: Real Image Restoration via Improved Data Synthesis

Language: Python - Size: 12.1 MB - Last synced at: 14 days ago - Pushed at: 7 months ago - Stars: 528 - Forks: 75

DIYer22/bpycv

Computer vision utils for Blender (generate instance annotation, depth, and 6D pose with one line of code)

Language: Python - Size: 356 KB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 483 - Forks: 60

Flame-Code-VLM/Flame-Code-VLM

Flame is an open-source multimodal AI system designed to translate UI design mockups into high-quality React code. It leverages vision-language modeling, automated data synthesis, and structured training workflows to bridge the gap between design and front-end development.

Language: Python - Size: 7.24 MB - Last synced at: 29 days ago - Pushed at: about 2 months ago - Stars: 470 - Forks: 28

MrGiovanni/SyntheticTumors

[CVPR 2023] Label-Free Liver Tumor Segmentation

Language: Python - Size: 75.2 MB - Last synced at: 14 days ago - Pushed at: 3 months ago - Stars: 345 - Forks: 27

OS-Copilot/OS-Genesis

Code and data for OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis

Language: Jupyter Notebook - Size: 4.68 MB - Last synced at: 26 days ago - Pushed at: 26 days ago - Stars: 117 - Forks: 8

Xiaohao-Xu/SLAM-under-Perturbation

[ICLR 2025] Scalable Benchmarking and Robust Learning for Noise-Free Ego-Motion and 3D Reconstruction from Noisy Video

Language: C++ - Size: 405 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 53 - Forks: 2

hewei2001/ReachQA

Code & Dataset for Paper: "Distill Visual Chart Reasoning Ability from LLMs to MLLMs"

Language: Python - Size: 9.82 MB - Last synced at: 17 days ago - Pushed at: 6 months ago - Stars: 51 - Forks: 0

Baukebrenninkmeijer/On-the-Generation-and-Evaluation-of-Synthetic-Tabular-Data-using-GANs

Repository for the results of my master's thesis on the generation and evaluation of synthetic data using GANs

Language: Jupyter Notebook - Size: 48.1 MB - Last synced at: 5 months ago - Pushed at: almost 2 years ago - Stars: 42 - Forks: 4

cxcscmu/Montessori-Instruct

Official repository for Montessori-Instruct: Generate Influential Training Data Tailored for Student Learning [ICLR 2025]

Language: Python - Size: 27.9 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 40 - Forks: 3

Eleanor-H/MUSTARD

Code & data for ICLR 2024 spotlight paper: 🍯MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data

Language: C++ - Size: 21 MB - Last synced at: 3 months ago - Pushed at: 11 months ago - Stars: 38 - Forks: 1

Gariscat/loopy

A data framework for music information retrieval focusing on electronic music.

Language: Python - Size: 1000 KB - Last synced at: 5 months ago - Pushed at: about 1 year ago - Stars: 24 - Forks: 4

vkit-x/vkit

Boosting Document Intelligence

Language: Python - Size: 780 KB - Last synced at: 25 days ago - Pushed at: 27 days ago - Stars: 22 - Forks: 1

MatthewCYM/GenSE

Official implementation of EMNLP 2022 paper "Generate, Discriminate, and Contrast: A Semi-Supervised Sentence Representation Learning Framework"

Language: Python - Size: 975 KB - Last synced at: 17 days ago - Pushed at: over 2 years ago - Stars: 22 - Forks: 1

jianqingzheng/def_diff_rec

[Preprint] Deformation-Recovery Diffusion Model (DRDM): Instance Deformation for Image Manipulation and Synthesis

Language: Python - Size: 307 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 20 - Forks: 1

phrocker/nifi-datasynthesizer

Apache NiFi Data Synthesizer

Language: Java - Size: 2.72 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 14 - Forks: 3

laanlabs/FootRenderer

A data synthesizer for creating datasets of feet from a first-person perspective.

Language: Swift - Size: 165 MB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 14 - Forks: 2

zealscott/LDPTrace

Source code for LDPTrace: Locally Differentially Private Trajectory Synthesis. VLDB 2023.

Language: Python - Size: 261 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 12 - Forks: 1

sushant1827/Trigger-Word-Detection

Coursera - RNN Programming Assignment: In this project, we will construct a speech dataset and implement an algorithm for trigger word detection (sometimes also called keyword detection, or wake word detection).

Language: Jupyter Notebook - Size: 29.7 MB - Last synced at: 16 days ago - Pushed at: over 3 years ago - Stars: 9 - Forks: 6

ArenaGrenade/bpycv3d

Blender Python package for extracting internal data from Blender scenes for 3D-related data generation.

Language: Python - Size: 430 KB - Last synced at: 7 days ago - Pushed at: over 2 years ago - Stars: 6 - Forks: 0

Smithsonian/CCN-Data-Library

The Coastal Carbon Network Data Library: An open-source database featuring carbon data from tidal wetlands around the world

Language: HTML - Size: 440 MB - Last synced at: 5 days ago - Pushed at: 11 days ago - Stars: 5 - Forks: 3

xuguodong1999/pen-simulator

data synthesis for simulation of pen-based interaction

Language: C++ - Size: 439 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 4 - Forks: 0

sebhaan/TabPFGen

TabPFGen: Synthetic Tabular Data Generation with TabPFN

Language: Python - Size: 65.4 KB - Last synced at: about 8 hours ago - Pushed at: about 9 hours ago - Stars: 3 - Forks: 0

EtienneChollet/oct_vesselseg

A Label-Free and Data-Free Synthesis Engine and Training Framework for Vascular Segmentation of sOCT Data with PyTorch.

Language: Python - Size: 35.2 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 2 - Forks: 0

PD-Mera/object-detection-data-synthesis

Synthesize data in YOLO format given background and object images

Language: Python - Size: 485 KB - Last synced at: about 2 months ago - Pushed at: 12 months ago - Stars: 2 - Forks: 0

KelestZ/ICW-GANs

fMRI data augmentation via synthesis (IEEE International Symposium on Biomedical Imaging, ISBI'19)

Language: Python - Size: 5.82 MB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 2 - Forks: 0

EtienneChollet/SynthShapes

Generate Synthetic Shapes in 3D for Biomedical Image Augmentation and Synthesis.

Language: HTML - Size: 7.43 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

michelhilg/data-synthesis

This GitHub repository showcases my bachelor's thesis, which explores and compares various deep generative models for synthetic image augmentation in the manufacturing domain.

Language: Jupyter Notebook - Size: 20.8 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 1 - Forks: 0

hruffieux/echoseq

echoseq R package - Synthetic-data generator: replication and simulation of molecular and clinical data

Language: R - Size: 2.78 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

H-IAAC/sumo_data_synthesis

Repo for trying out SUMO

Language: Jupyter Notebook - Size: 90.3 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 0 - Forks: 0

mkleinegger/data-synthesizer-evaluation

Evaluation of different data synthesizers

Language: Jupyter Notebook - Size: 22.2 MB - Last synced at: 4 days ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

iWudao/Synthesizing-Realistic-Data-for-Table-Recognition

Releases for 「Synthesizing Realistic Data for Table Recognition」

Size: 5.86 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

ready4-dev/ready4web

Website of the ready4 suite of tools for data synthesis and modelling in mental health

Language: HTML - Size: 45.5 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

alexisfischer/ifcb-data-science

Build machine learning image classifiers and summarize large image datasets from the Imaging FlowCytobot (IFCB)

Language: MATLAB - Size: 322 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

alexisfischer/bloom-baby-bloom

Synthesize, analyze, and visualize biological oceanography data

Language: MATLAB - Size: 2.54 GB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 1

avishagnevo/VaccineMatchAnalysis

Comprehensive reproduction of the paper "BNT162b2 mRNA Covid-19 Vaccine in a Nationwide Mass Vaccination Setting" by Noa Dagan, MD, et al., assisted by Professor Yair Goldberg. This statistical project explores vaccination's multifaceted impact on infection rates, employing synthetic data, advanced matching, and sophisticated statistical analysis.

Language: Python - Size: 3.41 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

homayounsrp/Sentiment-Classification-using-IMDB-reviews

For this project, I aimed to perform sentiment analysis on IMDB movie reviews. My dataset consisted of over 36,000 reviews, each accompanied by movie ratings ranging from 0 to 10. The primary objective was to construct a machine learning model capable of categorizing reviews into three sentiment classes: negative, neutral, and positive.

Language: Jupyter Notebook - Size: 865 KB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

pamudu123/BEE_counting

Counting Bees

Language: Jupyter Notebook - Size: 4.15 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

wbuchanan/sdpConvening2023

Repository for Slide Deck and Code Examples for talk at SDP Convening 2023

Language: HTML - Size: 2.52 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

lone17/DECAF

Data Utility Improvement Experiment for DECAF

Language: Python - Size: 330 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

JohannesWiesner/nisynth

A repository for synthesizing and simulating MRI images

Language: Python - Size: 1000 Bytes - Last synced at: 5 days ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

Related Topics
machine-learning (8), computer-vision (6), deep-learning (5), python (4), data-augmentation (4), llm (4), data-generation (3), synthetic-data (3), tabular-data (3), data-science (3), simulation (2), blender-cv (2), signal-processing (2), blender (2), survey (2), gan (2), image-classification (2), generative-adversarial-network (2), ocr (2), image-augmentation (2), open-source (2), multimodal (2), synthetic-dataset-generation (2), dataset-generation (2), ocsr (1), reinforcement-learning-environments (1), xgboost (1), camera-imaging-pipeline (1), brain-imaging (1), cvpr2020 (1), cycleisp (1), webscraping (1), pca (1), image-denoising (1), image-restoration (1), low-level-vision (1), pytorch (1), raw2rgb (1), rgb2raw (1), agents (1), gui (1), brain-mri (1), neuroscience (1), data-evaluation (1), generative-adversarial-networks (1), biomedical-image-segmentation (1), biomedical-imaging (1), instruction-tuning (1), nlp (1), mllm (1), keras-tensorflow (1), spectrogram (1), trigger-word-detection (1), routine-generator (1), sumo (1), traffic-simulation (1), crohme (1), crohme2023 (1), digital-ink (1), handwriting-data (1), handwriting-generation (1), handwriting-simulator (1), handwriting-synthesis (1), instruct-gpt (1), feedback (1), instruction-following (1), kd (1), knowledge-distillation (1), large-language-model (1), multi-modal (1), self-distillation (1), self-training (1), supervised-finetuning (1), optical-coherence-tomography (1), representation-learning (1), audio-augmentation (1), augmentation-policies (1), autoaugment (1), data-augmentations (1), graph-data-augmentation (1), nlp-augmentation (1), review (1), style-transfer (1), coastal-carbon-network (1), wetland-science (1), bayesian (1), genai (1), tabpfn (1), benchmark (1), benchmarking (1), customizable (1), data-engine (1), gaussian-splatting (1), iclr2025 (1), localization (1), mapping (1), nerf (1), perception (1), robustness (1), slam (1)