Ecosyste.ms: Repos
An open API service providing repository metadata for many open source software ecosystems.
GitHub topics: dataset-generation
adithya-s-k/Topic2Dataset
💡Document-Instruct generates tailored instructions to rapidly adapt models to new docs.
Language: Jupyter Notebook - Size: 167 KB - Last synced: about 7 hours ago - Pushed: about 9 hours ago - Stars: 2 - Forks: 1
matissecallewaert/RustiFlow
Feature extraction tool built in Rust using eBPF for network intrusion detection
Language: Rust - Size: 6.86 MB - Last synced: about 7 hours ago - Pushed: about 8 hours ago - Stars: 5 - Forks: 0
Cozmeh/Weather-Data-Collector
This Python script collects weather data automatically using GitHub Actions. It runs as a scheduled job, fetches the latest weather information for Bengaluru from the Open-Meteo API, and pushes the changes to the repository.
Language: Jupyter Notebook - Size: 329 KB - Last synced: about 22 hours ago - Pushed: about 23 hours ago - Stars: 0 - Forks: 1
blib-la/captain
Your all-in-one platform to build and use AI apps effortlessly on your own computer.
Language: TypeScript - Size: 144 MB - Last synced: 29 days ago - Pushed: 29 days ago - Stars: 13 - Forks: 2
tushar2704/common_datasets
Common-datasets is a GitHub repository dedicated to providing a wide collection of common datasets for practicing and learning data science and machine learning.
Language: Python - Size: 6.41 MB - Last synced: 1 day ago - Pushed: 11 months ago - Stars: 7 - Forks: 0
taexj/CV_Research_Work
Applying deep learning techniques to determine fish feeding status
Language: Jupyter Notebook - Size: 4.49 MB - Last synced: 1 day ago - Pushed: 7 months ago - Stars: 0 - Forks: 0
PIC4SeR/MPOSE2021_Dataset
This repository contains the MPOSE2021 Dataset for short-time pose-based Human Action Recognition (HAR).
Language: Python - Size: 7.74 MB - Last synced: about 4 hours ago - Pushed: 6 months ago - Stars: 43 - Forks: 12
seart-group/ghs
GitHub Search: Platform used to crawl, store and present projects from GitHub, as well as any statistics related to them
Language: Java - Size: 40.2 MB - Last synced: 3 days ago - Pushed: 3 days ago - Stars: 115 - Forks: 13
RandomGamingDev/grabcraft-scraper
A little Python script made for scraping data from grabcraft, which can then be used for things like machine learning and data analysis projects and can be transformed to litematica files with https://github.com/RandomGamingDev/grabcraft-to-schema (Sadly, I can't release the dataset since you aren't allowed to share downloaded content)
Language: Python - Size: 109 MB - Last synced: 3 days ago - Pushed: 3 days ago - Stars: 1 - Forks: 0
HeegyuKim/open-korean-instructions
A collection of public Korean instruction datasets for training language models.
Language: Python - Size: 94.7 KB - Last synced: 18 days ago - Pushed: about 1 month ago - Stars: 257 - Forks: 19
joao-borrego/gap
Gazebo plugins for applying domain randomization
Language: C++ - Size: 14 MB - Last synced: 4 days ago - Pushed: over 5 years ago - Stars: 67 - Forks: 13
Erfaniaa/crypto-trading-strategy-backtester
Easy-to-use cryptocurrency trading strategy simulator and backtester
Language: Python - Size: 107 KB - Last synced: 1 day ago - Pushed: 8 months ago - Stars: 68 - Forks: 12
DIYer22/bpycv
Computer vision utils for Blender (generate instance annotation, depth and 6D pose with one line of code)
Language: Python - Size: 356 KB - Last synced: about 20 hours ago - Pushed: 2 months ago - Stars: 457 - Forks: 56
jim-schwoebel/download_audioset
📁 This repo makes it easy to download the raw audio files from AudioSet (32.45 GB, 632 classes).
Language: Python - Size: 154 MB - Last synced: 1 day ago - Pushed: 10 months ago - Stars: 95 - Forks: 22
Madjakul/HALph
[EN] A Half Text and Half Graph Dataset from a Digital Library.
Size: 5.86 KB - Last synced: 6 days ago - Pushed: 6 days ago - Stars: 0 - Forks: 0
avnCode/Topics_in_AI
We propose a novel evaluation technique for LLMs that surpasses BERT-based evaluation scores in terms of correlation with human evaluation scores
Language: Jupyter Notebook - Size: 169 KB - Last synced: 8 days ago - Pushed: 8 days ago - Stars: 0 - Forks: 0
Arron33/Automatic-Gold-Mine-Change-Detection
JavaScript program that uses machine learning and a new method of automatically generating training data to detect new artisanal gold mining activity.
Language: JavaScript - Size: 18.4 MB - Last synced: 9 days ago - Pushed: 9 days ago - Stars: 1 - Forks: 0
seart-group/DL4SE
Building Training Datasets for Deep Learning Models in Software Engineering
Language: Java - Size: 3.63 MB - Last synced: 10 days ago - Pushed: 11 days ago - Stars: 13 - Forks: 3
Madjakul/HALvesting
Harvests open research papers from HAL and parses them.
Language: Python - Size: 666 KB - Last synced: 10 days ago - Pushed: 10 days ago - Stars: 0 - Forks: 0
GeorgeTouros/video-soundtrack-evaluation
Create a large, well-managed and clean data-set for the task of music composition for video soundtracks.
Language: Jupyter Notebook - Size: 32.2 MB - Last synced: 10 days ago - Pushed: 10 months ago - Stars: 2 - Forks: 0
PrajjwalDatir/YT-GetDataSet
Yet Another Wrapper over YouTube Scraper...
Language: JavaScript - Size: 191 KB - Last synced: 11 days ago - Pushed: over 2 years ago - Stars: 1 - Forks: 0
TimeEval/GutenTAG
GutenTAG is an extensible tool to generate time series datasets with and without anomalies; integrated with TimeEval.
Language: Python - Size: 1.72 MB - Last synced: 12 days ago - Pushed: 13 days ago - Stars: 65 - Forks: 13
spraakbanken/mink-frontend
Vue frontend for Mink
Language: TypeScript - Size: 1.93 MB - Last synced: 12 days ago - Pushed: 12 days ago - Stars: 0 - Forks: 0
radi-cho/datasetGPT
A command-line interface to generate textual and conversational datasets with LLMs.
Language: Python - Size: 59.6 KB - Last synced: 10 days ago - Pushed: 9 months ago - Stars: 276 - Forks: 18
MaximumOverflow/Philia
An easy to use imageboard scraper.
Language: TypeScript - Size: 14.4 MB - Last synced: 14 days ago - Pushed: about 1 month ago - Stars: 24 - Forks: 1
SkywardAI/cecilia
EDA tools and datasets generator for ML projects
Size: 45.9 KB - Last synced: 11 days ago - Pushed: 15 days ago - Stars: 0 - Forks: 1
nalinrajendran/synthetic-LLM-QA-dataset-generator
Create synthetic datasets for training and testing Large Language Models (LLMs) in a Question-Answering (QA) context.
Language: Python - Size: 7.81 KB - Last synced: 15 days ago - Pushed: 15 days ago - Stars: 0 - Forks: 0
davidmartinrius/speech-dataset-generator
🔊 Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset types. 🎧👥📊 Advanced audio processing.
Language: Python - Size: 5.01 MB - Last synced: 15 days ago - Pushed: 16 days ago - Stars: 135 - Forks: 14
QuantLet/DataGenerationForCausalInference
Generates synthetic data to apply simulations for causal inference
Language: R - Size: 963 KB - Last synced: 16 days ago - Pushed: about 5 years ago - Stars: 7 - Forks: 7
ylogx/aesthetics
Image Aesthetics Toolkit - includes Fisher Vector implementation, AVA (Image Aesthetic Visual Analysis) dataset and fast multi-threaded downloader
Language: Python - Size: 4.17 MB - Last synced: 11 days ago - Pushed: 10 months ago - Stars: 207 - Forks: 54
facebookresearch/stopes
A library for preparing data for machine translation research (monolingual preprocessing, bitext mining, etc.) built by the FAIR NLLB team.
Language: Python - Size: 4.64 MB - Last synced: 7 days ago - Pushed: 5 months ago - Stars: 237 - Forks: 37
ISL-INTELLIGENT-SYSTEMS-LAB/Dataset_Class_Equalizer
This tool is designed to help balance class distributions in datasets, which is particularly useful for enhancing the performance of machine learning models affected by class imbalance.
Language: Python - Size: 142 KB - Last synced: 16 days ago - Pushed: 17 days ago - Stars: 0 - Forks: 0
BastienBoymond/tokyo-sharehouse-dataset
Scraper that creates a dataset containing all the information from https://tokyosharehouse.com/eng/
Language: Python - Size: 42 KB - Last synced: 18 days ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0
zenoverflow/datamaker-chatproxy
Proxy server that automatically stores messages exchanged between any OAI-compatible frontend and backend as a ShareGPT dataset to be used for training/finetuning.
Language: TypeScript - Size: 88.9 KB - Last synced: 19 days ago - Pushed: 19 days ago - Stars: 2 - Forks: 0
AlvaroCavalcante/auto_annotate
Labeling is boring. Use this tool to speed up your next object detection project!
Language: Jupyter Notebook - Size: 54.2 MB - Last synced: 18 days ago - Pushed: 4 months ago - Stars: 148 - Forks: 33
stefanDeveloper/heiFIP
heiFIP: A tool to convert network traffic into images for ML use cases
Language: Python - Size: 25.1 MB - Last synced: 21 days ago - Pushed: about 2 months ago - Stars: 9 - Forks: 2
LiuXinchen1997/Scan-Point-Cloud-Seg-Dataset
Scanning scene point cloud foreground and background segmentation dataset.
Language: Python - Size: 9.76 MB - Last synced: 21 days ago - Pushed: almost 2 years ago - Stars: 2 - Forks: 0
ServiceNow/synbols
The Synbols dataset generator is a ServiceNow Research project that was started at Element AI.
Language: Python - Size: 19 MB - Last synced: 21 days ago - Pushed: 10 months ago - Stars: 42 - Forks: 6
serpapi/serapis-ai-image-classifier
Automatic Image Classification from SERP Data
Language: Python - Size: 81.7 MB - Last synced: 18 days ago - Pushed: over 1 year ago - Stars: 25 - Forks: 1
wey-gu/fraud-detection-datagen
Fraud detection data generation with community structure, ready for NebulaGraph.
Language: Python - Size: 170 MB - Last synced: 23 days ago - Pushed: 23 days ago - Stars: 20 - Forks: 7
Kareem-Emad/youtube_metadata_scraper
An expansion of the Youtube-8m Dataset that collects more data about the videos, such as likes/views and channel info, by scraping YouTube
Language: Python - Size: 9.77 KB - Last synced: 25 days ago - Pushed: 12 months ago - Stars: 1 - Forks: 0
gongouveia/Whisper-Temple-Synthetic-ASR-Dataset-Generator
(Still not complete!!) This UI serves as a Synthetic ASR Dataset Generator powered by/for OpenAI Whisper, enabling users to capture audio, transcribe it on the fly, and manage the generated dataset. It provides a user-friendly interface for configuring audio parameters, transcription options, and dataset management.
Language: Python - Size: 1.48 MB - Last synced: 25 days ago - Pushed: 25 days ago - Stars: 9 - Forks: 0
Sid2697/HOI-Ref
Code implementation for paper titled "HOI-Ref: Hand-Object Interaction Referral in Egocentric Vision"
Language: Python - Size: 6.73 MB - Last synced: 26 days ago - Pushed: 26 days ago - Stars: 5 - Forks: 0
SuryaKrishna02/sft-llm-news-articles-telugu
The repository contains the code used to create the instruct-style dataset of Telugu news articles.
Language: Jupyter Notebook - Size: 345 KB - Last synced: 26 days ago - Pushed: 26 days ago - Stars: 0 - Forks: 0
khirotaka/tartare 📦
Tartare: Make homebrew image dataset for machine learning.
Language: Python - Size: 2.74 MB - Last synced: 26 days ago - Pushed: almost 5 years ago - Stars: 1 - Forks: 0
debrief/KnimeInvestigation
Placeholder used to manage collection of tasks investigating applicability of Knime for ad-hoc data analysis of Debrief-like data.
Size: 632 KB - Last synced: 27 days ago - Pushed: about 7 years ago - Stars: 0 - Forks: 0
ZhangYuanhan-AI/Bamboo
Bamboo: 4 times larger than ImageNet; 2 times larger than Object365; Built by active learning.
Language: Python - Size: 5.41 MB - Last synced: 26 days ago - Pushed: about 1 month ago - Stars: 160 - Forks: 6
MatteoGuadrini/pyreports
pyreports is a Python library that allows you to create complex reports from various sources
Language: Python - Size: 4.75 MB - Last synced: 30 days ago - Pushed: about 1 month ago - Stars: 96 - Forks: 7
nfstream/nfstream
NFStream: a Flexible Network Data Analysis Framework.
Language: Python - Size: 115 MB - Last synced: 28 days ago - Pushed: 3 months ago - Stars: 1,039 - Forks: 117
packing-box/docker-packing-box
Docker image gathering packers and tools for making datasets of packed executables and training machine learning models for packing detection
Language: Python - Size: 82.5 MB - Last synced: 29 days ago - Pushed: 29 days ago - Stars: 42 - Forks: 9
asaparov/prontoqa
Synthetic question-answering dataset to formally analyze the chain-of-thought output of large language models on a reasoning task.
Language: Python - Size: 36.5 MB - Last synced: 23 days ago - Pushed: 7 months ago - Stars: 88 - Forks: 9
futianfan/clinical-trial-outcome-prediction
Benchmark dataset and deep learning method (Hierarchical Interaction Network, HINT) for clinical trial approval probability prediction, published in Cell Patterns 2022.
Language: Python - Size: 102 MB - Last synced: 21 days ago - Pushed: 10 months ago - Stars: 84 - Forks: 21
satwikkottur/clevr-dialog
Repository to generate CLEVR-Dialog: A diagnostic dataset for Visual Dialog
Language: Python - Size: 39.1 KB - Last synced: about 1 month ago - Pushed: about 4 years ago - Stars: 44 - Forks: 2
navi3-research-group/extract-facepoints-dataset
Script that captures frames from the computer's webcam, tries to detect a face in each frame, and then saves the normalized distance between the center of the detected face and 60 facepoints in a DataFrame that can be saved as a dataset.
Language: Python - Size: 196 KB - Last synced: about 1 month ago - Pushed: about 6 years ago - Stars: 0 - Forks: 2
deeplearningcafe/auto-instance-segmentation-dataset-generator
From a video, automatically create an Instance Segmentation dataset using Detectors like YoloX and Segment Anything
Language: Python - Size: 1.1 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 0 - Forks: 0
NisaarAgharia/Mass_Summarization
Large Scale Dataset Cleaning (Summarization and Information Extraction) Using LLAMA2 LLM
Language: Jupyter Notebook - Size: 59.6 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 0 - Forks: 0
rioharper/VocalForge
Your one-stop solution for voice dataset creation
Language: Python - Size: 45.8 MB - Last synced: 23 days ago - Pushed: 5 months ago - Stars: 96 - Forks: 11
rodrigopivi/Chatito
🎯🗯 Dataset generation for AI chatbots, NLP tasks, named entity recognition or text classification models using a simple DSL!
Language: TypeScript - Size: 6.42 MB - Last synced: 28 days ago - Pushed: 8 months ago - Stars: 861 - Forks: 157
deeplearningcafe/animespeechdataset
Dataset Generation for Language Model Training and Text-to-Speech Synthesis from Anime Subtitles
Language: Python - Size: 1.67 MB - Last synced: 26 days ago - Pushed: 26 days ago - Stars: 1 - Forks: 0
tklab-tud/ID2T
Official ID2T repository. ID2T creates labeled IT network datasets that contain user defined synthetic attacks.
Language: Python - Size: 29.2 MB - Last synced: 21 days ago - Pushed: 11 months ago - Stars: 51 - Forks: 22
mosesab/Categorize-News-Headlines-With-Word-Embeddings
A simple project that creates a dataset of news headlines with Primary Category, Secondary Category, Date, Day, Month, Year, Sentiment, SentimentPolarity, Emotion and Url. All news headlines are scraped from the Punch newspaper and sorted into a CSV file.
Language: Python - Size: 709 KB - Last synced: about 1 month ago - Pushed: about 1 year ago - Stars: 1 - Forks: 0
lyh983012/ES-imagenet-master
code for generating data set ES-ImageNet with corresponding training code
Language: Jupyter Notebook - Size: 373 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 25 - Forks: 2
kj3moraes/movieclip
An experiment with movie scenes and contrastive learning
Language: Jupyter Notebook - Size: 481 KB - Last synced: 9 days ago - Pushed: 9 days ago - Stars: 7 - Forks: 0
MainakRepositor/Diabetes-Prediction-System
Predict Diabetes and its possibility of occurrence from the pathological lab reports on your own.
Language: Python - Size: 552 KB - Last synced: 1 day ago - Pushed: 6 months ago - Stars: 23 - Forks: 9
AF011/Machine-Learning-Projects-Academic
Developing Supervised Learning Models Using pandas, numpy, sklearn, seaborn, matplotlib
Language: Jupyter Notebook - Size: 2.29 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 2 - Forks: 0
aitorzip/DeepGTAV
A plugin for GTAV that transforms it into a vision-based self-driving car research environment.
Language: C++ - Size: 72.2 MB - Last synced: about 1 month ago - Pushed: over 4 years ago - Stars: 1,095 - Forks: 274
nuhmanpk/Webtrench
A powerful and easy-to-use web scraper for collecting data from the web. Supports scraping of images, text, videos, metadata, and more. Ideal for machine learning and deep learning engineers. Download and extract data with just one line of code.
Language: Python - Size: 51.8 KB - Last synced: 11 days ago - Pushed: 6 months ago - Stars: 20 - Forks: 5
Koldim2001/ML_DL_research_VECG
ML/DL project for detecting heart failure from vector electrocardiography
Language: Jupyter Notebook - Size: 624 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 0 - Forks: 0
philipperemy/Facebook-Profile-Pictures-Downloader
😆 Download public profile pictures from Facebook.
Language: Python - Size: 23.4 KB - Last synced: 10 days ago - Pushed: about 3 years ago - Stars: 26 - Forks: 12
pprp/voc2007_for_yolo_torch
👊 Prepare VOC format datasets for ultralytics/yolov3 & yolov5
Language: Python - Size: 357 KB - Last synced: 15 days ago - Pushed: about 1 year ago - Stars: 190 - Forks: 57
msorkhpar/wiki-entity-summarization
This repository hosts a comprehensive suite for generating graph-based entity summarization datasets from user-selected Wikipedia pages. Using a series of interconnected modules, it leverages Wikidata and Wikipedia dumps to construct a dataset, alongside auto-generated ground truths.
Language: Python - Size: 251 KB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 14 - Forks: 1
rajadevineni/City_Description_Dataset_Generator
This kernel was created to collect data on many cities around the world and categorize them based on their descriptions. The list of cities is obtained from https://simplemaps.com as part of their free plan.
Language: Jupyter Notebook - Size: 11.5 MB - Last synced: about 2 months ago - Pushed: almost 4 years ago - Stars: 1 - Forks: 0
JEF1056/clean-discord
Cleaning discord data for NLP
Language: Python - Size: 5.54 MB - Last synced: 11 days ago - Pushed: over 2 years ago - Stars: 23 - Forks: 1
satellite-image-deep-learning/annotation
Annotation of datasets for deep learning applied to satellite and aerial imagery
Size: 154 KB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 26 - Forks: 8
anlp-team/LTI_Neural_Navigator
"Enhancing LLM Factual Accuracy with RAG to Counter Hallucinations: A Case Study on Domain-Specific Queries in Private Knowledge-Bases" by Jiarui Li and Ye Yuan and Zehua Zhang
Language: HTML - Size: 32.3 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 7 - Forks: 2
langtech-bsc/InstruCAT-generation
InstruCAT-instruction-generation is a repository dedicated to the generation of template-based instructional datasets for Natural Language Processing (NLP) tasks in Catalan.
Language: Python - Size: 2.49 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 0 - Forks: 0
Simula-COMPLEX/DeepScenario
DeepScenario: An Open Driving Scenario Dataset for Autonomous Driving System Testing
Language: Python - Size: 314 MB - Last synced: about 2 months ago - Pushed: 4 months ago - Stars: 18 - Forks: 2
sam0x17/image_labeler
a utility for generating VOC image annotations
Language: HTML - Size: 28.3 KB - Last synced: about 2 months ago - Pushed: over 5 years ago - Stars: 0 - Forks: 0
remyxai/VQASynth
Compose multimodal datasets 🎹
Language: Python - Size: 1.26 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 43 - Forks: 3
jefferickson/peer-object-matching 📦
Python prototype for https://github.com/jefferickson/peer-object-matcher
Language: Python - Size: 3.46 MB - Last synced: about 2 months ago - Pushed: about 8 years ago - Stars: 3 - Forks: 0
jefferickson/peer-object-matcher
Match objects on exact categorical data and nearest continuous data
Language: Go - Size: 4.23 MB - Last synced: about 2 months ago - Pushed: about 8 years ago - Stars: 0 - Forks: 0
jefferickson/county-dendist-map 📦
Creating an index measuring the "rurality" of counties in the contiguous United States
Language: HTML - Size: 33.8 MB - Last synced: about 2 months ago - Pushed: over 6 years ago - Stars: 0 - Forks: 0
jefferickson/county-city-driving-dist 📦
A dataset of driving distances for each county centroid to the nearest large city in the contiguous United States
Language: Python - Size: 16.7 MB - Last synced: about 2 months ago - Pushed: over 9 years ago - Stars: 0 - Forks: 0
Omarleel/Demiset
Generator of good-quality audio datasets, useful for building RVC (Retrieval-Based Voice Conversion) models
Language: Python - Size: 14.6 KB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 0 - Forks: 0
fjxmlzn/DoppelGANger
[IMC 2020 (Best Paper Finalist)] Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions
Language: Python - Size: 67.4 KB - Last synced: about 2 months ago - Pushed: 6 months ago - Stars: 274 - Forks: 69
firmai/datagene
DataGene - Identify How Similar TS Datasets Are to One Another (by @firmai)
Language: Jupyter Notebook - Size: 1.12 MB - Last synced: 6 days ago - Pushed: over 2 years ago - Stars: 191 - Forks: 22
colddsam/ModeYOLO
ModeYOLO: Elevate image processing with this Python package. Seamlessly perform color space transformations, simplify dataset modification for deep learning, and leverage OpenCV and NumPy. Ideal for YOLO projects, computer vision tasks, and efficient machine learning workflows.
Language: Python - Size: 25.4 KB - Last synced: 11 days ago - Pushed: about 1 month ago - Stars: 0 - Forks: 0
SOM-Research/DescribeML
DescribeML is a Visual Studio Code language plug-in to describe machine-learning datasets in a structured format. Build better data describing the composition, provenance and social concerns of your dataset.
Language: TypeScript - Size: 96.5 MB - Last synced: 26 days ago - Pushed: 8 months ago - Stars: 26 - Forks: 2
FelixHertlein/inv3d-generator
Code to generate the Inv3D dataset from our paper "Inv3D: a high-resolution 3D invoice dataset for template-guided single-image document unwarping" (ICDAR 2023).
Language: HTML - Size: 40.9 MB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 10 - Forks: 0
pprzetacznik/patent-parsing-tools
USPTO patents dataset generator
Language: Python - Size: 1.69 MB - Last synced: about 2 months ago - Pushed: 8 months ago - Stars: 5 - Forks: 1
AgaMiko/pixel_character_generator
Generating retro pixel game characters with Generative Adversarial Networks. Dataset "TinyHero" included.
Language: Jupyter Notebook - Size: 20.3 MB - Last synced: 26 days ago - Pushed: almost 4 years ago - Stars: 115 - Forks: 10
atfortes/DataGenLM
Synthetic data generation for evaluating the reasoning of Large Language Models.
Language: Python - Size: 28.3 KB - Last synced: 26 days ago - Pushed: about 1 year ago - Stars: 4 - Forks: 0
DaWelter/face-3d-rotation-augmentation
Reproduction of the 3d rotation augmentation of the 300W-LP face pose data set
Language: Jupyter Notebook - Size: 17 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 3 - Forks: 0
SimGus/Chatette
A powerful dataset generator for Rasa NLU, inspired by Chatito
Language: Python - Size: 16.1 MB - Last synced: about 2 months ago - Pushed: over 2 years ago - Stars: 309 - Forks: 54
RoloEdits/scrapetoon
A tool for scraping information from Webtoons.
Language: Rust - Size: 7.48 MB - Last synced: 19 days ago - Pushed: over 1 year ago - Stars: 8 - Forks: 1
hearmeneigh/dataset-rising
Toolchain for creating custom datasets and training Stable Diffusion (1.x, 2.x, XL) models and LoRAs
Language: Python - Size: 234 KB - Last synced: 30 days ago - Pushed: 5 months ago - Stars: 11 - Forks: 0
arian-askari/SOLID
A dataset of Intent-Aware LLM-generated Information-Seeking Dialogues, useful for tasks such as training/evaluating User Intent Predictors, with the option of training/evaluating on real human dialogues. The backbone LLM of SOLID is Zephyr-7b-beta.
Language: Python - Size: 30.6 MB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 0 - Forks: 0
realm-tech/docgen
A document generator used to fully create training and evaluation datasets for OCR applications
Language: Python - Size: 32.5 MB - Last synced: 3 months ago - Pushed: 9 months ago - Stars: 1 - Forks: 0
scalexi/scalexi
scalexi is a versatile open-source Python library, optimized for Python 3.11+, that focuses on facilitating low-code development and fine-tuning of diverse Large Language Models (LLMs).
Language: Python - Size: 31.2 MB - Last synced: 4 days ago - Pushed: about 1 month ago - Stars: 11 - Forks: 1
CDInstitute/Building-Dataset-Generator
Procedural 3D data generation pipeline for architecture
Language: Python - Size: 174 MB - Last synced: about 2 months ago - Pushed: over 2 years ago - Stars: 67 - Forks: 14