Ecosyste.ms: Repos

An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: dataset-generation

adithya-s-k/Topic2Dataset

💡Document-Instruct generates tailored instructions to rapidly adapt models to new docs.

Language: Jupyter Notebook - Size: 167 KB - Last synced: about 7 hours ago - Pushed: about 9 hours ago - Stars: 2 - Forks: 1

matissecallewaert/RustiFlow

Feature extraction tool built in Rust using eBPF for network intrusion detection

Language: Rust - Size: 6.86 MB - Last synced: about 7 hours ago - Pushed: about 8 hours ago - Stars: 5 - Forks: 0

Cozmeh/Weather-Data-Collector

This Python script collects weather data automatically using GitHub Actions. It runs as a scheduled job, collects the latest weather information for Bengaluru from the Open-Meteo API, and pushes the changes to the repository.

Language: Jupyter Notebook - Size: 329 KB - Last synced: about 22 hours ago - Pushed: about 23 hours ago - Stars: 0 - Forks: 1

blib-la/captain

Your all-in-one platform to build and use AI apps effortlessly on your own computer.

Language: TypeScript - Size: 144 MB - Last synced: 29 days ago - Pushed: 29 days ago - Stars: 13 - Forks: 2

tushar2704/common_datasets

Common-datasets is a GitHub repository dedicated to providing a wide collection of common datasets for practicing and learning data science and machine learning.

Language: Python - Size: 6.41 MB - Last synced: 1 day ago - Pushed: 11 months ago - Stars: 7 - Forks: 0

taexj/CV_Research_Work

Applying deep learning techniques to determine fish feeding status

Language: Jupyter Notebook - Size: 4.49 MB - Last synced: 1 day ago - Pushed: 7 months ago - Stars: 0 - Forks: 0

PIC4SeR/MPOSE2021_Dataset

This repository contains the MPOSE2021 Dataset for short-time pose-based Human Action Recognition (HAR).

Language: Python - Size: 7.74 MB - Last synced: about 4 hours ago - Pushed: 6 months ago - Stars: 43 - Forks: 12

seart-group/ghs

GitHub Search: Platform used to crawl, store and present projects from GitHub, as well as any statistics related to them

Language: Java - Size: 40.2 MB - Last synced: 3 days ago - Pushed: 3 days ago - Stars: 115 - Forks: 13

RandomGamingDev/grabcraft-scraper

A little Python script made for scraping data from grabcraft, which can then be used for things like machine learning and data analysis projects and can be transformed to litematica files with https://github.com/RandomGamingDev/grabcraft-to-schema (Sadly, I can't release the dataset since you aren't allowed to share downloaded content)

Language: Python - Size: 109 MB - Last synced: 3 days ago - Pushed: 3 days ago - Stars: 1 - Forks: 0

HeegyuKim/open-korean-instructions

A collection of open Korean instruction datasets for training language models.

Language: Python - Size: 94.7 KB - Last synced: 18 days ago - Pushed: about 1 month ago - Stars: 257 - Forks: 19

joao-borrego/gap

Gazebo plugins for applying domain randomization

Language: C++ - Size: 14 MB - Last synced: 4 days ago - Pushed: over 5 years ago - Stars: 67 - Forks: 13

Erfaniaa/crypto-trading-strategy-backtester

Easy-to-use cryptocurrency trading strategy simulator and backtester

Language: Python - Size: 107 KB - Last synced: 1 day ago - Pushed: 8 months ago - Stars: 68 - Forks: 12

DIYer22/bpycv

Computer vision utils for Blender (generate instance annotation, depth and 6D pose with one line of code)

Language: Python - Size: 356 KB - Last synced: about 20 hours ago - Pushed: 2 months ago - Stars: 457 - Forks: 56

jim-schwoebel/download_audioset

📁 This repo makes it easy to download the raw audio files from AudioSet (32.45 GB, 632 classes).

Language: Python - Size: 154 MB - Last synced: 1 day ago - Pushed: 10 months ago - Stars: 95 - Forks: 22

Madjakul/HALph

[EN] A Half Text and Half Graph Dataset from a Digital Library.

Size: 5.86 KB - Last synced: 6 days ago - Pushed: 6 days ago - Stars: 0 - Forks: 0

avnCode/Topics_in_AI

We propose a novel evaluation technique for LLMs which surpasses BERT-based evaluation scores in terms of correlation with human evaluation scores.

Language: Jupyter Notebook - Size: 169 KB - Last synced: 8 days ago - Pushed: 8 days ago - Stars: 0 - Forks: 0

Arron33/Automatic-Gold-Mine-Change-Detection

JavaScript program that uses machine learning and a new method of automatically generating training data to detect new artisanal gold mining activity.

Language: JavaScript - Size: 18.4 MB - Last synced: 9 days ago - Pushed: 9 days ago - Stars: 1 - Forks: 0

seart-group/DL4SE

Building Training Datasets for Deep Learning Models in Software Engineering

Language: Java - Size: 3.63 MB - Last synced: 10 days ago - Pushed: 11 days ago - Stars: 13 - Forks: 3

Madjakul/HALvesting

Harvests open research papers from HAL and parses them.

Language: Python - Size: 666 KB - Last synced: 10 days ago - Pushed: 10 days ago - Stars: 0 - Forks: 0

GeorgeTouros/video-soundtrack-evaluation

Create a large, well-managed and clean data-set for the task of music composition for video soundtracks.

Language: Jupyter Notebook - Size: 32.2 MB - Last synced: 10 days ago - Pushed: 10 months ago - Stars: 2 - Forks: 0

PrajjwalDatir/YT-GetDataSet

Yet Another Wrapper over YouTube Scraper...

Language: JavaScript - Size: 191 KB - Last synced: 11 days ago - Pushed: over 2 years ago - Stars: 1 - Forks: 0

TimeEval/GutenTAG

GutenTAG is an extensible tool to generate time series datasets with and without anomalies; integrated with TimeEval.

Language: Python - Size: 1.72 MB - Last synced: 12 days ago - Pushed: 13 days ago - Stars: 65 - Forks: 13

spraakbanken/mink-frontend

Vue frontend for Mink

Language: TypeScript - Size: 1.93 MB - Last synced: 12 days ago - Pushed: 12 days ago - Stars: 0 - Forks: 0

radi-cho/datasetGPT

A command-line interface to generate textual and conversational datasets with LLMs.

Language: Python - Size: 59.6 KB - Last synced: 10 days ago - Pushed: 9 months ago - Stars: 276 - Forks: 18

MaximumOverflow/Philia

An easy to use imageboard scraper.

Language: TypeScript - Size: 14.4 MB - Last synced: 14 days ago - Pushed: about 1 month ago - Stars: 24 - Forks: 1

SkywardAI/cecilia

EDA tools and datasets generator for ML projects

Size: 45.9 KB - Last synced: 11 days ago - Pushed: 15 days ago - Stars: 0 - Forks: 1

nalinrajendran/synthetic-LLM-QA-dataset-generator

Create synthetic datasets for training and testing Large Language Models (LLMs) in a Question-Answering (QA) context.

Language: Python - Size: 7.81 KB - Last synced: 15 days ago - Pushed: 15 days ago - Stars: 0 - Forks: 0

davidmartinrius/speech-dataset-generator

🔊 Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset types. 🎧👥📊 Advanced audio processing.

Language: Python - Size: 5.01 MB - Last synced: 15 days ago - Pushed: 16 days ago - Stars: 135 - Forks: 14

QuantLet/DataGenerationForCausalInference

Generates synthetic data to apply simulations for causal inference

Language: R - Size: 963 KB - Last synced: 16 days ago - Pushed: about 5 years ago - Stars: 7 - Forks: 7

ylogx/aesthetics

Image Aesthetics Toolkit - includes Fisher Vector implementation, AVA (Image Aesthetic Visual Analysis) dataset and fast multi-threaded downloader

Language: Python - Size: 4.17 MB - Last synced: 11 days ago - Pushed: 10 months ago - Stars: 207 - Forks: 54

facebookresearch/stopes

A library for preparing data for machine translation research (monolingual preprocessing, bitext mining, etc.) built by the FAIR NLLB team.

Language: Python - Size: 4.64 MB - Last synced: 7 days ago - Pushed: 5 months ago - Stars: 237 - Forks: 37

ISL-INTELLIGENT-SYSTEMS-LAB/Dataset_Class_Equalizer

This tool is designed to help balance class distributions in datasets, which is particularly useful for enhancing the performance of machine learning models affected by class imbalance.

Language: Python - Size: 142 KB - Last synced: 16 days ago - Pushed: 17 days ago - Stars: 0 - Forks: 0

BastienBoymond/tokyo-sharehouse-dataset

Scraper that creates a dataset containing all the information from https://tokyosharehouse.com/eng/

Language: Python - Size: 42 KB - Last synced: 18 days ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0

zenoverflow/datamaker-chatproxy

Proxy server that automatically stores messages exchanged between any OAI-compatible frontend and backend as a ShareGPT dataset to be used for training/finetuning.

Language: TypeScript - Size: 88.9 KB - Last synced: 19 days ago - Pushed: 19 days ago - Stars: 2 - Forks: 0

AlvaroCavalcante/auto_annotate

Labeling is boring. Use this tool to speed up your next object detection project!

Language: Jupyter Notebook - Size: 54.2 MB - Last synced: 18 days ago - Pushed: 4 months ago - Stars: 148 - Forks: 33

stefanDeveloper/heiFIP

heiFIP: A tool to convert network traffic into images for ML use cases

Language: Python - Size: 25.1 MB - Last synced: 21 days ago - Pushed: about 2 months ago - Stars: 9 - Forks: 2

LiuXinchen1997/Scan-Point-Cloud-Seg-Dataset

Scanning scene point cloud foreground and background segmentation dataset.

Language: Python - Size: 9.76 MB - Last synced: 21 days ago - Pushed: almost 2 years ago - Stars: 2 - Forks: 0

ServiceNow/synbols

The Synbols dataset generator is a ServiceNow Research project that was started at Element AI.

Language: Python - Size: 19 MB - Last synced: 21 days ago - Pushed: 10 months ago - Stars: 42 - Forks: 6

serpapi/serapis-ai-image-classifier

Automatic Image Classification from SERP Data

Language: Python - Size: 81.7 MB - Last synced: 18 days ago - Pushed: over 1 year ago - Stars: 25 - Forks: 1

wey-gu/fraud-detection-datagen

Fraud detection data generation with community structure, ready for NebulaGraph.

Language: Python - Size: 170 MB - Last synced: 23 days ago - Pushed: 23 days ago - Stars: 20 - Forks: 7

Kareem-Emad/youtube_metadata_scraper

An expansion of the YouTube-8M dataset that gets more data about the videos, such as likes/views and channel info, by scraping YouTube

Language: Python - Size: 9.77 KB - Last synced: 25 days ago - Pushed: 12 months ago - Stars: 1 - Forks: 0

gongouveia/Whisper-Temple-Synthetic-ASR-Dataset-Generator

(Still not complete!!) This UI serves as a Synthetic ASR Dataset Generator powered by/for OpenAI Whisper, enabling users to capture audio, transcribe it on the fly, and manage the generated dataset. It provides a user-friendly interface for configuring audio parameters, transcription options, and dataset management.

Language: Python - Size: 1.48 MB - Last synced: 25 days ago - Pushed: 25 days ago - Stars: 9 - Forks: 0

Sid2697/HOI-Ref

Code implementation for paper titled "HOI-Ref: Hand-Object Interaction Referral in Egocentric Vision"

Language: Python - Size: 6.73 MB - Last synced: 26 days ago - Pushed: 26 days ago - Stars: 5 - Forks: 0

SuryaKrishna02/sft-llm-news-articles-telugu

The repository contains the code used to create the instruct-style dataset of Telugu news articles.

Language: Jupyter Notebook - Size: 345 KB - Last synced: 26 days ago - Pushed: 26 days ago - Stars: 0 - Forks: 0

khirotaka/tartare 📦

Tartare: Make homebrew image dataset for machine learning.

Language: Python - Size: 2.74 MB - Last synced: 26 days ago - Pushed: almost 5 years ago - Stars: 1 - Forks: 0

debrief/KnimeInvestigation

Placeholder used to manage collection of tasks investigating applicability of Knime for ad-hoc data analysis of Debrief-like data.

Size: 632 KB - Last synced: 27 days ago - Pushed: about 7 years ago - Stars: 0 - Forks: 0

ZhangYuanhan-AI/Bamboo

Bamboo: 4 times larger than ImageNet; 2 times larger than Object365; Built by active learning.

Language: Python - Size: 5.41 MB - Last synced: 26 days ago - Pushed: about 1 month ago - Stars: 160 - Forks: 6

MatteoGuadrini/pyreports

pyreports is a Python library that allows you to create complex reports from various sources

Language: Python - Size: 4.75 MB - Last synced: 30 days ago - Pushed: about 1 month ago - Stars: 96 - Forks: 7

nfstream/nfstream

NFStream: a Flexible Network Data Analysis Framework.

Language: Python - Size: 115 MB - Last synced: 28 days ago - Pushed: 3 months ago - Stars: 1,039 - Forks: 117

packing-box/docker-packing-box

Docker image gathering packers and tools for making datasets of packed executables and training machine learning models for packing detection

Language: Python - Size: 82.5 MB - Last synced: 29 days ago - Pushed: 29 days ago - Stars: 42 - Forks: 9

asaparov/prontoqa

Synthetic question-answering dataset to formally analyze the chain-of-thought output of large language models on a reasoning task.

Language: Python - Size: 36.5 MB - Last synced: 23 days ago - Pushed: 7 months ago - Stars: 88 - Forks: 9

futianfan/clinical-trial-outcome-prediction

Benchmark dataset and deep learning method (Hierarchical Interaction Network, HINT) for clinical trial approval probability prediction, published in Cell Patterns 2022.

Language: Python - Size: 102 MB - Last synced: 21 days ago - Pushed: 10 months ago - Stars: 84 - Forks: 21

satwikkottur/clevr-dialog

Repository to generate CLEVR-Dialog: A diagnostic dataset for Visual Dialog

Language: Python - Size: 39.1 KB - Last synced: about 1 month ago - Pushed: about 4 years ago - Stars: 44 - Forks: 2

navi3-research-group/extract-facepoints-dataset

Script that captures frames from the computer's webcam, tries to detect a face in the frame and then saves the normalized distance between the center of the face detected and 60 facepoints in a Dataframe that can be saved as a dataset.

Language: Python - Size: 196 KB - Last synced: about 1 month ago - Pushed: about 6 years ago - Stars: 0 - Forks: 2

deeplearningcafe/auto-instance-segmentation-dataset-generator

From a video, automatically create an Instance Segmentation dataset using Detectors like YoloX and Segment Anything

Language: Python - Size: 1.1 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 0 - Forks: 0

NisaarAgharia/Mass_Summarization

Large Scale Dataset Cleaning (Summarization and Information Extraction) Using LLAMA2 LLM

Language: Jupyter Notebook - Size: 59.6 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 0 - Forks: 0

rioharper/VocalForge

Your one-stop solution for voice dataset creation

Language: Python - Size: 45.8 MB - Last synced: 23 days ago - Pushed: 5 months ago - Stars: 96 - Forks: 11

rodrigopivi/Chatito

🎯🗯 Dataset generation for AI chatbots, NLP tasks, named entity recognition or text classification models using a simple DSL!

Language: TypeScript - Size: 6.42 MB - Last synced: 28 days ago - Pushed: 8 months ago - Stars: 861 - Forks: 157

deeplearningcafe/animespeechdataset

Dataset Generation for Language Model Training and Text-to-Speech Synthesis from Anime Subtitles

Language: Python - Size: 1.67 MB - Last synced: 26 days ago - Pushed: 26 days ago - Stars: 1 - Forks: 0

tklab-tud/ID2T

Official ID2T repository. ID2T creates labeled IT network datasets that contain user defined synthetic attacks.

Language: Python - Size: 29.2 MB - Last synced: 21 days ago - Pushed: 11 months ago - Stars: 51 - Forks: 22

mosesab/Categorize-News-Headlines-With-Word-Embeddings

A simple project that creates a dataset of News Headlines with Primary Category, Secondary Category, Date, Day, Month, Year, Sentiment, SentimentPolarity, Emotion and Url. All News Headlines are scraped from Punch newspaper and sorted into a CSV file.

Language: Python - Size: 709 KB - Last synced: about 1 month ago - Pushed: about 1 year ago - Stars: 1 - Forks: 0

lyh983012/ES-imagenet-master

Code for generating the ES-ImageNet dataset, with corresponding training code

Language: Jupyter Notebook - Size: 373 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 25 - Forks: 2

kj3moraes/movieclip

An experiment with movie scenes and contrastive learning

Language: Jupyter Notebook - Size: 481 KB - Last synced: 9 days ago - Pushed: 9 days ago - Stars: 7 - Forks: 0

MainakRepositor/Diabetes-Prediction-System

Predict diabetes and its likelihood of occurrence from pathology lab reports on your own.

Language: Python - Size: 552 KB - Last synced: 1 day ago - Pushed: 6 months ago - Stars: 23 - Forks: 9

AF011/Machine-Learning-Projects-Academic

Developing Supervised Learning Models Using pandas, numpy, sklearn, seaborn, matplotlib

Language: Jupyter Notebook - Size: 2.29 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 2 - Forks: 0

aitorzip/DeepGTAV

A plugin for GTAV that transforms it into a vision-based self-driving car research environment.

Language: C++ - Size: 72.2 MB - Last synced: about 1 month ago - Pushed: over 4 years ago - Stars: 1,095 - Forks: 274

nuhmanpk/Webtrench

A powerful and easy-to-use web scraper for collecting data from the web. Supports scraping of images, text, videos, metadata, and more. Ideal for machine learning and deep learning engineers. Download and extract data with just one line of code.

Language: Python - Size: 51.8 KB - Last synced: 11 days ago - Pushed: 6 months ago - Stars: 20 - Forks: 5

Koldim2001/ML_DL_research_VECG

ML/DL project for detecting heart failure from vector electrocardiography

Language: Jupyter Notebook - Size: 624 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 0 - Forks: 0

philipperemy/Facebook-Profile-Pictures-Downloader

😆 Download public profile pictures from Facebook.

Language: Python - Size: 23.4 KB - Last synced: 10 days ago - Pushed: about 3 years ago - Stars: 26 - Forks: 12

pprp/voc2007_for_yolo_torch

👊 Prepare VOC format datasets for ultralytics/yolov3 & yolov5

Language: Python - Size: 357 KB - Last synced: 15 days ago - Pushed: about 1 year ago - Stars: 190 - Forks: 57

msorkhpar/wiki-entity-summarization

This repository hosts a comprehensive suite for generating graph-based entity summarization datasets from user-selected Wikipedia pages. Utilizing a series of interconnected modules, it leverages Wikidata and Wikipedia dumps to construct a dataset, alongside auto-generated ground truths.

Language: Python - Size: 251 KB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 14 - Forks: 1

rajadevineni/City_Description_Dataset_Generator

This kernel was created to collect data on many cities around the world to categorize them based on their descriptions. The list of cities is obtained from https://simplemaps.com as part of their free plan.

Language: Jupyter Notebook - Size: 11.5 MB - Last synced: about 2 months ago - Pushed: almost 4 years ago - Stars: 1 - Forks: 0

JEF1056/clean-discord

Cleaning discord data for NLP

Language: Python - Size: 5.54 MB - Last synced: 11 days ago - Pushed: over 2 years ago - Stars: 23 - Forks: 1

satellite-image-deep-learning/annotation

Annotation of datasets for deep learning applied to satellite and aerial imagery

Size: 154 KB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 26 - Forks: 8

anlp-team/LTI_Neural_Navigator

"Enhancing LLM Factual Accuracy with RAG to Counter Hallucinations: A Case Study on Domain-Specific Queries in Private Knowledge-Bases" by Jiarui Li and Ye Yuan and Zehua Zhang

Language: HTML - Size: 32.3 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 7 - Forks: 2

langtech-bsc/InstruCAT-generation

InstruCAT-instruction-generation is a repository dedicated to the generation of template-based instructional datasets for Natural Language Processing (NLP) tasks in Catalan.

Language: Python - Size: 2.49 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 0 - Forks: 0

Simula-COMPLEX/DeepScenario

DeepScenario: An Open Driving Scenario Dataset for Autonomous Driving System Testing

Language: Python - Size: 314 MB - Last synced: about 2 months ago - Pushed: 4 months ago - Stars: 18 - Forks: 2

sam0x17/image_labeler

A utility for generating VOC image annotations

Language: HTML - Size: 28.3 KB - Last synced: about 2 months ago - Pushed: over 5 years ago - Stars: 0 - Forks: 0

remyxai/VQASynth

Compose multimodal datasets 🎹

Language: Python - Size: 1.26 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 43 - Forks: 3

jefferickson/peer-object-matching 📦

Python prototype for https://github.com/jefferickson/peer-object-matcher

Language: Python - Size: 3.46 MB - Last synced: about 2 months ago - Pushed: about 8 years ago - Stars: 3 - Forks: 0

jefferickson/peer-object-matcher

Match objects on exact categorical data and nearest continuous data

Language: Go - Size: 4.23 MB - Last synced: about 2 months ago - Pushed: about 8 years ago - Stars: 0 - Forks: 0

jefferickson/county-dendist-map 📦

Creating an index measuring the "rurality" of counties in the contiguous United States

Language: HTML - Size: 33.8 MB - Last synced: about 2 months ago - Pushed: over 6 years ago - Stars: 0 - Forks: 0

jefferickson/county-city-driving-dist 📦

A dataset of driving distances for each county centroid to the nearest large city in the contiguous United States

Language: Python - Size: 16.7 MB - Last synced: about 2 months ago - Pushed: over 9 years ago - Stars: 0 - Forks: 0

Omarleel/Demiset

Generator of good-quality audio datasets, useful for building RVC (Retrieval-Based Voice Conversion) models

Language: Python - Size: 14.6 KB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 0 - Forks: 0

fjxmlzn/DoppelGANger

[IMC 2020 (Best Paper Finalist)] Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions

Language: Python - Size: 67.4 KB - Last synced: about 2 months ago - Pushed: 6 months ago - Stars: 274 - Forks: 69

firmai/datagene

DataGene - Identify How Similar TS Datasets Are to One Another (by @firmai)

Language: Jupyter Notebook - Size: 1.12 MB - Last synced: 6 days ago - Pushed: over 2 years ago - Stars: 191 - Forks: 22

colddsam/ModeYOLO

ModeYOLO: Elevate image processing with this Python package. Seamlessly perform color space transformations, simplify dataset modification for deep learning, and leverage OpenCV and NumPy. Ideal for YOLO projects, computer vision tasks, and efficient machine learning workflows.

Language: Python - Size: 25.4 KB - Last synced: 11 days ago - Pushed: about 1 month ago - Stars: 0 - Forks: 0

SOM-Research/DescribeML

DescribeML is a Visual Studio Code language plug-in to describe machine-learning datasets in a structured format. Build better data describing the composition, provenance and social concerns of your dataset.

Language: TypeScript - Size: 96.5 MB - Last synced: 26 days ago - Pushed: 8 months ago - Stars: 26 - Forks: 2

FelixHertlein/inv3d-generator

Code to generate the Inv3D dataset from our paper "Inv3D: a high-resolution 3D invoice dataset for template-guided single-image document unwarping" (ICDAR 2023).

Language: HTML - Size: 40.9 MB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 10 - Forks: 0

pprzetacznik/patent-parsing-tools

USPTO patents dataset generator

Language: Python - Size: 1.69 MB - Last synced: about 2 months ago - Pushed: 8 months ago - Stars: 5 - Forks: 1

AgaMiko/pixel_character_generator

Generating retro pixel game characters with Generative Adversarial Networks. Dataset "TinyHero" included.

Language: Jupyter Notebook - Size: 20.3 MB - Last synced: 26 days ago - Pushed: almost 4 years ago - Stars: 115 - Forks: 10

atfortes/DataGenLM

Synthetic data generation for evaluating Large Language Models reasoning.

Language: Python - Size: 28.3 KB - Last synced: 26 days ago - Pushed: about 1 year ago - Stars: 4 - Forks: 0

DaWelter/face-3d-rotation-augmentation

Reproduction of the 3d rotation augmentation of the 300W-LP face pose data set

Language: Jupyter Notebook - Size: 17 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 3 - Forks: 0

SimGus/Chatette

A powerful dataset generator for Rasa NLU, inspired by Chatito

Language: Python - Size: 16.1 MB - Last synced: about 2 months ago - Pushed: over 2 years ago - Stars: 309 - Forks: 54

RoloEdits/scrapetoon

A tool for scraping information from Webtoons.

Language: Rust - Size: 7.48 MB - Last synced: 19 days ago - Pushed: over 1 year ago - Stars: 8 - Forks: 1

hearmeneigh/dataset-rising

Toolchain for creating custom datasets and training Stable Diffusion (1.x, 2.x, XL) models and LoRAs

Language: Python - Size: 234 KB - Last synced: 30 days ago - Pushed: 5 months ago - Stars: 11 - Forks: 0

arian-askari/SOLID

A dataset of Intent-Aware LLM-generated Information-Seeking Dialogues useful for various tasks such as training/evaluating User Intent Predictors, with the option of training/evaluating on real human dialogues. The backbone LLM of SOLID is Zephyr-7b-beta.

Language: Python - Size: 30.6 MB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 0 - Forks: 0

realm-tech/docgen

A document generator used to fully create training and evaluation datasets for OCR applications

Language: Python - Size: 32.5 MB - Last synced: 3 months ago - Pushed: 9 months ago - Stars: 1 - Forks: 0

scalexi/scalexi

scalexi is a versatile open-source Python library, optimized for Python 3.11+, that focuses on facilitating low-code development and fine-tuning of diverse Large Language Models (LLMs).

Language: Python - Size: 31.2 MB - Last synced: 4 days ago - Pushed: about 1 month ago - Stars: 11 - Forks: 1

CDInstitute/Building-Dataset-Generator

Procedural 3D data generation pipeline for architecture

Language: Python - Size: 174 MB - Last synced: about 2 months ago - Pushed: over 2 years ago - Stars: 67 - Forks: 14