An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: datasets-preparation

HiagoFF/oneclick-image-downloader-extension

Chrome extension to download images with one click, saving time on image dataset creation.

Language: JavaScript - Size: 167 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1 - Forks: 0

Beejixx/oneclick-image-downloader-extension

Chrome extension to download images with one click, saving time on image dataset creation.

Language: JavaScript - Size: 167 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1 - Forks: 0

Sytheflay1/oneclick-image-downloader-extension

Chrome extension to download images with one click, saving time on image dataset creation.

Size: 1.95 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 2 - Forks: 0

nmicovic/katachi

Katachi is a Python framework for validating and processing hierarchical directory structures using YAML-based schemas. It ensures your folders and files follow expected shapes, naming rules, and relationships—before any processing begins. Use it to enforce structure, catch issues early, and keep your data pipelines reliable.

Language: Python - Size: 2.54 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 0 - Forks: 0

AndyTheFactory/newspaper4k

📰 Newspaper4k, a fork of the beloved Newspaper3k. Extraction of articles, titles, and metadata from news websites.

Language: HTML - Size: 24.5 MB - Last synced at: 29 days ago - Pushed at: 3 months ago - Stars: 772 - Forks: 81

franklinkemta/oneclick-image-downloader-extension

Chrome extension to download images with one click, saving time on image dataset creation.

Language: JavaScript - Size: 4.02 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 6 - Forks: 1

edobranchi/PokeTCG_downloader

Automatic image downloader for Pokemon cards

Language: Python - Size: 19.1 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

visual-layer/visuallayer

Simplify Your Visual Data Ops. Find and visualize issues with your computer vision datasets such as duplicates, anomalies, data leakage, mislabels and others.

Language: Jupyter Notebook - Size: 90.1 MB - Last synced at: 14 days ago - Pushed at: about 1 month ago - Stars: 69 - Forks: 3

karmazinoleh/hackEmotion-hackathon

Website for organising and collecting emotion datasets with a smart validation system

Language: Java - Size: 78.6 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

ProGamerGov/powershell-dataset-tools

Language: PowerShell - Size: 108 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

basillicus/traincraft

Atomic Dataset Generator for training ML potentials

Language: Python - Size: 105 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

reverseame/MALVADA

MALVADA: Malware Execution Traces Dataset generation.

Language: Python - Size: 37.4 MB - Last synced at: 2 months ago - Pushed at: 4 months ago - Stars: 3 - Forks: 2

birenkamdar/actigraphy

For actigraphy CSV files downloaded from Philips devices. This Stata do-file bulk imports, appends, and organizes variables from any number of CSV files to generate a clean file ready for analysis.

Language: Stata - Size: 15.6 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

Sanyamjin/BRIDGE_ANAMOLY_DETECTION

Developed a machine learning model for SpectoV for an internship second screening round. Generated a dataset with temperature, strain, and vibration as features and anomaly as the class.

Language: Jupyter Notebook - Size: 38.1 KB - Last synced at: 4 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

yevh/anonymizer

Anonymize sensitive data in your datasets.

Language: Python - Size: 1.16 MB - Last synced at: about 2 months ago - Pushed at: almost 2 years ago - Stars: 12 - Forks: 1

sbl-sdsc/kg-import

kg-import automates the ingestion of heterogeneous datasets into a Knowledge Graph.

Language: Jupyter Notebook - Size: 902 KB - Last synced at: 2 months ago - Pushed at: about 1 year ago - Stars: 6 - Forks: 4

nicolay-r/arekit-ss

Low Resource Context Relation Sampler for contexts with relations, for fact-checking and fine-tuning your LLMs, powered by AREkit

Language: Python - Size: 2 MB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 3 - Forks: 0

hasanirtiza/Pedestron

[Pedestron] Generalizable Pedestrian Detection: The Elephant In The Room. @ CVPR2021

Language: Python - Size: 64.8 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 689 - Forks: 157

0ssamaak0/DLTA-AI

Data Labeling, Tracking and Annotation with AI

Language: Python - Size: 233 MB - Last synced at: 11 months ago - Pushed at: about 1 year ago - Stars: 309 - Forks: 39

rishiswethan/ExtractSegmentationHabitat

This repo helps people who have trouble extracting segmentation images and masks from Replica and Matterport3D-Habitat

Language: Python - Size: 17.6 KB - Last synced at: 9 days ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

Dartvauder/NeuroTrainerWebUI

(Windows/Linux) Local WebUI for fine-tuning, evaluation and generation with neural network models (LLM and StableDiffusion) in Python (with a Gradio interface)

Language: Python - Size: 1.15 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

nicolay-r/SemEval2024-Task3

The supplementary service over THoR Chain-of-Thought framework as part of SemEval-2024 Task 3 paper

Language: Python - Size: 39.1 KB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

serp-ai/datasets

Datasets

Size: 6.67 MB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 7 - Forks: 1

vivesweb/row-math-ml-csv

Check row data from CSV to extract the number and percentage of empty, null, na, nan values, and extract the type of each value (string, numeric, date, ip, empty, null, na, nan). Count(empty cols), percentage(empty cols), zero values, ...

Language: PHP - Size: 174 KB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

vivesweb/cli-graph-ml

PHP CLI for visualizing machine learning datasets as bar graphs. Detect outliers. See your data before training.

Language: PHP - Size: 434 KB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 2

banda-larga/dataset-editor

Conversations / Instructions Editor

Language: Python - Size: 5.86 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

stellar-gen-ai/stellar-dataset

Official Code for the dataset exploration of Stellar: Systematic Evaluation of Human-Centric Personalized Text-to-Image Methods

Language: Jupyter Notebook - Size: 991 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 1

AymenBOUGUERRA/Tool-for-making-noisy-images-and-their-masks-for-Unet-appliation-

While working on a Unet project, I created a program that adds noise, a random grid (textbook) and a random shade of grey to images. Depending on the variation, the tool outputs a pair of images: the first variation outputs the noisy image itself and the clear one (this gave better results with the Unet application), while the second variation outputs the noisy image and the noise as its mask.

Language: Python - Size: 5.86 KB - Last synced at: almost 2 years ago - Pushed at: almost 3 years ago - Stars: 1 - Forks: 0

lyooyl/AVADatasetMake

Make a custom AVADataset dataset.

Language: Python - Size: 56 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

lucadiliello/asnq-challenging

ASNQ without trivial negative answers.

Language: Python - Size: 13.7 KB - Last synced at: 1 day ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

dennis-n-schneider/datasets

A plugin for the judo project that enables reproducible dataset management.

Language: Makefile - Size: 19.5 KB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

MostHappyCougar/HDF5ImageMarker

Utility for making datasets of images and the point coordinates that a user has marked on them

Language: Python - Size: 1.9 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 0

HSaurabh0919/tresta

Tresta contains implementations related to heuristics, reinforcement learning, and graph-based learning

Language: Jupyter Notebook - Size: 1.19 MB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

FelosRG/Herramientas-Proyecto-Solar

Tools and libraries for downloading and manipulating GOES-16 satellite data and solar radiation data, as well as a script for generating datasets with both types of data for training machine learning models.

Language: Jupyter Notebook - Size: 38.4 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

Praveen2795/Basic-Data-Analysis-Projects

This repository will contain multiple data analysis projects written in Python.

Language: Jupyter Notebook - Size: 14.9 MB - Last synced at: over 2 years ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

amandascm/Databases-preprocessing

Preprocessing of widely known databases in Python

Language: Python - Size: 2.14 MB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

marizombie/bing-images-downloader

Simple Python app for downloading Bing images with the help of the Image Search API and Visual Search API; can be used for dataset preparation

Language: Python - Size: 19.5 KB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 1