GitHub topics: huggingface-datasets

Repositories

Ashu708907/Music-Genre-Classification-using-Spectrogram-images

🎵 Classify music genres by analyzing spectrogram images with machine learning and deep learning methods for robust and interpretable predictions.

Language: Jupyter Notebook - Size: 6.94 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

antoinejeannot/jurisprudence

French jurisprudence at your fingertips, updated every 72 hours

Language: Python - Size: 149 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 14 - Forks: 2

defeat-beta/defeatbeta-api

An open-source alternative to Yahoo Finance's market data APIs with higher reliability.

Language: Python - Size: 3.89 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 301 - Forks: 21

songys/Japanese-HF-datasets-catalog

Detection and automatic updating of Japanese datasets uploaded to Hugging Face

Language: Python - Size: 5.9 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1 - Forks: 0

songys/Korean-HF-datasets-catalog

Detection and automatic updating of Korean datasets uploaded to Hugging Face

Language: Python - Size: 10.5 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 12 - Forks: 1

songys/Chinese-HF-datasets-catalog

Detection and automatic updating of Chinese datasets uploaded to Hugging Face

Language: Python - Size: 7.75 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1 - Forks: 0

PRITHIVSAKTHIUR/FineTuning-MetaCLIP-2

Demonstrates how to adapt a large-scale pretrained model, MetaCLIP 2, to a specific downstream task, image classification, through fine-tuning.

Language: Jupyter Notebook - Size: 27.3 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

lukehinds/deepfabric

Training Model Behavior in Agentic Systems

Language: Python - Size: 23.6 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 654 - Forks: 45

paulohl/hugging_face_diffusers_manuscript

Hugging Face Diffusers 🤗 Library book repo *BP Publishers

Language: TeX - Size: 133 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

omarkamali/wikisets

Flexible Wikipedia dataset builder with sampling and pretraining support. Built on top of wikipedia-monthly, providing fresh, clean Wikipedia dumps updated monthly.

Language: Python - Size: 225 KB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 0 - Forks: 0

huggingface/pyspark_huggingface

PySpark custom data source for Hugging Face Datasets

Language: Python - Size: 219 KB - Last synced at: 7 days ago - Pushed at: 3 months ago - Stars: 19 - Forks: 6

Moenupa/DeOCR

A reverse OCR tool that renders huggingface-compatible datasets to configurable images

Language: Python - Size: 915 KB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 1 - Forks: 0

AyushShahh/image-colorization

UNet model to colorize black & white images

Language: Python - Size: 910 KB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 0 - Forks: 0

DSYZayn/gopeed-extension-huggingface

A Gopeed extension for downloading models and datasets from Hugging Face, hf-mirror, and ModelScope.

Language: JavaScript - Size: 1.67 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 60 - Forks: 3

grok-ai/nn-template

Generic template to bootstrap your PyTorch project.

Language: Python - Size: 2.68 MB - Last synced at: 18 days ago - Pushed at: about 2 years ago - Stars: 648 - Forks: 68

autogluon/fev

Forecast evaluation library

Language: Python - Size: 1.71 MB - Last synced at: 23 days ago - Pushed at: 29 days ago - Stars: 127 - Forks: 10

joe0731/hf_vram_calc

A CLI tool for estimating GPU VRAM requirements for Hugging Face models, supporting various data types, parallelization strategies, and fine-tuning scenarios like LoRA.

Language: Python - Size: 232 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 2

nirmal2i43a5/AI-Powered-Biomedical-NER-with-BioBERT

This project fine-tunes BERT and BioBERT on BC5CDR for biomedical named entity recognition (diseases and chemicals).

Language: Jupyter Notebook - Size: 1.93 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 8 - Forks: 0

The-Data-Dilemma/Medibeng-Orpheus-3b-0.1-ft-Fine-Tuning

Medibeng-Orpheus-3b-0.1-ft: a TTS model for bilingual Bengali-English code-switching in healthcare, fine-tuned for seamless patient-doctor interactions.

Language: Python - Size: 939 KB - Last synced at: 25 days ago - Pushed at: 4 months ago - Stars: 5 - Forks: 1

Comprehensive Vector Data Tooling. The universal interface for all vector databases, datasets, and RAG platforms. Easily export, import, back up, re-embed (using any model), or access your vector data from any vector database or repository.

Language: Jupyter Notebook - Size: 4.39 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 263 - Forks: 29

vincentkoc/tiny_qa_benchmark_pp

Tiny QA Benchmark++: a micro-benchmark suite (52-item gold + on-demand multilingual synthetic packs), generator CLI, and CI-ready eval harness for ultra-fast LLM smoke-testing and regression-catching.

Language: Python - Size: 310 KB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 8 - Forks: 0

Gift-Ojeabulu/fasttext-language-detection

Simple multilingual text detection using FastText and Hugging Face datasets. Production-ready Python library with real-world examples.

Language: Python - Size: 8.79 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

raidionics/AeroPath

🤗 AeroPath: An airway segmentation benchmark dataset with challenging pathology

Language: Jupyter Notebook - Size: 324 KB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 40 - Forks: 6

rolim520/Nine-Tiles-Panic-Solver

Exhaustive solver for the board game Nine Tiles Panic. This project generates and analyzes all 2.9 billion valid layouts using Python & DuckDB to find the single optimal solution for every scoring combination. Features an interactive web visualizer.

Language: Python - Size: 39.8 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

shashwatpasari/Music-Genre-Classification-using-Spectrogram-images

This repository provides a comprehensive suite of machine learning and deep learning approaches for hierarchical music genre classification using spectrogram images. It includes models built with EfficientNet, Audio Spectrogram Transformer (AST), Custom CNN architectures, and traditional machine learning pipelines.

Language: Jupyter Notebook - Size: 14.7 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

vTuanpham/Large_dataset_translator

Translate large datasets into any language with the Google Translate API and multithreaded processing; no key required!

Language: Python - Size: 146 MB - Last synced at: 19 days ago - Pushed at: 20 days ago - Stars: 72 - Forks: 23

ianjure/philnet-scraper

Scraper for collecting legitimate and phishing records used in PhiLNet's periodic retraining.

Language: Python - Size: 2.6 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

BUAADreamer/Chinese-LLaVA-Med

Chinese medical multimodal large model: Large Chinese Language-and-Vision Assistant for BioMedicine

Language: Python - Size: 2.26 MB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 93 - Forks: 6

approximated-intelligence/embedding-distillation

Train a pooling Head for Dense Embeddings

Language: Python - Size: 105 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

nikitabugrovsky/hf-vector-pipeline

Build datasets for Hugging Face automatically

Language: Python - Size: 17.6 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

Infinitode/CRSD

A synthetic customer review sentiment dataset for sentiment analysis generated using different AI models.

Size: 284 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

LLFELIPEVV/sistema-fake-news-ia

Autonomous system for detecting fake news in Spanish using artificial intelligence, NLP, and machine learning. Includes dataset analysis and training of traditional, deep learning, and Transformer models, with a FastAPI API and a React web interface.

Size: 397 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

xieincz/huggingface-go

huggingface-go: high-speed downloads of Hugging Face models and datasets

Language: Go - Size: 28.3 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 45 - Forks: 6

BUAADreamer/MLLM-Finetuning-Demo

Example code for fine-tuning multimodal large language models with LLaMA-Factory

Language: Python - Size: 61.5 KB - Last synced at: 2 months ago - Pushed at: about 1 year ago - Stars: 49 - Forks: 2

BirkhoffG/jax-dataloader

PyTorch-like dataloaders for JAX.

Language: Jupyter Notebook - Size: 1.02 MB - Last synced at: 2 months ago - Pushed at: 6 months ago - Stars: 94 - Forks: 3

The-Data-Dilemma/ParquetToHuggingFace

ParquetToHuggingFace processes raw audio data, converts it into Parquet files, and uploads them to Hugging Face. The README explains how to set up the environment, configure paths, and run the scripts to generate and upload the data.

Language: Python - Size: 2.85 MB - Last synced at: 3 months ago - Pushed at: 6 months ago - Stars: 7 - Forks: 2

Express-Legal-Funding-LLC/express-legal-funding-reviews

As part of our commitment to transparency and innovation in legal technology, Express Legal Funding is proud to release our customer reviews dataset as an open resource for researchers, developers, and AI model trainers.

Size: 42 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

SeekAI-786/Medi_Bot_CustomGPT

My teammate and I created MediBot, fine-tuned on a medical question-answering dataset from Hugging Face.

Language: Jupyter Notebook - Size: 7.78 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

Suyashkb/Customer-Support-Chatbot

This repo contains a version of DialoGPT (a conversational model based on GPT-2) fine-tuned specifically for customer support chatbots.

Language: Python - Size: 7.81 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0

Samuela31/Sanskrit-Manuscripts-Revival-Using-Deep-Learning-Techniques

Restoring destroyed text in ancient Sanskrit manuscripts by predicting missing text using deep learning techniques. Mini project done in 3rd year of college using RoBERTa LLM, Tesseract OCR, and OpenCV.

Language: Jupyter Notebook - Size: 17.7 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

redis-performance/vector-embeddings

Complete pipeline for generating DBpedia text embeddings using OpenAI's embedding models and publishing them as Hugging Face datasets.

Language: Python - Size: 22.5 KB - Last synced at: about 2 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

aspirant2018/conllu-pos-dataset

A minimal, pure Python interface that turns CoNLL-U format files into a Hugging Face Dataset

Language: Python - Size: 7.81 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

shunk031/huggingface-datasets_cocoapi-tools

A helper library for easily converting MSCOCO format data using the loading script of huggingface datasets.

Language: Python - Size: 104 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

shunk031/cookiecutter-huggingface-datasets

cookiecutter for huggingface datasets

Language: Python - Size: 38.1 KB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 5 - Forks: 0

Md-Emon-Hasan/Fine-Tuning

End-to-end fine-tuning of Hugging Face models using LoRA, QLoRA, quantization, and PEFT techniques. Optimized for low-memory environments, with efficient model deployment.

Language: Jupyter Notebook - Size: 5.53 MB - Last synced at: 4 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

git-lfs-fuse/git-lfs-fuse

Mount remote repositories, models and datasets managed by Git LFS locally.

Language: Go - Size: 253 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 50 - Forks: 3

vishvaRam/Fine-Tune-Qwen2.5

This repository provides resources and instructions for fine-tuning the Qwen2.5-0.5B model. It includes scripts, tips, and best practices to adapt the model for specific tasks or domains. Designed for researchers and developers, it simplifies the fine-tuning process to achieve optimal performance and accuracy.

Language: Python - Size: 44.9 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

AadityaArunSingh/RoBERTa-Token-Classification-with-Additional-PLODv2-Data

This repo explores token classification for abbreviation and long-form detection using RoBERTa. We evaluate the impact of adding 50% of the PLODv2-filtered dataset, achieving improved F1 and recall. The repo includes methodology, evaluation using seqeval, and confusion matrix analysis.

Language: Python - Size: 11.7 KB - Last synced at: 5 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

daspartho/predict-subreddit

NLP model that predicts subreddit based on the title of a post

Language: Jupyter Notebook - Size: 816 KB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 32 - Forks: 6

SapienzaNLP/ita-bench

A collection of Italian benchmarks for LLM evaluation

Language: Python - Size: 731 KB - Last synced at: 5 months ago - Pushed at: 6 months ago - Stars: 30 - Forks: 1

balnarendrasapa/road-detection

A course project for DSCI-6011 (Deep Learning). It deals with drivable-area and lane segmentation for self-driving cars.

Language: Jupyter Notebook - Size: 181 MB - Last synced at: 5 months ago - Pushed at: over 1 year ago - Stars: 9 - Forks: 1

ShivangamSoni/LLM-ATS-Comparative-Analysis

Thesis Work: Performance Evaluation of LLMs for Automatic Text Summarization

Language: Jupyter Notebook - Size: 14.8 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

Arya920/Natural_Language_To_SQL_Queries

This project converts natural language to SQL queries.

Language: Jupyter Notebook - Size: 9.33 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

neverbiasu/hf-mirror-hub

A command-line tool for quickly downloading models and datasets from Hugging Face mirror sites.

Language: Python - Size: 20.5 KB - Last synced at: 2 months ago - Pushed at: 6 months ago - Stars: 4 - Forks: 0

xlang-ai/UnifiedSKG

[EMNLP 2022] Unifying and multi-tasking structured knowledge grounding with language models

Language: Python - Size: 21 MB - Last synced at: 6 months ago - Pushed at: about 2 years ago - Stars: 557 - Forks: 60

Leftinant/tiny_qa_benchmark_pp

Tiny QA Benchmark++: a micro-benchmark suite (52-item gold + on-demand multilingual synthetic packs), generator CLI, and CI-ready eval harness for ultra-fast LLM smoke-testing and regression-catching.

Size: 1.95 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

Dhanush-R-git/MH-Analysis

MHRoberta is a Mental Health RoBERTa model: a pretrained RoBERTa transformer-based model fine-tuned on a mental health dataset using PEFT.

Language: Jupyter Notebook - Size: 3.67 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

QubitPi/wiktionary-data

Wiktionary data in simple parsable formats hosted on 🤗 Datasets

Language: Python - Size: 349 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

Kyrgyz-Keyboard/data

Datasets & Data Preparation

Language: Python - Size: 148 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

onesuper/HuggingFace-Datasets-Text-Quality-Analysis

Retrieves Parquet files from Hugging Face, then identifies and quantifies junk data, duplication, contamination, and biased content in datasets using pandas

Language: Python - Size: 415 KB - Last synced at: 4 months ago - Pushed at: over 2 years ago - Stars: 53 - Forks: 3

abhi9ab/DeepSeek-R1-Distill-Llama-8B-finance-v1

Fine-tuned DeepSeek 8B model for finance reasoning

Language: Jupyter Notebook - Size: 29.3 KB - Last synced at: 7 months ago - Pushed at: 9 months ago - Stars: 13 - Forks: 1

mrcabbage972/simple-toolformer

A Python implementation of Toolformer using Huggingface Transformers

Language: Python - Size: 72.3 KB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 14 - Forks: 2

e-hossam96/arabic-nano-gpt

Arabic Nano GPT Trained on Arabic Wikipedia Dataset from Wikimedia

Language: Jupyter Notebook - Size: 1.65 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 2 - Forks: 0

yliuhz/hf-download

Language: Python - Size: 119 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 1 - Forks: 0

creative-graphic-design/huggingface-datasets_PKU-PosterLayout

PKU-PosterLayout for huggingface datasets

Language: Python - Size: 300 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 1 - Forks: 0

EJL3/Machine_Translation

A Russian-to-English deep learning NLP translation model, built with research peers at Neuromatch Academy.

Language: Python - Size: 8.79 KB - Last synced at: 24 days ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

ShawonAshraf/bangla-math-chat

A math dataset for fine-tuning LLMs to chat on math problems in Bangla

Language: Python - Size: 50.8 KB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 1 - Forks: 0

shunk031/huggingface-datasets_JGLUE

JGLUE: Japanese General Language Understanding Evaluation for huggingface datasets

Language: Python - Size: 464 KB - Last synced at: 3 months ago - Pushed at: 8 months ago - Stars: 12 - Forks: 3

abhi9ab/DeepSeek-R1-Distill-Qwen-1.5B-finance-v1

Fine-tuned DeepSeek 1.5B model for finance reasoning

Language: Jupyter Notebook - Size: 33.2 KB - Last synced at: 5 months ago - Pushed at: 10 months ago - Stars: 3 - Forks: 0

TirendazAcademy/Hugging-Face-Tutorials

Getting started with Hugging Face

Language: Jupyter Notebook - Size: 12.3 MB - Last synced at: 7 months ago - Pushed at: 8 months ago - Stars: 37 - Forks: 10

AmirAli5/HuggingFace

This repository chronicles my exploration of Hugging Face, covering tasks like model training, fine-tuning, and deployment across various applications such as NLP, text summarization, text-to-image, and text-to-audio

Language: Jupyter Notebook - Size: 3.45 MB - Last synced at: 6 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

acrion/ditana-assistant

Ditana Assistant: AI-powered CLI/GUI tool for intelligent assistance, leveraging LLMs with OS interaction capabilities and context augmentation, optionally via Wolfram|Alpha

Language: Python - Size: 834 KB - Last synced at: 7 months ago - Pushed at: 8 months ago - Stars: 10 - Forks: 0

bhag41/mental_health_chatbot

Mental health chatbot using the OpenAI API, with a dataset from Hugging Face

Language: Python - Size: 1.6 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

proflead/hugging-face-tutorial

The Ultimate Hugging Face Guide: From Beginner to Pro

Size: 11.7 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

Gaurans/promptwright

Promptwright transforms natural-language user prompts into automated browser workflows using AI, while instantly generating reusable Playwright/Cypress/Selenium scripts.

Language: Python - Size: 1000 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

bot08/aiua-20k

Size: 17.6 KB - Last synced at: 6 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

akhilk2802/BigDataSystems

MLOps and data pipelines

Language: Python - Size: 10.1 MB - Last synced at: 5 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

developer0hye/hugging-face-image-ocr-dataset-upload-example

One of the Hugging Face Image Dataset Upload Guides

Language: Python - Size: 635 MB - Last synced at: about 2 months ago - Pushed at: 10 months ago - Stars: 1 - Forks: 0

adamelkholyy/whisper-yt

Toolkit for using Whisper to transcribe YouTube videos. Includes Whisper transcription of YouTube videos, conversion of YouTube video into HuggingFace dataset (using audio and subtitles) and evaluation of Whisper transcription against YouTube subtitles

Language: Python - Size: 79.1 KB - Last synced at: 7 months ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 0

Jpzinn654/qa-portuguese-v1

A 500-thousand-row split of a Portuguese dataset from Hugging Face, for training NLP models for question answering

Language: Python - Size: 4.88 KB - Last synced at: 2 months ago - Pushed at: 12 months ago - Stars: 3 - Forks: 0

npuichigo/tarzan

High-level API for tar-based dataset

Language: Python - Size: 27.3 KB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 11 - Forks: 1

dhanushpittala11/SummarizerText_Hf_End2End_1

A text summarization web application using Hugging Face models fine-tuned on a custom dataset. This project focuses on building an end-to-end pipeline for data ingestion, data transformation, model training, model evaluation, prediction, and API integration, and on hosting it on the web.

Language: Jupyter Notebook - Size: 11.2 MB - Last synced at: 8 months ago - Pushed at: 12 months ago - Stars: 1 - Forks: 0