An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: hf-datasets

sayakpaul/count-tokens-hf-datasets

This project shows how to compute the total number of training tokens in a large text dataset from 🤗 datasets using Apache Beam and Dataflow.

Language: Python - Size: 19.5 KB - Last synced at: 12 days ago - Pushed at: over 2 years ago - Stars: 26 - Forks: 1

mariosasko/datasets_sql

Execute arbitrary SQL queries on 🤗 Datasets

Language: Python - Size: 37.1 KB - Last synced at: 18 days ago - Pushed at: over 1 year ago - Stars: 32 - Forks: 2