GitHub topics: hf-datasets
sayakpaul/count-tokens-hf-datasets
This project shows how to compute the total number of training tokens in a large text dataset from 🤗 Datasets using Apache Beam and Dataflow.
Language: Python - Size: 19.5 KB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 26 - Forks: 1

mariosasko/datasets_sql
Execute arbitrary SQL queries on 🤗 Datasets
Language: Python - Size: 37.1 KB - Last synced at: 9 days ago - Pushed at: over 1 year ago - Stars: 32 - Forks: 2
