GitHub topics: hf-datasets
sayakpaul/count-tokens-hf-datasets
This project shows how to compute the total number of training tokens in a large text dataset from 🤗 Datasets using Apache Beam and Dataflow.
Language: Python - Size: 19.5 KB - Last synced at: 12 days ago - Pushed at: over 2 years ago - Stars: 26 - Forks: 1

mariosasko/datasets_sql
Execute arbitrary SQL queries on 🤗 Datasets
Language: Python - Size: 37.1 KB - Last synced at: 18 days ago - Pushed at: over 1 year ago - Stars: 32 - Forks: 2
