An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: learned-tokenization

lucidrains/MEGABYTE-pytorch

Implementation of MEGABYTE, Predicting Million-byte Sequences with Multiscale Transformers, in PyTorch

Language: Python - Size: 34.5 MB - Last synced at: 5 days ago - Pushed at: 5 months ago - Stars: 643 - Forks: 55

lucidrains/rvq-vae-gpt

My attempts at applying the SoundStream design to learned tokenization of text, and then applying hierarchical attention to text generation

Language: Python - Size: 34.1 MB - Last synced at: about 17 hours ago - Pushed at: 7 months ago - Stars: 87 - Forks: 1