GitHub topics: learned-tokenization
lucidrains/MEGABYTE-pytorch
Implementation of MEGABYTE, Predicting Million-byte Sequences with Multiscale Transformers, in PyTorch
Language: Python - Size: 34.5 MB - Last synced at: 5 days ago - Pushed at: 5 months ago - Stars: 643 - Forks: 55

lucidrains/rvq-vae-gpt
My attempts at applying the SoundStream design to learned tokenization of text, and then applying hierarchical attention to text generation
Language: Python - Size: 34.1 MB - Last synced at: about 17 hours ago - Pushed at: 7 months ago - Stars: 87 - Forks: 1
