An open API service providing repository metadata for many open source software ecosystems.

GitHub / IST-DASLab / marlin

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.

JSON API: http://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IST-DASLab%2Fmarlin
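The JSON API URL above percent-encodes the slash in the repository's `owner/name` as `%2F`. A minimal sketch of building that URL and fetching it with the Python standard library follows. The helper names `repository_url` and `fetch_repository` are hypothetical. The sketch also assumes the endpoint returns a single JSON object, and it makes no assumption about which fields that object contains.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Root taken from the JSON API URL listed above.
API_ROOT = "http://repos.ecosyste.ms/api/v1"


def repository_url(host: str, full_name: str) -> str:
    """Build the repository endpoint URL (hypothetical helper).

    safe="" makes quote() encode "/" as "%2F", so "owner/name"
    stays a single path segment.
    """
    return (
        f"{API_ROOT}/hosts/{quote(host, safe='')}"
        f"/repositories/{quote(full_name, safe='')}"
    )


def fetch_repository(host: str, full_name: str, timeout: float = 10.0) -> dict:
    """Fetch and decode the repository record (hypothetical helper).

    Assumes the response body is a JSON object; field names are not assumed.
    """
    with urlopen(repository_url(host, full_name), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Prints the URL only; call fetch_repository(...) to perform the request.
    print(repository_url("GitHub", "IST-DASLab/marlin"))
```

Calling `fetch_repository("GitHub", "IST-DASLab/marlin")` performs the network request. Inspect the returned dictionary's keys before relying on any particular field.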

Stars: 661
Forks: 52
Open issues: 29

License: apache-2.0
Language: Python
Size: 708 KB
Dependencies parsed at: Pending

Created at: over 1 year ago
Updated at: 4 months ago
Pushed at: 8 months ago
Last synced at: 4 months ago

Topics: 4bit, kernel, llm, quantization
