GitHub topics: awq
intel/neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
Language: Python - Size: 468 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 2,487 - Forks: 281
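A minimal sketch of post-training INT8 quantization with Neural Compressor's 2.x Python API; the tiny stand-in model, random calibration data, and output path are assumptions, not code from the repo.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from neural_compressor import PostTrainingQuantConfig, quantization

# Stand-in model and calibration data; swap in your real model and dataloader.
model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU(),
                            torch.nn.Linear(64, 10)).eval()
calib = DataLoader(TensorDataset(torch.randn(128, 64),
                                 torch.zeros(128, dtype=torch.long)),
                   batch_size=16)

# Post-training static INT8 quantization driven by the calibration loader.
conf = PostTrainingQuantConfig(approach="static")
q_model = quantization.fit(model=model, conf=conf, calib_dataloader=calib)
q_model.save("./int8_model")
```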

intel/auto-round
Advanced Quantization Algorithm for LLMs and VLMs, with support for CPU, Intel GPU, CUDA and HPU. Seamlessly integrated with Torchao, Transformers, and vLLM. Export your models effortlessly to autogptq, autoawq, gguf and autoround formats with high accuracy even at extremely low bit precision.
Language: Python - Size: 12.1 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 611 - Forks: 52
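A rough sketch of 4-bit weight quantization with auto-round, assuming the AutoRound class documented in the project's README; the model name, bit settings, and export format here are illustrative and may differ across versions.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "facebook/opt-125m"          # small model for illustration
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# 4-bit symmetric quantization with a group size of 128 (assumed defaults).
autoround = AutoRound(model, tokenizer, bits=4, group_size=128, sym=True)
autoround.quantize()
autoround.save_quantized("./opt-125m-autoround", format="auto_round")
```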

harleyszhang/harleyszhang.github.io Fork of tw93/tw93.github.io
🧗♂️ harleyszhang's personal blog
Language: HTML - Size: 482 MB - Last synced at: 5 days ago - Pushed at: 6 days ago - Stars: 5 - Forks: 0

ModelTC/LightCompress
A powerful toolkit for compressing large models including LLM, VLM, and video generation models.
Language: Python - Size: 30.5 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 547 - Forks: 61

hcd233/Aris-AI-Model-Server
An OpenAI-compatible API that integrates LLM, Embedding, and Reranker.
Language: Python - Size: 1.11 MB - Last synced at: 16 days ago - Pushed at: 17 days ago - Stars: 16 - Forks: 1
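Because the server exposes an OpenAI-compatible API, it can be called with the official openai client by overriding the base URL; the port, API key, and model names below are placeholders, not values defined by this repo.

```python
from openai import OpenAI

# Point the standard client at the locally hosted, OpenAI-compatible server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="sk-local")

# Chat completion against the served LLM.
chat = client.chat.completions.create(
    model="my-local-model",
    messages=[{"role": "user", "content": "Hello"}],
)
print(chat.choices[0].message.content)

# Embeddings through the same compatible endpoint.
emb = client.embeddings.create(model="my-embedding-model", input=["Hello"])
print(len(emb.data[0].embedding))
```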

lpalbou/model-quantizer
Effortlessly quantize, benchmark, and publish Hugging Face models with cross-platform support for CPU/GPU. Reduce model size by 75% while maintaining performance.
Language: Python - Size: 165 KB - Last synced at: 8 days ago - Pushed at: 6 months ago - Stars: 2 - Forks: 0
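The "75%" figure is roughly what you get from storing 16-bit weights in 4-bit form. As an illustration of that idea only (not this tool's own API), here is a sketch using Hugging Face Transformers with bitsandbytes 4-bit loading; the model name is an arbitrary example.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Load weights in 4-bit NF4, computing in fp16.
bnb = BitsAndBytesConfig(load_in_4bit=True,
                         bnb_4bit_quant_type="nf4",
                         bnb_4bit_compute_dtype=torch.float16)

model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b",
                                             quantization_config=bnb,
                                             device_map="auto")
print(model.get_memory_footprint())  # roughly a quarter of the fp16 footprint
```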

RajVenkat20/LLM-Optimizations-QLoRA-AWQ
This project applies QLoRA and AWQ quantization techniques to the Flan-T5 LLM.
Language: Python - Size: 488 KB - Last synced at: 4 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0
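A sketch of the QLoRA half of such a pipeline: load Flan-T5 in 4-bit and attach LoRA adapters with peft. The target module names and hyperparameters are illustrative assumptions, not the project's exact configuration.

```python
import torch
from transformers import AutoModelForSeq2SeqLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 base model (the "Q" in QLoRA).
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
                         bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base",
                                              quantization_config=bnb,
                                              device_map="auto")
model = prepare_model_for_kbit_training(model)

# LoRA adapters on the T5 attention query/value projections.
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["q", "v"],
                  task_type="SEQ_2_SEQ_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()
```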

FireStrike1010/artificial_personality
Artificial Personality is a text2text AI chatbot that can use character cards.
Language: Python - Size: 72.3 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

GURPREETKAURJETHRA/Quantize-LLM-using-AWQ
Quantize LLM using AWQ
Language: Jupyter Notebook - Size: 49.8 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 2
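A minimal AWQ quantization sketch using the AutoAWQ library, which is the usual tooling for this workflow; the model name, output path, and quantization settings below are assumptions rather than the notebook's exact code.

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "mistralai/Mistral-7B-Instruct-v0.1"
quant_config = {"zero_point": True, "q_group_size": 128,
                "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Run AWQ calibration and write out the 4-bit checkpoint.
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized("mistral-7b-awq")
tokenizer.save_pretrained("mistral-7b-awq")
```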

vpgits/sdgp-ml
This repository contains notebooks and resources related to the Software Development Group Project (SDGP) machine learning component. Specifically, it includes two notebooks used for creating a dataset and fine-tuning a Mistral-7B-v0.1-Instruct model.
Language: Jupyter Notebook - Size: 384 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 1

glurp/rfilter
A programmable filter, like POSIX awk, with Ruby syntax and embeddable functions.
Language: Ruby - Size: 41 KB - Last synced at: 5 months ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0
