GitHub topics: awq
intel/neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
Language: Python - Size: 468 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 2,487 - Forks: 281
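A minimal sketch of post-training INT8 quantization with Neural Compressor's 2.x Python API; the tiny stand-in model, random calibration data, and output path are assumptions, not code from the repo.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from neural_compressor import PostTrainingQuantConfig, quantization

# Stand-in model and calibration data; swap in your real model and dataloader.
model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU(),
                            torch.nn.Linear(64, 10)).eval()
calib = DataLoader(TensorDataset(torch.randn(128, 64),
                                 torch.zeros(128, dtype=torch.long)),
                   batch_size=16)

# Post-training static INT8 quantization driven by the calibration loader.
conf = PostTrainingQuantConfig(approach="static")
q_model = quantization.fit(model=model, conf=conf, calib_dataloader=calib)
q_model.save("./int8_model")
```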

intel/auto-round
Advanced Quantization Algorithm for LLMs and VLMs, with support for CPU, Intel GPU, CUDA and HPU. Seamlessly integrated with Torchao, Transformers, and vLLM. Export your models effortlessly to autogptq, autoawq, gguf and autoround formats with high accuracy even at extremely low bit precision.
Language: Python - Size: 12.1 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 611 - Forks: 52
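A rough sketch of 4-bit weight quantization with auto-round, assuming the AutoRound class documented in the project's README; the model name, bit settings, and export format here are illustrative and may differ across versions.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "facebook/opt-125m"          # small model for illustration
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# 4-bit symmetric quantization with a group size of 128 (assumed defaults).
autoround = AutoRound(model, tokenizer, bits=4, group_size=128, sym=True)
autoround.quantize()
autoround.save_quantized("./opt-125m-autoround", format="auto_round")
```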

harleyszhang/harleyszhang.github.io Fork of tw93/tw93.github.io
🧗♂️ harleyszhang's personal blog
Language: HTML - Size: 482 MB - Last synced at: 5 days ago - Pushed at: 6 days ago - Stars: 5 - Forks: 0

ModelTC/LightCompress
A powerful toolkit for compressing large models including LLM, VLM, and video generation models.
Language: Python - Size: 30.5 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 547 - Forks: 61

hcd233/Aris-AI-Model-Server
An OpenAI-compatible API that integrates LLM, Embedding, and Reranker.
Language: Python - Size: 1.11 MB - Last synced at: 16 days ago - Pushed at: 17 days ago - Stars: 16 - Forks: 1
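Because the server exposes an OpenAI-compatible API, it can be called with the official openai client by overriding the base URL; the port, API key, and model names below are placeholders, not values defined by this repo.

```python
from openai import OpenAI

# Point the standard client at the locally hosted, OpenAI-compatible server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="sk-local")

# Chat completion against the served LLM.
chat = client.chat.completions.create(
    model="my-local-model",
    messages=[{"role": "user", "content": "Hello"}],
)
print(chat.choices[0].message.content)

# Embeddings through the same compatible endpoint.
emb = client.embeddings.create(model="my-embedding-model", input=["Hello"])
print(len(emb.data[0].embedding))
```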

lpalbou/model-quantizer
Effortlessly quantize, benchmark, and publish Hugging Face models with cross-platform support for CPU/GPU. Reduce model size by 75% while maintaining performance.
Language: Python - Size: 165 KB - Last synced at: 8 days ago - Pushed at: 6 months ago - Stars: 2 - Forks: 0
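The "75%" figure is roughly what you get from storing 16-bit weights in 4-bit form. As an illustration of that idea only (not this tool's own API), here is a sketch using Hugging Face Transformers with bitsandbytes 4-bit loading; the model name is an arbitrary example.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Load weights in 4-bit NF4, computing in fp16.
bnb = BitsAndBytesConfig(load_in_4bit=True,
                         bnb_4bit_quant_type="nf4",
                         bnb_4bit_compute_dtype=torch.float16)

model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b",
                                             quantization_config=bnb,
                                             device_map="auto")
print(model.get_memory_footprint())  # roughly a quarter of the fp16 footprint
```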

RajVenkat20/LLM-Optimizations-QLoRA-AWQ
This project applies QLoRA and AWQ quantization techniques to the Flan-T5 LLM.
Language: Python - Size: 488 KB - Last synced at: 4 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0
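A sketch of the QLoRA half of such a pipeline: load Flan-T5 in 4-bit and attach LoRA adapters with peft. The target module names and hyperparameters are illustrative assumptions, not the project's exact configuration.

```python
import torch
from transformers import AutoModelForSeq2SeqLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 base model (the "Q" in QLoRA).
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
                         bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base",
                                              quantization_config=bnb,
                                              device_map="auto")
model = prepare_model_for_kbit_training(model)

# LoRA adapters on the T5 attention query/value projections.
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["q", "v"],
                  task_type="SEQ_2_SEQ_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()
```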

FireStrike1010/artificial_personality
Artificial Personality is a text2text AI chatbot that can use character cards.
Language: Python - Size: 72.3 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

GURPREETKAURJETHRA/Quantize-LLM-using-AWQ
Quantize LLM using AWQ
Language: Jupyter Notebook - Size: 49.8 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 2
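A minimal AWQ quantization sketch using the AutoAWQ library, which is the usual tooling for this workflow; the model name, output path, and quantization settings below are assumptions rather than the notebook's exact code.

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "mistralai/Mistral-7B-Instruct-v0.1"
quant_config = {"zero_point": True, "q_group_size": 128,
                "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Run AWQ calibration and write out the 4-bit checkpoint.
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized("mistral-7b-awq")
tokenizer.save_pretrained("mistral-7b-awq")
```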

vpgits/sdgp-ml
This repository contains notebooks and resources related to the Software Development Group Project (SDGP) machine learning component. Specifically, it includes two notebooks used for creating a dataset and fine-tuning a Mistral-7B-v0.1-Instruct model.
Language: Jupyter Notebook - Size: 384 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 1

glurp/rfilter
A programmable filter, like POSIX awk, with Ruby syntax and embeddable functions.
Language: Ruby - Size: 41 KB - Last synced at: 5 months ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0
