Topic: "smooth-quantization"
aahouzi/llama2-chatbot-cpu
A LLaMA2-7B chatbot with conversational memory that runs on CPU, optimized using smooth quantization (SmoothQuant), 4-bit quantization, or Intel® Extension for PyTorch with bfloat16.
Language: Python - Size: 30.3 MB - Last synced at: 9 days ago - Pushed at: about 1 year ago - Stars: 13 - Forks: 0
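The repository's description lists smooth quantization (SmoothQuant) among its CPU optimizations. The core idea of SmoothQuant is to migrate activation outliers into the weights via a per-input-channel scale, so activations become easier to quantize while the layer output stays unchanged. Below is a minimal, self-contained PyTorch sketch of that scaling transform, not the repository's actual code; the helper name `smooth_scales`, the toy shapes, and the calibration statistic are illustrative assumptions.

```python
import torch

def smooth_scales(act_max_abs, weight, alpha=0.5, eps=1e-8):
    """Per-input-channel SmoothQuant scales (illustrative helper).

    act_max_abs : [in_features] calibration max |activation| per channel
    weight      : [out_features, in_features] linear weight (HF layout)
    alpha       : migration strength; 0.5 is the commonly used default
    """
    w_max_abs = weight.abs().amax(dim=0)  # per input channel
    return ((act_max_abs.clamp(min=eps) ** alpha)
            / (w_max_abs.clamp(min=eps) ** (1 - alpha))).clamp(min=eps)

# Toy example: channel 0 carries activation outliers.
torch.manual_seed(0)
x = torch.randn(4, 8) * torch.tensor([10.0, 1, 1, 1, 1, 1, 1, 1])
linear = torch.nn.Linear(8, 16, bias=False)

s = smooth_scales(x.abs().amax(dim=0), linear.weight, alpha=0.5)

x_smoothed = x / s              # activations flattened, easier to quantize
w_smoothed = linear.weight * s  # scale folded into weights per input channel

# The smoothed pair is numerically equivalent to the original layer.
y_ref = x @ linear.weight.t()
y_smooth = x_smoothed @ w_smoothed.t()
print(torch.allclose(y_ref, y_smooth, atol=1e-5))  # True
```

In practice the scales would be computed from calibration data and folded into the model once before post-training quantization; the repository presumably relies on Intel's tooling (Intel® Extension for PyTorch / Intel® Neural Compressor) rather than a hand-rolled transform like this one.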
