GGUF and AWQ Quantization Scripts
- Includes pushing model files to repo
Purchase here: https://buy.stripe.com/5kA6paaO9dmbcV...
ADVANCED Fine-tuning Repository Access
1. Quantization Scripts
2. Unsupervised + Supervised Fine-tuning Notebooks
3. Q&A Dataset Preparation + Cleaning Scripts
4. Scripts to create and use Embeddings
Learn More: https://trelis.com/advanced-fine-tuni...
Resources:
- Presentation Slides: https://tinyurl.com/2s58xnam
- llama.cpp: https://github.com/ggerganov/llama.cpp
- AutoAWQ: https://github.com/casper-hansen/Auto...
- Runpod Affiliate Link (supports Trelis): https://tinyurl.com/yjxbdc9w
- AWQ paper: https://arxiv.org/pdf/2306.00978.pdf
- GPTQ paper: https://arxiv.org/pdf/2210.17323.pdf
- Ready-Quantized Models: https://huggingface.co/TheBloke/
Referenced Videos:
- Supervised Fine-tuning (with bitsandbytes): Embeddings vs Fine Tuning - Part 2, ...
- Tiny Llama (run a GGUF model on your laptop): Tiny Llama
- AWQ API setup and explanation: Double Inference Speed with AWQ Quant...
0:00 How to quantize a large language model
0:38 Why quantize a language model
1:30 What is quantization
2:23 Which quantization to use?
3:29 GGUF vs BNB vs AWQ vs GPTQ
10:01 How to quantize with AWQ
18:48 How to quantize with GGUF (GGML)
25:29 Recap