Let's start with Q4_K_M. An example is a quantized build of mistralai/mistral-7b-instruct-v0.3.
Let's go through each part of the name.
Q4
Means 4-bit quantization.
Original models are usually 16-bit or 32-bit precision.
4-bit compresses them heavily → much smaller memory usage.
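To see why this matters, here is a rough back-of-the-envelope estimate of model size at different precisions. The bits-per-parameter figures for the quantized formats are approximations (quantized formats also store per-group scales, so real GGUF files are slightly larger than the raw bit count suggests):

```python
# Rough memory estimate for a ~7B-parameter model at different precisions.
# Bits-per-parameter values for quantized formats are approximate.
def model_size_gb(n_params: float, bits_per_param: float) -> float:
    return n_params * bits_per_param / 8 / 1e9

n = 7e9  # ~7B parameters
print(f"FP16:   {model_size_gb(n, 16):.1f} GB")    # ~14 GB
print(f"Q8_0:   {model_size_gb(n, 8.5):.1f} GB")   # ~7.4 GB
print(f"Q4_K_M: {model_size_gb(n, 4.85):.1f} GB")  # ~4.2 GB
```

So dropping from 16-bit to ~4.85 bits per weight cuts the file (and the RAM needed to hold it) to roughly a third.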
K
Means it uses K-quantization (grouped quantization).
This is an improved scheme that:
- preserves accuracy better
- keeps performance stable
- uses memory more efficiently
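The core idea of grouped quantization can be sketched in a few lines: each small block of weights gets its own scale, so an outlier in one block doesn't wreck the precision of every other block. This is a toy illustration only; real K-quants add extra tricks (sub-blocks, quantized scales) not shown here:

```python
import numpy as np

# Toy grouped (block-wise) 4-bit quantization: one scale per block of 32
# weights. Signed 4-bit range used here is [-7, 7].
def quantize_grouped(w: np.ndarray, group_size: int = 32):
    blocks = w.reshape(-1, group_size)
    scale = np.abs(blocks).max(axis=1, keepdims=True) / 7
    q = np.clip(np.round(blocks / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return (q * scale).ravel()

rng = np.random.default_rng(0)
w = rng.standard_normal(256).astype(np.float32)
q, s = quantize_grouped(w)
err = np.abs(dequantize(q, s) - w).max()
print(f"max abs error: {err:.4f}")
```

Because the scale is chosen per block rather than per tensor, the rounding error stays proportional to each block's own largest weight.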
M
Means Medium variant of K-quantization.
There are usually:
- Q4_K_S → Small (slightly smaller, slightly lower quality)
- Q4_K_M → Medium (better balance)
- Q4_K_L → Large (better quality, more RAM)
How do we pick a quant for a local run?
| Quant | RAM Usage | Quality | Recommended? |
|---|---|---|---|
| Q2 | Very Low | Low | ❌ No |
| Q4_K_S | Low | Good | OK |
| Q4_K_M | Moderate | Very Good | ✅ YES |
| Q8_0 | High | Excellent | Only if 32GB+ RAM |
If you have:
- 16GB RAM → use Q4_K_M
- 32GB RAM → Q4_K_M or Q8_0
- Less than 16GB → try Q4_K_S
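The guidelines above can be encoded as a tiny helper. This is a hypothetical function; the thresholds are the rough rules of thumb from this post, not hard limits:

```python
# Hypothetical helper encoding the RAM-based recommendations above.
def recommend_quant(ram_gb: float) -> str:
    if ram_gb >= 32:
        return "Q4_K_M or Q8_0"
    if ram_gb >= 16:
        return "Q4_K_M"
    return "Q4_K_S"

print(recommend_quant(16))  # Q4_K_M
print(recommend_quant(8))   # Q4_K_S
```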
Now let's try loading the model in LM Studio on a local laptop.