Decoding how to select a language model for local setup
Let's start with Q4_K_M, a quantization label you will see on local builds of models such as mistralai/mistral-7b-instruct-v0.3. Let's go through each part of the name. Q4 means 4-bit quantization. Original models usually store their weights in 16-bit or 32-bit precision; 4-bit compresses them heavily, which gives much smaller memory usage. K means it uses K-quantization (grouped quantization), an improved quantization method. M means the medium variant of […]
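To make the memory savings concrete, here is a rough back-of-the-envelope sketch of how much space the weights alone would take when going from 32-bit or 16-bit down to 4-bit. The helper name weight_memory_gib and the round figure of 7 billion parameters are assumptions for illustration, not exact numbers for any specific model file. Real Q4_K_M files also store per-group scale data, so they come out somewhat larger than this nominal 4-bit estimate.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the weights alone, in GiB.

    Ignores per-group scales, activations, and the KV cache, so treat
    the result as a lower bound rather than a real file size.
    """
    bytes_total = num_params * bits_per_weight / 8
    return bytes_total / 1024**3


# Assumed round parameter count for a "7B" model (illustrative only).
PARAMS_7B = 7e9

for bits in (32, 16, 4):
    print(f"{bits:>2}-bit: ~{weight_memory_gib(PARAMS_7B, bits):.1f} GiB")
```

Under these assumptions, going from 16-bit to 4-bit cuts the weight storage to a quarter, which is often the difference between a model fitting on a consumer GPU or laptop and not fitting at all.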