Uniform 4-bit quantization costs an embedding model real retrieval quality. Quantizing only the layers that are not doing real work costs nothing — and a layer-by-layer look at the model tells you which layers those are. Validated on mxbai-embed-large + BEIR scifact.
- Published on