A friend said they expand SPLADE terms into an OpenSearch BM25 text field and it works. We tested it — 0.36 NDCG@10, worse than plain BM25. Here is why, and how to mostly fix it.
Inference-free SPLADE nearly matches full SPLADE quality at 13× lower query latency, no GPU needed at query time. Benchmarked against full SPLADE and BM25 on Qdrant and pyserini.
Vector databases compress text payloads with generic codecs like LZ4, but not with token-aware schemes. BPE tokenization + static entropy coding gives you ~3x lossless compression — 2× better than gzip — using infrastructure already in every ML stack.