A friend said they expand SPLADE terms into an OpenSearch BM25 text field and it works. We tested it — 0.36 NDCG@10, worse than plain BM25. Here is why, and how to mostly fix it.
Inference-free SPLADE nearly matches full SPLADE quality at 13× lower query latency, no GPU needed at query time. Benchmarked against full SPLADE and BM25 on Qdrant and pyserini.
Vector databases compress text payloads with LZ4 by default. Tokenize the text first and entropy-code the token IDs, and you get ~3x lossless compression, 2.6x what LZ4 manages, using pieces every ML stack already ships.