All Posts

  • A friend said they expand SPLADE terms into an OpenSearch BM25 text field and it works. We tested it — 0.36 NDCG@10, worse than plain BM25. Here is why, and how to mostly fix it.
    Published on
  • Vector databases compress text payloads with generic codecs like LZ4, but not with token-aware schemes. BPE tokenization + static entropy coding gives you ~3x lossless compression — 2× better than gzip — using infrastructure already in every ML stack.
    Published on
  • Search is everywhere and is one of the hardest problems in CS. Here's why I'm obsessed with it.
    Published on
  • Reflections on a year of growth, travel, and pursuing mastery
    Published on