Tokenization is a Compression Codec Nobody Uses That WayVector databases compress text payloads with generic codecs like LZ4, but not with token-aware schemes. BPE tokenization + entropy coding gives you 5x lossless compression using infrastructure already in every ML stack.Published onMay 12, 2026