2025 05 28

[Preprint] We introduce HoliTom, a training-free holistic token merging method for fast video LLMs. It accelerates video LLM inference while retaining 99.1% of the original performance and reducing FLOPs to just 7%. The code is open-source and ready for use.