Official code for the paper: [CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster.
A collection of our research on efficient AI, covering hardware-aware neural architecture search (NAS) and model compression.
Official repository of the paper "A Glimpse to Compress: Dynamic Visual Token Pruning for Large Vision-Language Models"
Official code for the paper: Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs.
[ICCV 2025] Official code for the paper: Beyond Text-Visual Attention: Exploiting Visual Cues for Effective Token Pruning in VLMs.
A tutorial on model quantization using TensorFlow.