Official code for the paper: [CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster.
A collection of our research on efficient AI, covering hardware-aware neural architecture search (NAS) and model compression.
Official repository of the paper "A Glimpse to Compress: Dynamic Visual Token Pruning for Large Vision-Language Models"
Official code for the paper: Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs.
[ICCV 2025] Official code for the paper: Beyond Text-Visual Attention: Exploiting Visual Cues for Effective Token Pruning in VLMs.
A tutorial on model quantization using TensorFlow.