Quantifying Interpretability in CLIP Models with Concept Consistency

Published: 07 May 2025 · Last Modified: 29 May 2025 · VisCon 2025 Poster · CC BY 4.0
Keywords: interpretability, visual concepts, CLIP models
Abstract: CLIP is a widely used foundational model for vision-language tasks, yet its internal mechanisms remain poorly understood. To address this, we introduce the Concept Consistency Score (CCS), a new interpretability metric that quantifies how strongly individual attention heads align with coherent visual concepts. Using in-context learning with ChatGPT and an LLM-as-a-judge framework, we assign and validate concept labels across six CLIP models of varying model sizes, training data, and patch sizes. Our experiments show that high-CCS heads are crucial for maintaining model performance, especially in out-of-domain detection, concept reasoning, and video-language tasks. These findings highlight CCS as an effective tool for interpreting and analyzing CLIP-like models.
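The abstract describes CCS as a per-head measure of alignment with a concept, validated by an LLM judge. The paper's exact formula is not given here; the sketch below illustrates one plausible reading, in which CCS is the fraction of a head's top-activating items that an LLM judge deems consistent with the head's assigned concept label. The function name and the boolean-judgment representation are assumptions for illustration only.

```python
def concept_consistency_score(judgments):
    """Hypothetical sketch of a per-head Concept Consistency Score.

    judgments: list of booleans, one per top-activating item for an
    attention head, indicating whether an LLM judge agreed the item
    matches the head's assigned concept label. Returns the fraction
    of consistent judgments. The paper's actual definition may differ.
    """
    if not judgments:
        return 0.0
    return sum(judgments) / len(judgments)

# Example: 4 of 5 top-activating images judged concept-consistent.
score = concept_consistency_score([True, True, False, True, True])
```

Under this reading, a high-CCS head is one whose strongest activations are reliably explained by a single coherent visual concept.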
Submission Number: 23