Behind Maya: Building a Multilingual Vision Language Model

Nahid Alam; Karthik reddy Kanjula; Surya Guthikonda; Timothy Chung; Bala Krishna S Vegesna; Abhipsha Das; Anthony Susevski; Ryan Sze-Yin Chan; S M Iftekhar Uddin; Shayekh Bin Islam; Roshan Santhosh; Snegha A; Drishti Sharma; Chen Cecilia Liu; Isha Chaturvedi; Genta Indra Winata; Ashvanth.S; Snehanshu Mukherjee; Alham Fikri Aji

Behind Maya: Building a Multilingual Vision Language Model

Nahid Alam, Karthik reddy Kanjula, Surya Guthikonda, Timothy Chung, Bala Krishna S Vegesna, Abhipsha Das, Anthony Susevski, Ryan Sze-Yin Chan, S M Iftekhar Uddin, Shayekh Bin Islam, Roshan Santhosh, Snegha A, Drishti Sharma, Chen Cecilia Liu, Isha Chaturvedi, Genta Indra Winata, Ashvanth.S, Snehanshu Mukherjee, Alham Fikri Aji

Published: 06 May 2025, Last Modified: 29 May 2025VLMs4All 2025 PosterEveryoneRevisionsBibTeXCC BY 4.0

Keywords: Multilingual, VLM

TL;DR: Multilingual Vision-Language model for Question Answering in 10 languages

Abstract: In recent times, we have seen a rapid development of large Vision-Language Models (VLMs). They have shown impressive results on academic benchmarks, primarily in widely spoken languages but lack performance on low-resource languages and varied cultural contexts. To address these limitations, we introduce Maya, an open-source Multilingual VLM. Our contributions are: 1) a multilingual image-text pretraining dataset in eight languages, based on the LLaVA pretraining dataset; and 2) a multilingual image-text model supporting these languages, enhancing cultural and linguistic comprehension in vision-language tasks. Code available at https://github.com/nahidalam/maya.

Submission Number: 1

Loading