Developing a Multi-Agent Framework for Multimodal Multi-Task Learning

Email: ccds@iub.edu.bd
Phone: +88 01885 570 597

This project focuses on enhancing the capabilities of large multimodal models. Multimodal learning is an area of machine learning in which models are designed to process and correlate information from several input modalities, such as text, images, and audio. We are developing a multi-agent framework in which each agent specializes in a specific modality and task. The agents work in tandem: the framework dynamically brings in the agents specialized for the tasks at hand, enabling the system to handle multiple tasks simultaneously. By integrating these multi-agent ideas into large multimodal models, the project aims to improve multi-task learning performance and generalization to new tasks.
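As a rough illustration of the idea of selecting specialized agents per modality and task, the following minimal sketch uses a simple lookup registry. It is not the project's implementation: the names `Agent`, `AgentRegistry`, `register`, and `dispatch`, and the toy handlers, are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical key: each agent is identified by (modality, task).
AgentKey = Tuple[str, str]


@dataclass
class Agent:
    """A stand-in for a model specialized in one modality and one task."""
    modality: str
    task: str
    handler: Callable[[object], str]

    def run(self, payload: object) -> str:
        return self.handler(payload)


@dataclass
class AgentRegistry:
    """Holds the available agents and picks one per incoming request."""
    agents: Dict[AgentKey, Agent] = field(default_factory=dict)

    def register(self, agent: Agent) -> None:
        self.agents[(agent.modality, agent.task)] = agent

    def dispatch(self, requests: List[Tuple[str, str, object]]) -> List[str]:
        # Select the matching agent for each request at call time.
        results = []
        for modality, task, payload in requests:
            agent = self.agents.get((modality, task))
            if agent is None:
                results.append(f"no agent for {modality}/{task}")
            else:
                results.append(agent.run(payload))
        return results


# Toy handlers standing in for real modality-specific models (assumptions).
registry = AgentRegistry()
registry.register(
    Agent("text", "summarize", lambda p: f"summary of {len(str(p).split())} words")
)
registry.register(Agent("image", "caption", lambda p: f"caption for image {p}"))

outputs = registry.dispatch([
    ("text", "summarize", "agents work in tandem"),
    ("image", "caption", "cat.png"),
    ("audio", "transcribe", "clip.wav"),  # no agent registered for this pair
])
print(outputs)
```

In this sketch, a request for which no agent is registered yields a placeholder message rather than an error. A real framework would presumably replace the dictionary lookup with a learned or model-driven selection step.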

Related publications:

Large Multimodal Agents: A Survey
Xie, J., Chen, Z., Zhang, R., Wan, X., & Li, G. (2024). Large Multimodal Agents: A Survey. arXiv:2402.15116. https://doi.org/10.48550/arXiv.2402.15116
AgentLite: A Lightweight Library for Building and Advancing Task-Oriented LLM Agent System
Liu, Z., Yao, W., Zhang, J., Yang, L., Liu, Z., Tan, J., Choubey, P. K., Lan, T., Wu, J., Wang, H., Heinecke, S., Xiong, C., & Savarese, S. (2024). AgentLite: A Lightweight Library for Building and Advancing Task-Oriented LLM Agent System. arXiv:2402.15538. https://doi.org/10.48550/arXiv.2402.15538
MuLan: Multimodal-LLM Agent for Progressive Multi-Object Diffusion
Li, S., Wang, R., Hsieh, C.-J., Cheng, M., & Zhou, T. (2024). MuLan: Multimodal-LLM Agent for Progressive Multi-Object Diffusion. arXiv:2402.12741. https://doi.org/10.48550/arXiv.2402.12741
