{"id":661,"date":"2025-07-18T19:05:30","date_gmt":"2025-07-18T19:05:30","guid":{"rendered":"https:\/\/ccds.ai\/?p=661"},"modified":"2025-08-10T18:15:07","modified_gmt":"2025-08-10T18:15:07","slug":"developing-a-multi-agent-framework-for-multimodal-multi-task-learning","status":"publish","type":"post","link":"https:\/\/ccds.ai\/?p=661","title":{"rendered":"Developing a Multi-Agent Framework for Multimodal Multi-Task Learning"},"content":{"rendered":"<div id='av_section_1'  class='avia-section av-av_section-4626b8e4cec458b6915ec5d17cf7764f main_color avia-section-default avia-no-border-styling  avia-builder-el-0  avia-builder-el-no-sibling  avia-bg-style-scroll container_wrap fullsize'  ><div class='container av-section-cont-open' ><main  role=\"main\" itemprop=\"mainContentOfPage\"  class='template-page content  av-content-full alpha units'><div class='post-entry post-entry-type-page post-entry-661'><div class='entry-content-wrapper clearfix'>\n\n<style type=\"text\/css\" data-created_by=\"avia_inline_auto\" id=\"style-css-av-md96vkwv-05a3561dce2541a6e1cb90b50a795e8f\">\n.avia-image-container.av-md96vkwv-05a3561dce2541a6e1cb90b50a795e8f img.avia_image{\nbox-shadow:none;\n}\n.avia-image-container.av-md96vkwv-05a3561dce2541a6e1cb90b50a795e8f .av-image-caption-overlay-center{\ncolor:#ffffff;\n}\n<\/style>\n<div  class='avia-image-container av-md96vkwv-05a3561dce2541a6e1cb90b50a795e8f av-styling- avia-align-center  avia-builder-el-1  el_before_av_textblock  avia-builder-el-first '   itemprop=\"image\" itemscope=\"itemscope\" itemtype=\"https:\/\/schema.org\/ImageObject\" ><div class=\"avia-image-container-inner\"><div class=\"avia-image-overlay-wrap\"><img fetchpriority=\"high\" decoding=\"async\" class='wp-image-662 avia-img-lazy-loading-not-662 avia_image ' src=\"https:\/\/ccds.ai\/wp-content\/uploads\/2025\/07\/unnamed-1.png\" alt='' title='unnamed-1'  height=\"373\" width=\"512\"  itemprop=\"thumbnailUrl\" 
srcset=\"https:\/\/ccds.ai\/wp-content\/uploads\/2025\/07\/unnamed-1.png 512w, https:\/\/ccds.ai\/wp-content\/uploads\/2025\/07\/unnamed-1-300x219.png 300w\" sizes=\"(max-width: 512px) 100vw, 512px\" \/><\/div><\/div><\/div>\n<section  class='av_textblock_section av-md96uhjz-b49da96b80b931f7e40b72e761bb3d35'  itemscope=\"itemscope\" itemtype=\"https:\/\/schema.org\/BlogPosting\" itemprop=\"blogPost\" ><div class='avia_textblock'  itemprop=\"text\" ><p>This project focuses on enhancing the capabilities of large multimodal models. Multimodal learning is an area of machine learning in which models process and correlate information from multiple input modalities, such as text, images, and audio. In this project, we are developing a multi-agent framework in which each agent specializes in understanding a specific modality and task. These agents work in tandem: the framework dynamically incorporates the agents specialized for the tasks at hand, enabling the system to handle multiple tasks simultaneously. By integrating these multi-agent ideas into large multimodal models, our project aims to significantly improve performance in multi-task learning and generalization to new tasks.<\/p>\n<p><b>Related publications:<\/b><\/p>\n<ol>\n<li><a href=\"https:\/\/web.archive.org\/web\/20250125021224\/https:\/\/arxiv.org\/abs\/2402.15116\" target=\"_blank\" rel=\"noopener\"><b>Large Multimodal Agents: A Survey<\/b><\/a><i><br \/>\n<\/i><i>Xie, J., Chen, Z., Zhang, R., Wan, X., &amp; Li, G. (2024). Large Multimodal Agents: A Survey. 
arXiv:2402.15116.\u00a0<\/i><a href=\"https:\/\/web.archive.org\/web\/20250125021224\/https:\/\/doi.org\/10.48550\/arXiv.2402.15116\" target=\"_blank\" rel=\"noopener\"><i>https:\/\/doi.org\/10.48550\/arXiv.2402.15116<\/i><\/a><b>\u00a0<\/b><\/li>\n<li><a href=\"https:\/\/web.archive.org\/web\/20250125021224\/https:\/\/arxiv.org\/pdf\/2402.15538\" target=\"_blank\" rel=\"noopener\"><b>AgentLite: A Lightweight Library for Building and Advancing Task-Oriented LLM Agent System<\/b><b><br \/>\n<\/b><\/a><i>Liu, Z., Yao, W., Zhang, J., Yang, L., Liu, Z., Tan, J., Choubey, P. K., Lan, T., Wu, J., Wang, H., Heinecke, S., Xiong, C., &amp; Savarese, S. (2024). AgentLite: A Lightweight Library for Building and Advancing Task-Oriented LLM Agent System. arXiv:2402.15538. https:\/\/doi.org\/10.48550\/arXiv.2402.15538<\/i><\/li>\n<li><a href=\"https:\/\/web.archive.org\/web\/20250125021224\/https:\/\/arxiv.org\/pdf\/2402.12741\" target=\"_blank\" rel=\"noopener\"><b>MuLan: Multimodal-LLM Agent for Progressive Multi-Object Diffusion<\/b><b><br \/>\n<\/b><\/a><i>Li, S., Wang, R., Hsieh, C.-J., Cheng, M., &amp; Zhou, T. (2024). MuLan: Multimodal-LLM Agent for Progressive Multi-Object Diffusion. arXiv:2402.12741. 
https:\/\/doi.org\/10.48550\/arXiv.2402.12741<\/i><\/li>\n<\/ol>\n<\/div><\/section>\n\n","protected":false},"excerpt":{"rendered":"","protected":false},"author":2,"featured_media":662,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[88],"tags":[],"class_list":["post-661","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai_ml_projects"],"acf":[],"jetpack_featured_media_url":"https:\/\/ccds.ai\/wp-content\/uploads\/2025\/07\/unnamed-1.png","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/ccds.ai\/index.php?rest_route=\/wp\/v2\/posts\/661","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ccds.ai\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ccds.ai\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ccds.ai\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/ccds.ai\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=661"}],"version-history":[{"count":2,"href":"https:\/\/ccds.ai\/index.php?rest_route=\/wp\/v2\/posts\/661\/revisions"}],"predecessor-version":[{"id":664,"href":"https:\/\/ccds.ai\/index.php?rest_route=\/wp\/v2\/posts\/661\/revisions\/664"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/ccds.ai\/index.php?rest_route=\/wp\/v2\/media\/662"}],"wp:attachment":[{"href":"https:\/\/ccds.ai\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=661"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ccds.ai\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=661"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ccds.ai\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=661"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}