Developing a Multi-Agent Framework for Multimodal Multi-Task Learning

This project focuses on enhancing the capabilities of large multimodal models. Multimodal learning is an area of machine learning in which models are designed to process and correlate information from various input modalities, such as text, images, and audio. In this project, we are developing a multi-agent framework in which each agent specializes in understanding a specific modality and task. These agents work in tandem: the framework dynamically incorporates the agents specialized for the tasks at hand, enabling the system to handle multiple tasks simultaneously. By integrating these multi-agent ideas into large multimodal models, we aim to significantly improve performance in multi-task learning and generalization to new tasks.
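
As a rough illustration of the routing idea, the sketch below registers modality- and task-specific agents with a coordinator that dispatches each request to the appropriate specialist. All names here (`Agent`, `Coordinator`, `dispatch`, the placeholder handlers) are hypothetical and are not the project's actual implementation.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple

@dataclass
class Agent:
    """A modality- and task-specific agent: metadata plus a handler function."""
    name: str
    modality: str          # e.g. "text", "image", "audio"
    task: str              # e.g. "captioning", "summarization"
    handle: Callable[[Any], str]

class Coordinator:
    """Routes each (modality, task) request to the agent registered for it."""
    def __init__(self) -> None:
        self._registry: Dict[Tuple[str, str], Agent] = {}

    def register(self, agent: Agent) -> None:
        self._registry[(agent.modality, agent.task)] = agent

    def dispatch(self, modality: str, task: str, payload: Any) -> str:
        agent = self._registry.get((modality, task))
        if agent is None:
            raise KeyError(f"No agent registered for ({modality}, {task})")
        return agent.handle(payload)

# Usage: register two placeholder specialists and route two different tasks.
coordinator = Coordinator()
coordinator.register(Agent("captioner", "image", "captioning",
                           lambda img: f"caption for {img}"))
coordinator.register(Agent("summarizer", "text", "summarization",
                           lambda doc: f"summary of {doc}"))
print(coordinator.dispatch("image", "captioning", "photo_001.png"))
print(coordinator.dispatch("text", "summarization", "lecture_notes.txt"))
```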

Related publications:

  1. Large Multimodal Agents: A Survey
    Xie, J., Chen, Z., Zhang, R., Wan, X., & Li, G. (2024). Large Multimodal Agents: A Survey. arXiv:2402.15116. https://doi.org/10.48550/arXiv.2402.15116 
  2. AgentLite: A Lightweight Library for Building and Advancing Task-Oriented LLM Agent System
    Liu, Z., Yao, W., Zhang, J., Yang, L., Liu, Z., Tan, J., Choubey, P. K., Lan, T., Wu, J., Wang, H., Heinecke, S., Xiong, C., & Savarese, S. (2024). AgentLite: A Lightweight Library for Building and Advancing Task-Oriented LLM Agent System. arXiv:2402.15538. https://doi.org/10.48550/arXiv.2402.15538
  3. MuLan: Multimodal-LLM Agent for Progressive Multi-Object Diffusion
    Li, S., Wang, R., Hsieh, C.-J., Cheng, M., & Zhou, T. (2024). MuLan: Multimodal-LLM Agent for Progressive Multi-Object Diffusion. arXiv:2402.12741. https://doi.org/10.48550/arXiv.2402.12741


Undergraduate Project Update Presentation Day

The CCDS Undergrad Project Update Presentation Day was held on February 8, 2024. Eight groups, under the supervision of CCDS mentors, showcased their progress and findings. The presentations encompassed a diverse range of topics and research endeavors, and the event was a culmination of dedicated efforts and collaborative work within the CCDS community. It provided a platform for students to share their achievements and insights with peers and faculty members alike.

Boundary-Enhanced Attention for Satellite Imagery

Satellite image classification presents unique challenges distinct from traditional urban scene datasets, including significant class imbalance and the scarcity of comprehensive examples within single frames. While recent advances in semantic segmentation and metric learning have shown promise on urban scene datasets, their direct applicability to satellite image classification remains uncertain. This paper introduces a novel approach, the Boundary Attention (BA) loss, specifically designed to address these challenges in satellite imagery. BA emphasizes the significance of boundary regions within satellite imagery, aiming to mitigate the complexity of inter-class relations by directing enhanced attention to minority classes and improving attention mechanisms along class boundaries. Through comprehensive experimental evaluation and comparison with existing methods, the paper demonstrates the effectiveness and adaptability of the BA method, paving the way for more accurate and robust satellite image classification systems. The proposed BA method offers a tailored solution that stands to significantly improve classification accuracy in the context of satellite image analysis.
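
To make the boundary-emphasis idea concrete, below is a minimal, simplified sketch of a boundary-weighted loss in PyTorch: boundary pixels are detected by comparing max- and min-pooled label maps, and their cross-entropy contribution is up-weighted alongside per-class weights for minority classes. The function name, the `boundary_gain` parameter, and the exact weighting scheme are illustrative assumptions, not the paper's BA loss formulation.

```python
import torch
import torch.nn.functional as F

def boundary_weighted_ce(logits, target, class_weights, boundary_gain=4.0):
    """Cross-entropy with extra weight on pixels near class boundaries.

    logits: (B, C, H, W) raw scores; target: (B, H, W) integer labels.
    class_weights: (C,) per-class weights (e.g. inverse frequency) used to
    emphasize minority classes. Simplified stand-in, not the authors' BA loss.
    """
    # A pixel lies on a boundary if any neighbour in its 3x3 window has a
    # different label (max-pooled label != min-pooled label).
    lbl = target.unsqueeze(1).float()
    local_max = F.max_pool2d(lbl, kernel_size=3, stride=1, padding=1)
    local_min = -F.max_pool2d(-lbl, kernel_size=3, stride=1, padding=1)
    boundary = (local_max != local_min).float().squeeze(1)        # (B, H, W)

    per_pixel = F.cross_entropy(logits, target,
                                weight=class_weights, reduction="none")
    weights = 1.0 + boundary_gain * boundary                      # boost boundary pixels
    return (weights * per_pixel).sum() / weights.sum()

# Tiny usage example with random data.
logits = torch.randn(2, 5, 64, 64)
target = torch.randint(0, 5, (2, 64, 64))
loss = boundary_weighted_ce(logits, target, class_weights=torch.ones(5))
print(loss.item())
```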

Non-Rigid Distortion Removal via Coordinate Based Image Representation

Imaging through a turbulent refractive medium (e.g., hot air, inhomogeneous gas, fluid flow) is challenging, since the non-linear light transport through the medium (e.g., refraction and scattering) causes non-rigid distortions in the perceived images. However, most computer vision algorithms rely on sharp, distortion-free images to achieve the expected performance. Removing these non-rigid image distortions is therefore critical and beneficial for many vision applications, from segmentation to recognition. To resolve the distortion and blur introduced by air turbulence, conventional turbulence-restoration methods leverage optical flow, region fusion, and blind deconvolution to recover images. One underexplored avenue for this problem is the use of coordinate-based image representations. These methods represent an image as the parameters of a neural network, and they can be used to deform the image grid itself to account for turbulence. In this research, we aim to extend this idea to unseen images with meta-learning, so that both air and water distortions can be removed without much customization.
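
As a minimal sketch of the coordinate-based idea (under simplifying assumptions), one small MLP below represents the latent sharp image as a function of (x, y), a second MLP predicts a per-coordinate deformation of the sampling grid, and both are fitted to a distorted observation. The network sizes, the absence of positional encoding, and the single-frame reconstruction loss are illustrative choices; the meta-learning extension mentioned above is not shown.

```python
import torch
import torch.nn as nn

class CoordMLP(nn.Module):
    """Maps 2-D coordinates in [-1, 1] to an output vector (RGB colour or a
    2-D deformation). Real systems typically add positional encodings."""
    def __init__(self, out_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, coords):
        return self.net(coords)

image_net = CoordMLP(out_dim=3)     # (x, y) -> RGB of the latent sharp image
deform_net = CoordMLP(out_dim=2)    # (x, y) -> (dx, dy) turbulence offset

H = W = 32
ys, xs = torch.meshgrid(torch.linspace(-1, 1, H),
                        torch.linspace(-1, 1, W), indexing="ij")
coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)
distorted = torch.rand(H * W, 3)    # stand-in for one observed distorted frame

opt = torch.optim.Adam(list(image_net.parameters()) +
                       list(deform_net.parameters()), lr=1e-3)
for step in range(100):
    warped_coords = coords + deform_net(coords)     # deform the image grid
    pred = image_net(warped_coords)                 # render through the warp
    loss = ((pred - distorted) ** 2).mean()         # match the observation
    opt.zero_grad()
    loss.backward()
    opt.step()
# After fitting, image_net(coords) approximates the distortion-free image.
```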

Related publications:

  1. Unsupervised Non-Rigid Image Distortion Removal via Grid Deformation, ICCV 2021

Adaptive LLM-based Tutor for Personalized Python Learning

Because of their varied backgrounds and skill levels, students in programming education frequently confront a variety of difficulties. Traditional learning platforms typically do not support personalized learning, which reduces their efficacy. Our goal is to build an intelligent tutoring system based on LLMs that can solve problems and reason in order to provide students with tutor-like guidance. Additionally, we want to establish engaging interactions between students and tutors, and during these exchanges we would like to learn as much as possible about the tutor's internal decision-making process. Furthermore, to deliver a more approachable and natural experience aligned with the learner's needs and the curriculum objectives, the system will need to recognize and monitor, as much as possible, the individual preferences and mental state of each learner.
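
As a rough sketch of the interaction loop, the code below keeps a small learner model (skill level, preferences, recent errors) and conditions each tutoring prompt on it before calling an LLM. The `call_llm` function is a placeholder for whatever model backend the project uses, and the `StudentState` fields are illustrative assumptions rather than the system's actual design.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class StudentState:
    """A lightweight learner model the tutor updates across turns."""
    skill_level: str = "beginner"          # e.g. "beginner", "intermediate"
    preferences: List[str] = field(default_factory=list)
    recent_errors: List[str] = field(default_factory=list)

def call_llm(prompt: str) -> str:
    """Placeholder for the actual LLM backend; returns a dummy response here."""
    return f"[LLM response to a prompt of {len(prompt)} characters]"

def tutor_turn(state: StudentState, question: str) -> str:
    # Condition the tutor on the current learner model so the guidance is
    # personalized rather than one-size-fits-all.
    prompt = (
        "You are a patient Python tutor. Guide the student step by step "
        "instead of giving the full solution immediately.\n"
        f"Student level: {state.skill_level}\n"
        f"Preferences: {', '.join(state.preferences) or 'none recorded'}\n"
        f"Recent errors: {', '.join(state.recent_errors) or 'none recorded'}\n"
        f"Student question: {question}"
    )
    return call_llm(prompt)

# Usage: one tutoring exchange with a tracked learner profile.
state = StudentState(preferences=["short examples"],
                     recent_errors=["off-by-one in loops"])
print(tutor_turn(state, "Why does my for loop skip the last element?"))
```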