Congratulations to Muhtasim Ibteda and Ashfaq for completing their senior project.

They developed PACE (Python AI Companion for Enhanced Engagement). For this work, they generated synthetic data with GPT-3.5 Turbo for scaffolding and conversation fine-tuning. A Gemma 2B model fine-tuned with LoRA was used to keep the system relatively lightweight. The scaffolding data trains the LLM to break down complex problems into subproblems and to generate hints and structured steps for students. The conversation data, on the other hand, allows the LLM to engage with users in natural, human-like dialogue, avoid hallucinations, support error correction and detailed feedback, and enhance motivation and interest when interacting with users of different learning styles and paces. The team also evaluated the system.
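LoRA is named above but not explained. In LoRA, a frozen weight matrix W (d × k) is adapted by adding a scaled low-rank product (alpha / r) · B A, where B is d × r and A is r × k, and only A and B are trained. The sketch below applies that update on plain Python lists and compares trainable parameter counts. The dimensions, rank, and alpha are hypothetical illustration values, not PACE's actual configuration.

```python
def matmul(x, y):
    """Multiply two matrices given as lists of rows: (p x q) @ (q x s) -> (p x s)."""
    inner = len(y)
    return [
        [sum(x[i][t] * y[t][j] for t in range(inner)) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, the weight seen at inference after LoRA.

    w: frozen d x k matrix; a: trainable r x k matrix; b: trainable d x r matrix.
    In standard LoRA, B starts at zero, so the adapted model initially equals the base model.
    """
    r = len(a)  # LoRA rank = number of rows of A
    delta = matmul(b, a)  # (d x r) @ (r x k) -> d x k low-rank update
    scale = alpha / r
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))] for i in range(len(w))]


def trainable_params(d, k, r):
    """Trainable weights for one d x k matrix: (full fine-tuning, LoRA with rank r)."""
    return d * k, r * (d + k)
```

For a hypothetical 2048 × 2048 projection with r = 8, `trainable_params(2048, 2048, 8)` gives 4,194,304 weights for full fine-tuning versus 32,768 for LoRA, which is why LoRA keeps fine-tuning comparatively cheap.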

A wider evaluation of the system is underway, and we plan to make version 2 of the system available to students in our introductory Python programming course. Interaction data collected from students (with their consent, of course) will be valuable for making such a system more reliable.

We hope to see more exciting work from Ibteda and Ashfaq in the near future.

Congratulations to Farzana Islam, Sumaya, Md. Fahad Monir, and Dr Ashraful Islam for getting their paper accepted in Data in Brief, Elsevier.

The paper presents FabricSpotDefect, an annotated dataset for identifying spot defects in different types of fabric.

Here is a short description of the paper:

The FabricSpotDefect dataset is, to the best of our knowledge, the first dataset specifically designed to challenge computer vision models on detecting spots on fabric. It contains 1,014 raw images with 3,288 manually annotated spots across different categories. Applying six augmentation techniques (flipping, rotation, shearing, saturation adjustment, brightness adjustment, and noise addition) expands the dataset to 2,300 augmented images. Annotations were made manually on the original images rather than on the augmented ones, to preserve their real-world character. The augmented images are provided in two annotation formats: YOLOv8, with 7,641 annotations, and COCO, with 7,635 annotations. The dataset covers various fabrics, such as cotton, linen, silk, denim, patterned textiles, and jacquard, and various spots, such as stains, discolorations, oil marks, rust, and blood marks. Spots like these are difficult to detect manually or with traditional methods. The images were captured under household lighting, using everyday clothes in normal conditions, which grounds FabricSpotDefect in real-world applications.
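Flipping is one of the six augmentations, and the augmented images ship with YOLOv8 labels. For a bounding box in YOLO's normalized `class x_center y_center width height` format, a horizontal flip only mirrors the x-centre and a vertical flip only mirrors the y-centre. The sketch below assumes plain box labels (not polygons); the helper names are hypothetical, and this is not the authors' augmentation pipeline. Arbitrary rotations and shearing would instead require recomputing the box from its transformed corners.

```python
from typing import NamedTuple


class YoloBox(NamedTuple):
    class_id: int
    x_center: float  # normalized by image width, in [0, 1]
    y_center: float  # normalized by image height, in [0, 1]
    width: float
    height: float


def parse_yolo_line(line: str) -> YoloBox:
    """Parse one 'class x_center y_center width height' label line."""
    cls, xc, yc, w, h = line.split()
    return YoloBox(int(cls), float(xc), float(yc), float(w), float(h))


def format_yolo_line(box: YoloBox) -> str:
    """Serialize a box back to a YOLO label line with 6 decimal places."""
    return f"{box.class_id} {box.x_center:.6f} {box.y_center:.6f} {box.width:.6f} {box.height:.6f}"


def hflip_yolo(box: YoloBox) -> YoloBox:
    """Box for a horizontally flipped image: mirror the x-centre, keep the size."""
    return box._replace(x_center=1.0 - box.x_center)


def vflip_yolo(box: YoloBox) -> YoloBox:
    """Box for a vertically flipped image: mirror the y-centre, keep the size."""
    return box._replace(y_center=1.0 - box.y_center)
```

Because the coordinates are normalized, the same label transform works for any image resolution, and the image itself only needs to be mirrored along the same axis.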

The figure below shows sample spots with bounding-box and polygon annotations in red: (a) ink stain, (b) paint spot, (c) marker spot, (d) makeup stain, (e) rust stain, (f) glue spot, (g) detergent stain, (h) oil stain, (i) coffee stain, (j) food spot, (k) blood spot, and (l) sweat stain.

A link to download the dataset will be shared soon.

A paper has been accepted for publication in the Journal of the Asia Pacific Economy

A paper titled “Capturing the spatiotemporal inequality in electricity consumption at the subnational level of Bangladesh using Nighttime Lights” has been accepted for publication in the Journal of the Asia Pacific Economy (SJR Q2, H-index 38, Scopus CiteScore 3.7 in 2023).

The research work was led by Dr Amin Masud Ali, Professor, Dept of Economics, JU, and a co-director/supervisor of the Data Science wing, CCDS. The paper is co-authored by Muntasir Wahed (then an RA of DnDLab and the Data Science wing, currently a PhD student at UIUC), Dr Amin Ahsan Ali (Dept of CSE, IUB, and Director, AI & ML Wing, CCDS), and Dr Moinul I Zaber (Dept of CSE, DU, and a collaborator of the Data Science wing, CCDS).

This paper examines spatiotemporal inequality in electricity consumption at the subnational level (Zila and Upazila/Thana) of Bangladesh using nighttime light (NTL) data. The NTL data, sourced from the Visible Infrared Imaging Radiometer Suite (VIIRS) day/night band (DNB) for 2013 to 2020, reveal persistent variability in electricity consumption across districts. Notably, the gap between urban and non-urban areas has widened. Although within-district inequality (measured by the NTL Gini) has declined over time, it remains high in several districts. Convergence analysis confirms that while lagging districts show a catching-up effect, sub-districts are diverging among themselves (in terms of mean NTL per capita). Interestingly, rural sub-districts are converging among themselves despite the divergence among urban sub-districts. The study also identifies regions with significant imbalances between NTL, population, and built-up area density.
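Within-district inequality above is measured with an NTL Gini coefficient. The sketch below implements the standard Gini formula for non-negative values: with values sorted ascending as x₁ ≤ … ≤ xₙ, G = 2 Σᵢ i·xᵢ / (n Σᵢ xᵢ) − (n + 1)/n. It assumes the inputs are NTL radiance values of the spatial units (for example, upazilas or pixels) inside one district. The paper's exact unit of aggregation and any population weighting are not stated here, so treat those choices, and the function name, as assumptions.

```python
def gini(values):
    """Gini coefficient of non-negative values: 0 = perfect equality, (n - 1) / n = maximal."""
    xs = sorted(values)
    if any(x < 0 for x in xs):
        raise ValueError("Gini is defined here for non-negative values only")
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0  # no light at all: treat as perfectly equal
    # Rank-weighted sum over ascending values, ranks starting at 1.
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n
```

For example, four units with equal radiance give a Gini of 0, while four units where only one is lit give 0.75, the maximum possible for n = 4.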

These findings have implications for policymakers aiming to ensure electricity for all and to reduce inequality. First, they provide a clear picture of the NTL inequality pattern at the subnational level of the country. They should contribute to ensuring electricity for all (SDG Goal 7) and to monitoring the evolution of inequalities within the country (SDG Goal 10), in support of the sustainable development goal of reducing inequalities. Second, the investigation also captures inequality in regional economic development, since NTL is a recognized proxy indicator of poverty, public service coverage, and economic activity.

Two papers by CCDS Senior RAs have been accepted at the prestigious IEEE 23rd International Conference on Machine Learning and Applications (ICMLA), USA!

🎉 Huge Congratulations to Our Senior RAs! 🎉

We are thrilled to announce that two papers by CCDS Senior RAs Nabarun Halder, Jahanggir Hossain Setu, Tanjina Piash Proma, and Syed Tangim Pasha have been accepted at the prestigious IEEE 23rd International Conference on Machine Learning and Applications (ICMLA), USA! 🎓🇺🇸

Accepted Titles:

“ECGInsight: A Web Application-Based Approach to Myocardial Infarction Detection From ECG Image Reports Utilizing ResNet”

and

“Using Transformers for Emotion Recognition in Bangla Text: A Comparative Study of MultiBERT and BanglaBERT with Data Augmentation”

With an acceptance rate of just 24.3% this year, this is an excellent achievement. The hard work and dedication of our talented RAs, under the supervision of Dr. Ashraful Islam, have truly paid off.

Congratulations to the team for this remarkable success! 🎉👏 We are incredibly proud of you all and excited to see your contributions making waves in the world of machine learning! 🌍✨

One paper has been accepted in ECAI 2024

Congratulations to our senior project student Fahim Ahmed and research assistant Md Fahim for getting their paper accepted at the European Conference on Artificial Intelligence (ECAI), a CORE rank A conference (https://www.ecai2024.eu/). The acceptance rate for ECAI 2024 was a highly competitive 24%. The paper is titled “Improving the Performance of Transformer-based Models Over Classical Baselines in Multiple Transliterated Languages”.

Here is a short description of the paper:

Online discourse, by its very nature, is rife with transliterated text, along with code-mixing and code-switching. Transliteration is common because romanized text is easier to type on standard keyboards than native scripts. Given its ubiquity, transliterated text is a critical area of study for ensuring that NLP models perform well in real-world scenarios.

In this paper, we analyze how various language models perform when classifying romanized/transliterated social media text. We chose two tasks: sentiment analysis and offensive language identification. We carried out experiments on three languages, namely Bangla, Hindi, and Arabic (six datasets in total). To our surprise, we found across multiple datasets that classical machine learning methods (Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), and XGBoost) perform very competitively with fine-tuned transformer-based mono- and multilingual language models (BanglishBERT, HingBERT, DarijaBERT, XLM-RoBERTa, mBERT, and mDeBERTa), tiny LLMs (Gemma-2B and TinyLLaMa), and ChatGPT on classification of transliterated text. We also investigated mitigation strategies such as translation and augmentation using ChatGPT, as well as dataset-specific pretraining of language models with Masked Language Modelling. Depending on the dataset and language, these mitigation techniques yield a further 2-3% improvement in accuracy and macro-F1 over the baseline.

We demonstrate that TF-IDF and BoW-based classifiers perform within about 3% of fine-tuned LMs and could thus be considered a strong baseline for NLP tasks on transliterated text.
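The classical baselines above pair TF-IDF or BoW features with LR, SVM, RF, or XGBoost. As a dependency-free sketch of the TF-IDF side, the code below builds smoothed, L2-normalized TF-IDF vectors (the smoothing follows scikit-learn's default, log((1 + N) / (1 + df)) + 1) and, to stay short, swaps in a simple nearest-centroid cosine classifier in place of the paper's classifiers. Whitespace tokenization, the function names, and the toy romanized examples in the usage note are assumptions for illustration only.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenization (an assumption; real pipelines often use n-grams)."""
    return text.lower().split()


def fit_idf(docs):
    """Smoothed IDF per term: log((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(doc, idf):
    """Sparse L2-normalized TF-IDF vector; terms unseen during fitting are ignored."""
    tf = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def fit_centroids(docs, labels, idf):
    """Sum the TF-IDF vectors per label; scaling does not affect cosine similarity."""
    centroids = {}
    for doc, label in zip(docs, labels):
        acc = centroids.setdefault(label, Counter())
        for t, w in tfidf(doc, idf).items():
            acc[t] += w
    return centroids


def predict(doc, idf, centroids):
    """Return the label whose centroid is most cosine-similar to the document."""
    vec = tfidf(doc, idf)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))
```

For instance, fitting on four made-up romanized Bangla snippets ("khub bhalo movie" and "darun laglo" as positive, "khub baje movie" and "faltu laglo" as negative) classifies "bhalo laglo" as positive and "baje faltu" as negative. Because such a pipeline trains in milliseconds, it is a cheap reference point before fine-tuning a transformer.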

5 papers from CCDS have been accepted at ICPR 2024 and ICIP 2024

1.

Dehan, Farhan Noor; Fahim, Md; Rahman, AKM Mahabubur; Amin, M Ashraful; Ali, Amin Ahsan

TinyLLM Efficacy in Low-Resource Language 

In: 27th International Conference on Pattern Recognition, ICPR IEEE, Kolkata, India, 2024.

2.

Sultana, Faria; Fuad, Md Tahmid Hasan; Fahim, Md; Rahman, Rahat Rizvi; Hossain, Meheraj; Amin, M Ashraful; Rahman, AKM Mahabubur; Ali, Amin Ahsan

How Good are LM and LLMs in Bangla Newspaper Article Summarization? 

In: 27th International Conference on Pattern Recognition, ICPR IEEE, Kolkata, India, 2024.

3.

Kim, Minha; Bhaumik, Kishor; Ali, Amin Ahsan; Woo, Simon

MIXAD: Memory-Induced Explainable Time Series Anomaly Detection 

In: 27th International Conference on Pattern Recognition, ICPR IEEE, Kolkata, India, 2024.

4.

Bhaumik, Kishor; Kim, Minha; Niloy, Fahim Faisal; Ali, Amin Ahsan; Woo, Simon

SSMT: Few-Shot Traffic Forecasting with Single Source Meta-Transfer Learning 

In: IEEE Int’l Conf on Image Processing, ICIP IEEE, Abu Dhabi, 2024.

5.

Hossain, Mir Sazzat; Rahman, AKM Mahbubur; Amin, Md. Ashraful; Ali, Amin Ahsan

Lightweight Recurrent Neural Network for Image Super-resolution 

In: IEEE Int’l Conf on Image Processing, ICIP IEEE, Abu Dhabi, 2024.

Accepted Paper: Radio Galaxy Classification at INNS DLIA 2023

Our research paper, ‘Morphological Classification of Radio Galaxies using Semi-Supervised Group Equivariant CNNs,’ has been accepted for presentation at the esteemed INNS Deep Learning Innovations and Applications (INNS DLIA 2023) workshop, which is part of the International Joint Conference on Neural Networks (IJCNN 2023). The paper will also be published in the renowned Procedia Computer Science journal!

In this study, we tackled the challenge of limited labeled data in radio galaxy classification by employing a cutting-edge semi-supervised learning approach. By harnessing the power of Group Equivariant Convolutional Neural Networks (G-CNNs) as encoders, we achieved impressive results in classifying radio galaxies into the well-known Fanaroff-Riley Type I (FRI) and Type II (FRII) categories. [Link to Paper]
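G-CNNs build rotation symmetry directly into the network. As an illustration only (not the paper's encoder), the sketch below implements a P4 lifting layer, which cross-correlates an image with one kernel at 0, 90, 180, and 270 degrees, and checks its defining equivariance property: rotating the input by 90 degrees rotates every feature map by 90 degrees and cyclically shifts the four rotation channels by one. Function names are hypothetical; real G-CNN layers also add biases, nonlinearities, and group convolutions over the rotation axis.

```python
def rot90(a):
    """Rotate a square matrix 90 degrees counterclockwise (same convention as numpy.rot90)."""
    n = len(a)
    return [[a[j][n - 1 - i] for j in range(n)] for i in range(n)]


def correlate_valid(x, k):
    """2-D cross-correlation of a square image x with a square kernel k, no padding."""
    n, m = len(x), len(k)
    size = n - m + 1
    return [
        [sum(x[i + u][j + v] * k[u][v] for u in range(m) for v in range(m)) for j in range(size)]
        for i in range(size)
    ]


def p4_lifting(x, k):
    """Lift an image to the P4 group: one feature map per 90-degree rotation of k."""
    maps, kernel = [], k
    for _ in range(4):
        maps.append(correlate_valid(x, kernel))
        kernel = rot90(kernel)
    return maps


def is_p4_equivariant(x, k):
    """Check that output r for the rotated input equals rot90 of output r - 1 (mod 4)."""
    original = p4_lifting(x, k)
    rotated = p4_lifting(rot90(x), k)
    return all(rotated[r] == rot90(original[(r - 1) % 4]) for r in range(4))
```

Since rotating the input only permutes and rotates the feature maps, pooling over the rotation channels yields features that do not depend on the galaxy's orientation, which is useful when labeled examples are scarce.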

Explainable Hate Speech Detection: ICML 2023 Workshop Acceptance

Md Fahim, an RA at CCDS, together with co-authors from UToronto, IUT, and Fordham University, recently had a paper accepted at the AI and HCI workshop of ICML 2023. The paper proposes an interpretability- and explainability-oriented model that detects hate speech using pre-trained large language models. It creates dynamic, class-specific conceptual subspaces and obtains class-specific attention by projecting the contextual embeddings onto those subspaces. These attentions make the detection task more explainable.
Paper Link: HateXplain2.0: An Explainable Hate Speech Detection Framework Utilizing Subjective Projection from Contextual Knowledge Space to Disjoint Concept Space
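The core idea of projecting contextual embeddings onto a class-specific subspace can be sketched generically. The code below assumes each class subspace is spanned by a few vectors, orthonormalizes them with Gram-Schmidt, scores each token by the length of its projection onto that subspace, and turns the scores into attention with a softmax over tokens. This illustrates subspace projection only and is not the HateXplain2.0 architecture; the names, the scoring rule, and the toy vectors in the usage note are assumptions.

```python
import math


def gram_schmidt(vectors):
    """Orthonormalize spanning vectors, dropping any that are (near) linearly dependent."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            coeff = sum(x * y for x, y in zip(w, b))
            w = [x - coeff * y for x, y in zip(w, b)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm > 1e-12:
            basis.append([x / norm for x in w])
    return basis


def projection_norm(vec, basis):
    """Length of vec's orthogonal projection onto span(basis); basis must be orthonormal."""
    return math.sqrt(sum(sum(x * y for x, y in zip(vec, b)) ** 2 for b in basis))


def class_attention(token_embeddings, basis):
    """Softmax over tokens of their projection lengths onto one class subspace."""
    scores = [projection_norm(t, basis) for t in token_embeddings]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

With a toy 3-D subspace spanned by [1, 0, 0] and [1, 1, 0], the token [3, 4, 0] lies entirely in it (projection length 5) while [0, 0, 5] is orthogonal to it (length 0), so the first token receives almost all of that class's attention. Running the same scoring against each class's subspace gives one attention map per class.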

New paper from Dr. Ghosh’s group

Title: Holographic QFTs on AdS$_d$, wormholes and holographic interfaces.

arXiv preprint: https://arxiv.org/abs/2209.12094

We consider three related topics: (a) Holographic quantum field theories on AdS spaces. (b) Holographic interfaces of flat space QFTs. (c) Wormholes connecting generically different QFTs. We investigate in a concrete example how the related classical solutions explore the space of QFTs and we construct the general solutions that interpolate between the same or different CFTs with arbitrary couplings. The solution space contains many exotic RG flow solutions that realize unusual asymptotics, as boundaries of different regions in the space of solutions. We find phenomena like “walking” flows and the generation of extra boundaries via “flow fragmentation”.