
Spectral Normalized U-Net for Light Field Occlusion Removal

Research Abstract

Occlusion artifacts significantly hinder light field (LF) image reconstruction, especially in complex scenes. We propose a spectral normalized U-Net for LF occlusion removal, which begins by stacking LF views and extracting view-dependent features with a local feature encoder. To capture spatial complexity, ResASPP blocks enable multi-scale context aggregation, while channel attention enhances occlusion-related features. Spectral normalization is applied to all convolutional layers to improve training stability and generalization. The encoder-decoder structure with skip connections preserves fine details. Experimental results show that our method restores occluded regions more accurately than baseline approaches.
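
Below is a minimal PyTorch sketch of the core idea described above, not the authors' released code: spectral normalization wrapped around each convolution in a U-Net-style encoder block, followed by channel attention. The layer sizes, the squeeze-and-excitation-style attention design, and the 5x5 view stacking are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

def sn_conv(in_ch, out_ch):
    # Spectral normalization constrains the layer's Lipschitz constant,
    # which the abstract credits with more stable training.
    return spectral_norm(nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1))

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation-style channel attention (assumed design)."""
    def __init__(self, ch, reduction=8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // reduction, ch, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.mlp(x)  # reweight channels by global statistics

class SNEncoderBlock(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            sn_conv(in_ch, out_ch), nn.ReLU(inplace=True),
            sn_conv(out_ch, out_ch), nn.ReLU(inplace=True),
            ChannelAttention(out_ch),
        )

    def forward(self, x):
        return self.body(x)

# Stacked LF views enter as extra channels, e.g. 5x5 RGB views -> 75 channels
# (an assumed angular resolution, for illustration only).
block = SNEncoderBlock(75, 64)
out = block(torch.randn(1, 75, 128, 128))
```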

Research Authors
Mostafa Farouk Senussi, Mahmoud Abdalla, Mahmoud SalahEldin Kasem, Mohamed Mahmoud, Hyun-Soo Kang
Research Image
Overview of the Proposed Spectral Normalized U-Net for Occlusion Removal in LF Images
Research Journal
International Conference on Future Information & Communication Engineering
Research Pages
294-297
Research Publisher
Korea Information and Communications Society
Research Vol
16
Research Website
https://www.dbpia.co.kr/pdf/pdfView.do?nodeId=NODE12293106
Research Year
2025

Occlusion removal in light-field images using CSPDarknet53 and bidirectional feature pyramid network: a multi-scale fusion-based approach

Research Abstract

Occlusion removal in light-field images remains a significant challenge, particularly when dealing with large occlusions. We propose an end-to-end learning architecture that addresses this challenge by interactively combining CSPDarknet53 and the bidirectional feature pyramid network for efficient light-field occlusion removal. CSPDarknet53 acts as the backbone, providing robust and rich feature extraction across multiple scales, while the bidirectional feature pyramid network enhances comprehensive feature integration through an advanced multi-scale fusion mechanism. To preserve efficiency without sacrificing the quality of the extracted features, our model uses separable convolutional blocks. A simple refinement module based on half-instance initialization blocks is integrated to exploit local details and global structures. The network's multi-perspective approach enables nearly complete occlusion removal, allowing it to handle occlusions of varying sizes and complexity. Numerous experiments were run on sparse and dense datasets with varying degrees of occlusion severity to assess performance. The findings show significant improvements over current state-of-the-art techniques on the sparse dataset and competitive results on the dense dataset.
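
A minimal sketch of the "separable convolutional blocks" the abstract mentions for efficiency, under the usual depthwise-separable assumption; this is illustrative, not the paper's implementation.

```python
import torch
import torch.nn as nn

class SeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3):
        super().__init__()
        # Depthwise: one filter per input channel (groups=in_ch),
        # followed by a 1x1 pointwise conv that mixes channels.
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   padding=kernel_size // 2, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

# A standard 3x3 conv from 256 to 256 channels uses 256*256*9 weights; the
# separable version uses 256*9 + 256*256, roughly 8.7x fewer parameters.
layer = SeparableConv(256, 256)
y = layer(torch.randn(1, 256, 64, 64))
```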

Research Authors
Mostafa Farouk Senussi, Hyun-Soo Kang
Research Image
Occlusion Removal in Light-Field Images Using CSPDarknet53 and Bidirectional Feature Pyramid Network: A Multi-Scale Fusion-Based Approach
Research Journal
Applied Sciences
Research Pages
Issue 20, Article 9332
Research Publisher
MDPI
Research Vol
14
Research Website
https://doi.org/10.3390/app14209332
Research Year
2024

Improving BI-RADS Mammographic Classification With Self-Supervised Vision Transformers and Cascade Learning

Research Abstract

Accurate and early breast cancer detection is critical for improving patient outcomes. In this study, we propose PatchCascade-ViT, a novel self-supervised Vision Transformer (ViT) framework for automated BI-RADS classification of mammographic images. Unlike conventional deep learning approaches that rely heavily on annotated datasets, PatchCascade-ViT leverages Self Patch-level Supervision (SPS) to learn meaningful mammographic representations from unlabeled data, significantly enhancing classification performance. Our framework operates through a two-stage cascade classification process. In the first stage, the model differentiates non-cancerous from potentially cancerous mammograms using SelfPatch, an innovative self-supervised learning task that enhances patch-level feature learning by enforcing consistency among spatially correlated patches. The second stage refines the classification by distinguishing Scattered Fibroglandular from Heterogeneously and Extremely Dense breast tissue categories, enabling more precise breast cancer risk assessment. To validate the effectiveness of PatchCascade-ViT, we conducted extensive evaluations on a dataset of 4,368 mammograms across three BI-RADS classes. Our method achieved a system sensitivity of 85.01% and an F1-score of 84.90%, outperforming existing deep learning-based approaches. By integrating self-supervised learning with a cascade vision transformer architecture, PatchCascade-ViT reduces reliance on annotated datasets while maintaining high classification accuracy. These findings demonstrate its potential for enhancing breast cancer screening, aiding radiologists in early detection, and improving clinical decision-making.
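
A minimal sketch of the two-stage cascade at inference time. The stage models, threshold, and label strings below are illustrative assumptions, not the released PatchCascade-ViT code.

```python
def classify_mammogram(image, stage1_vit, stage2_vit, threshold=0.5):
    """Two-stage cascade: screen first, then refine density category."""
    # Stage 1: non-cancerous vs. potentially cancerous.
    p_suspicious = stage1_vit(image)   # assumed probability in [0, 1]
    if p_suspicious < threshold:
        return "non-cancerous"
    # Stage 2: refine the BI-RADS density category for suspicious cases.
    density = stage2_vit(image)        # assumed class index (argmax)
    return {0: "scattered fibroglandular",
            1: "heterogeneously dense",
            2: "extremely dense"}[density]
```

The cascade design means most clearly normal mammograms exit after stage 1, so the finer-grained second model only runs on the harder cases.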

Research Authors
Abdelrahman Abdallah, Mahmoud Salaheldin Kasem, Ibrahim Abdelhalim, Norah Saleh Alghamdi, Ayman El-Baz
Research Journal
IEEE Access
Research Pages
135500-135514
Research Publisher
IEEE
Research Vol
13
Research Website
https://doi.org/10.1109/ACCESS.2025.3581582
Research Year
2025

Advancements and Challenges in Arabic Optical Character Recognition: A Comprehensive Survey

Research Abstract

Optical character recognition (OCR) is a vital process that involves the extraction of handwritten or printed text from scanned or printed images, converting it into a format that can be understood and processed by machines. The automatic extraction of text through OCR plays a crucial role in digitizing documents, enhancing productivity, and preserving historical records. This paper offers an exhaustive review of contemporary applications, methodologies, and challenges associated with Arabic OCR. A thorough analysis is conducted on prevailing techniques utilized throughout the OCR process, with a dedicated effort to discern the most efficacious approaches that demonstrate enhanced outcomes. To ensure a thorough evaluation, a meticulous keyword-search methodology is adopted, encompassing a comprehensive analysis of articles relevant to Arabic OCR. In addition to presenting cutting-edge techniques and methods, this paper identifies research gaps within the realm of Arabic OCR. We shed light on potential areas for future exploration and development, thereby guiding researchers toward promising avenues in the field of Arabic OCR. The outcomes of this study provide valuable insights for researchers, practitioners, and stakeholders involved in Arabic OCR, ultimately fostering advancements in the field and facilitating the creation of more accurate and efficient OCR systems for the Arabic language.

Research Authors
Mahmoud SalahEldin Kasem, Mohamed Mahmoud, Hyun-Soo Kang
Research Journal
ACM Computing Surveys
Research Publisher
ACM
Research Website
https://doi.org/10.1145/3768150
Research Year
2025

HTTD: A Hierarchical Transformer for Accurate Table Detection in Document Images

Research Abstract

Table detection in document images is a challenging problem due to diverse layouts, irregular structures, and embedded graphical elements. In this study, we present HTTD (Hierarchical Transformer for Table Detection), a cutting-edge model that combines a Swin-L Transformer backbone with advanced Transformer-based mechanisms to achieve superior performance. HTTD addresses three key challenges: handling diverse document layouts, including historical and modern structures; improving computational efficiency and training convergence; and demonstrating adaptability to non-standard tasks like medical imaging and receipt key detection. Evaluated on benchmark datasets, HTTD achieves state-of-the-art results, with precision rates of 96.98% on ICDAR-2019 cTDaR, 96.43% on TNCR, and 93.14% on TabRecSet. These results validate its effectiveness and efficiency, paving the way for advanced document analysis and data digitization tasks.

Research Authors
Mahmoud SalahEldin Kasem, Mohamed Mahmoud, Bilel Yagoub, Mostafa Farouk Senussi, Mahmoud Abdalla, and Hyun-Soo Kang
Research Journal
Mathematics
Research Publisher
MDPI
Research Vol
13
Research Website
https://doi.org/10.3390/math13020266
Research Year
2025

Enhancing X-ray Security Image Synthesis: Advanced Generative Models and Innovative Data Augmentation Techniques

Research Abstract

This study addresses the field of X-ray security screening and focuses on synthesising realistic X-ray images using advanced generative models. Insufficient training data in this area poses a major challenge, which we address through innovative data augmentation techniques. We utilise the power of generative adversarial networks (GANs) and conditional GANs (cGANs), in particular the Pix2Pix and Pix2PixHD models, to investigate the generation of X-ray images from various inputs such as masks and edges. Our experiments conducted on a Korean dataset containing dangerous objects relevant to security screening show the effectiveness of these models in improving the quality and realism of image synthesis. Quantitative evaluations based on metrics such as PSNR, SSIM, LPIPS, FID, and FSIM, with scores of 19.93, 0.71, 0.12, 29.36, and 0.54, respectively, show the superiority of our strategy, especially when integrated with hybrid inputs containing both edges and masks. Overall, our results highlight the potential of advanced generative models to overcome the challenges of data scarcity in X-ray security screening and pave the way for more efficient and accurate inspection systems.
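
For reference, two of the metrics quoted above, PSNR and SSIM, can be computed with scikit-image as sketched below; the random arrays stand in for a real/generated image pair, and this is assumed evaluation code, not the paper's.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# Stand-ins for a real X-ray image and its synthesized counterpart,
# both normalized to [0, 1].
real = np.random.rand(256, 256)
generated = np.random.rand(256, 256)

psnr = peak_signal_noise_ratio(real, generated, data_range=1.0)
ssim = structural_similarity(real, generated, data_range=1.0)
print(f"PSNR: {psnr:.2f} dB, SSIM: {ssim:.3f}")
```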

Research Authors
Bilel Yagoub, Mahmoud SalahEldin Kasem, and Hyun-Soo Kang
Research Journal
Applied Sciences
Research Publisher
MDPI
Research Vol
14
Research Website
https://doi.org/10.3390/app14103961
Research Year
2024

ReceiptQA: A Question-Answering Dataset for Receipt Understanding

Research Abstract

Understanding information extracted from receipts is a critical task for real-world applications such as financial tracking, auditing, and enterprise resource management. In this paper, we introduce ReceiptQA, a novel large-scale dataset designed for receipt understanding through question-answering (QA). ReceiptQA contains 171,000 question–answer pairs derived from 3500 receipt images, constructed via two complementary methodologies: (1) LLM-Generated Dataset: 70,000 synthetically generated QA pairs, where each receipt is paired with 20 unique, context-specific questions. These questions are produced using a state-of-the-art large language model (LLM) and validated through human annotation to ensure accuracy, relevance, and diversity. (2) Human-Created Dataset: 101,000 manually crafted questions spanning answerable and unanswerable queries. This subset includes carefully designed templates of varying difficulty (easy/hard) to comprehensively evaluate QA systems across diverse receipt domains. To benchmark performance, we evaluate leading vision–language models (VLMs) and language models (LMs), including GPT-4o, Phi-3B, Phi-3.5B, LLaVA-7B, InternVL2 (4B/8B), LLaMA-3.2, and Gemini. We further fine-tune a LLaMA-3.2 11B model on ReceiptQA, achieving significant improvements over baseline models on validation and test sets. Our analysis uncovers critical strengths and limitations of existing models in handling receipt-based QA tasks, establishing a robust benchmark for future research.
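
A minimal sketch of what a single ReceiptQA-style record could look like, based on the dataset description above. The field names and values are assumptions for illustration; consult the released dataset for the authoritative schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ReceiptQARecord:
    receipt_image: str       # path to the receipt image
    question: str
    answer: Optional[str]    # None for unanswerable questions
    source: str              # "llm-generated" or "human-created"
    difficulty: str          # "easy" or "hard" (human-created subset)

example = ReceiptQARecord(
    receipt_image="receipts/000123.jpg",   # hypothetical path
    question="What is the total amount paid?",
    answer="$23.45",
    source="human-created",
    difficulty="easy",
)
```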

Research Authors
Mahmoud Abdalla, Mahmoud SalahEldin Kasem, Mohamed Mahmoud, Bilel Yagoub, Mostafa Farouk Senussi, Abdelrahman Abdallah, Seung Hun Kang, and Hyun-Soo Kang
Research Journal
Mathematics
Research Pages
20
Research Publisher
MDPI
Research Vol
13
Research Website
https://doi.org/10.3390/math13111760
Research Year
2025

Two-Stage Video Violence Detection Framework Using GMFlow and CBAM-Enhanced ResNet3D

Research Abstract

Video violence detection has gained significant attention in recent years due to its applications in surveillance and security. This paper proposes a two-stage framework for detecting violent actions in video sequences. The first stage leverages GMFlow, a pre-trained optical flow network, to capture the temporal motion between consecutive frames, effectively encoding motion dynamics. In the second stage, we integrate these optical flow images with RGB frames and feed them into a CBAM-enhanced ResNet3D network to capture complementary spatiotemporal features. The attention mechanism provided by CBAM enables the network to focus on the most relevant regions in the frames, improving the detection of violent actions. We evaluate the proposed framework on three widely used datasets: Hockey Fight, Crowd Violence, and UBI-Fight. Our experimental results demonstrate superior performance compared to several state-of-the-art methods, achieving AUC scores of 0.963 on UBI-Fight and accuracies of 97.5% and 94.0% on Hockey Fight and Crowd Violence, respectively. The proposed approach effectively combines GMFlow-generated optical flow with deep 3D convolutional networks, providing robust and efficient detection of violence in videos.
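
A minimal sketch of a CBAM-style attention module (Woo et al., 2018) of the kind the framework inserts into ResNet3D; a 2D version is shown for brevity, and the internal details are assumed rather than taken from the paper.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, ch, reduction=16):
        super().__init__()
        # Channel attention: shared MLP over avg- and max-pooled descriptors.
        self.mlp = nn.Sequential(
            nn.Conv2d(ch, ch // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // reduction, ch, 1),
        )
        # Spatial attention: 7x7 conv over channel-wise avg and max maps.
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        ca = torch.sigmoid(self.mlp(x.mean((2, 3), keepdim=True)) +
                           self.mlp(x.amax((2, 3), keepdim=True)))
        x = x * ca                              # reweight channels
        sa = torch.sigmoid(self.spatial(torch.cat(
            [x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)))
        return x * sa                           # reweight spatial locations

feat = torch.randn(2, 64, 32, 32)
out = CBAM(64)(feat)
```

Channel attention tells the network which feature maps matter; spatial attention tells it where in the frame to look, which is why the abstract credits CBAM with focusing on the most relevant regions.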

Research Authors
Mohamed Mahmoud, Bilel Yagoub, Mostafa Farouk Senussi, Mahmoud Abdalla, Mahmoud SalahEldin Kasem, Hyun-Soo Kang
Research Journal
Mathematics
Research Pages
20
Research Publisher
MDPI
Research Vol
13
Research Website
https://doi.org/10.3390/math13081226
Research Year
2025

AE-LSTM: Autoencoder with LSTM-Based Intrusion Detection in IoT

Research Abstract

The rapid growth of the Internet of Things (IoT) has increased security problems, creating a need for effective ways to protect IoT systems from intrusions. Machine learning has recently played an active role in network security and attack detection. In this research, we propose a machine learning method (AE-LSTM) for intrusion detection that combines an autoencoder with LSTM. Our method uses a six-layer autoencoder (AE) model with LSTM that is effective for anomaly detection. To avoid bias in our model arising from the imbalanced data in the NSL-KDD dataset, we use a standard scaler in our AE-LSTM model to remove outliers from the input. AE-LSTM relies on the reconstruction error, which is critical in determining whether network traffic is normal or abnormal. We evaluate our proposed model on the NSL-KDD test dataset. Our model achieved the highest accuracy over other methods, with micro and weighted F1-scores of 98.69% and 98.70% for five-class detection (DoS, Probe, R2L, U2R, Normal). We also evaluated it with two classes (Malicious, Normal), achieving micro and weighted F1-scores of 98.78% and 98.78%.
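
A minimal sketch of the reconstruction-error idea behind AE-LSTM: an LSTM autoencoder flags traffic as anomalous when it reconstructs poorly. The hidden size, window length, and threshold are assumptions; only the 41-feature input matches NSL-KDD.

```python
import torch
import torch.nn as nn

class LSTMAutoencoder(nn.Module):
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.encoder = nn.LSTM(n_features, hidden, batch_first=True)
        self.decoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_features)

    def forward(self, x):                  # x: (batch, seq_len, features)
        z, _ = self.encoder(x)             # compress each step
        y, _ = self.decoder(z)             # reconstruct the sequence
        return self.out(y)

model = LSTMAutoencoder(n_features=41)     # NSL-KDD records have 41 features
x = torch.randn(8, 10, 41)                 # standardized input windows
recon_error = ((model(x) - x) ** 2).mean(dim=(1, 2))
is_anomaly = recon_error > 0.5             # threshold is an assumption; in
                                           # practice it is fit on normal data
```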

Research Authors
Mohamed Mahmoud, Mahmoud Kasem, Abdelrahman Abdallah, Hyun-Soo Kang
Research Journal
2022 International Telecommunications Conference (ITC-Egypt)
Research Pages
6
Research Publisher
IEEE
Research Website
https://doi.org/10.1109/ITC-Egypt55520.2022.9855688
Research Year
2022