
Enhancing X-ray Security Image Synthesis: Advanced Generative Models and Innovative Data Augmentation Techniques

Research Abstract

This study addresses the field of X-ray security screening and focuses on synthesising realistic X-ray images using advanced generative models. Insufficient training data in this area poses a major challenge, which we address through innovative data augmentation techniques. We utilise the power of generative adversarial networks (GANs) and conditional GANs (cGANs), in particular the Pix2Pix and Pix2PixHD models, to investigate the generation of X-ray images from various inputs such as masks and edges. Our experiments, conducted on a Korean dataset containing dangerous objects relevant to security screening, show the effectiveness of these models in improving the quality and realism of image synthesis. Quantitative evaluations based on metrics such as PSNR, SSIM, LPIPS, FID, and FSIM, with scores of 19.93, 0.71, 0.12, 29.36, and 0.54, respectively, demonstrate the superiority of our strategy, especially when hybrid inputs combining both edges and masks are used. Overall, our results highlight the potential of advanced generative models to overcome the challenges of data scarcity in X-ray security screening and pave the way for more efficient and accurate inspection systems.
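
As a point of reference, the sketch below shows how PSNR and SSIM scores of the kind reported above can be computed with scikit-image; the file names and image pairing are hypothetical placeholders, not the authors' evaluation pipeline.

# Minimal sketch: PSNR/SSIM between a synthesised X-ray image and its
# ground-truth counterpart. File names are hypothetical placeholders.
import numpy as np
from skimage.io import imread
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

real = imread("real_xray.png").astype(np.float64) / 255.0   # ground truth
fake = imread("synth_xray.png").astype(np.float64) / 255.0  # generated image

psnr = peak_signal_noise_ratio(real, fake, data_range=1.0)
ssim = structural_similarity(real, fake, data_range=1.0, channel_axis=-1)
print(f"PSNR: {psnr:.2f} dB, SSIM: {ssim:.2f}")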

Research Authors
Bilel Yagoub, Mahmoud SalahEldin Kasem, and Hyun-Soo Kang
Research Journal
Applied Sciences
Research Publisher
MDPI
Research Vol
14
Research Website
https://doi.org/10.3390/app14103961
Research Year
2024

ReceiptQA: A Question-Answering Dataset for Receipt Understanding

Research Abstract

Understanding information extracted from receipts is a critical task for real-world applications such as financial tracking, auditing, and enterprise resource management. In this paper, we introduce ReceiptQA, a novel large-scale dataset designed for receipt understanding through question-answering (QA). ReceiptQA contains 171,000 question–answer pairs derived from 3500 receipt images, constructed via two complementary methodologies: (1) LLM-Generated Dataset: 70,000 synthetically generated QA pairs, where each receipt is paired with 20 unique, context-specific questions. These questions are produced using a state-of-the-art large language model (LLM) and validated through human annotation to ensure accuracy, relevance, and diversity. (2) Human-Created Dataset: 101,000 manually crafted questions spanning answerable and unanswerable queries. This subset includes carefully designed templates of varying difficulty (easy/hard) to comprehensively evaluate QA systems across diverse receipt domains. To benchmark performance, we evaluate leading vision–language models (VLMs) and language models (LMs), including GPT-4o, Phi-3B, Phi-3.5B, LLaVA-7B, InternVL2 (4B/8B), LLaMA-3.2, and Gemini. We further fine-tune a LLaMA-3.2 11B model on ReceiptQA, achieving significant improvements over baseline models on validation and test sets. Our analysis uncovers critical strengths and limitations of existing models in handling receipt-based QA tasks, establishing a robust benchmark for future research.
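
For illustration only, one plausible way to represent a single ReceiptQA question-answer record in code; the field names below are assumptions made for this sketch, not the dataset's published schema.

# Illustrative record layout for one QA pair; field names are assumptions.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ReceiptQARecord:
    receipt_image: str      # path to the receipt image
    question: str           # e.g. "What is the total amount paid?"
    answer: Optional[str]   # None for unanswerable questions
    source: str             # "llm_generated" or "human_created"
    difficulty: str         # "easy" or "hard" (human-created subset)

record = ReceiptQARecord(
    receipt_image="receipts/0001.jpg",
    question="What is the total amount paid?",
    answer="12.50",
    source="human_created",
    difficulty="easy",
)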

Research Authors
Mahmoud Abdalla, Mahmoud SalahEldin Kasem, Mohamed Mahmoud, Bilel Yagoub, Mostafa Farouk Senussi, Abdelrahman Abdallah, Seung Hun Kang, and Hyun Soo Kang
Research Journal
Mathematics
Research Pages
20
Research Publisher
MDPI
Research Vol
13
Research Website
https://doi.org/10.3390/math13111760
Research Year
2025

Two-Stage Video Violence Detection Framework Using GMFlow and CBAM-Enhanced ResNet3D

Research Abstract

Video violence detection has gained significant attention in recent years due to its applications in surveillance and security. This paper proposes a two-stage framework for detecting violent actions in video sequences. The first stage leverages GMFlow, a pre-trained optical flow network, to capture the temporal motion between consecutive frames, effectively encoding motion dynamics. In the second stage, we integrate these optical flow images with RGB frames and feed them into a CBAM-enhanced ResNet3D network to capture complementary spatiotemporal features. The attention mechanism provided by CBAM enables the network to focus on the most relevant regions in the frames, improving the detection of violent actions. We evaluate the proposed framework on three widely used datasets: Hockey Fight, Crowd Violence, and UBI-Fight. Our experimental results demonstrate superior performance compared to several state-of-the-art methods, achieving AUC scores of 0.963 on UBI-Fight and accuracies of 97.5% and 94.0% on Hockey Fight and Crowd Violence, respectively. The proposed approach effectively combines GMFlow-generated optical flow with deep 3D convolutional networks, providing robust and efficient detection of violence in videos.
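
To make the attention component concrete, below is a minimal PyTorch sketch of a CBAM-style block for 3D feature maps (channel attention followed by spatial attention). It is an illustrative variant assuming input shape (N, C, T, H, W), not the paper's exact implementation.

import torch
import torch.nn as nn

class CBAM3D(nn.Module):
    # Illustrative CBAM block for (N, C, T, H, W) feature maps.
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Channel attention: shared MLP over average- and max-pooled features.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # Spatial attention: convolution over stacked avg/max channel maps.
        self.conv = nn.Conv3d(2, 1, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c = x.shape[:2]
        avg = self.mlp(x.mean(dim=(2, 3, 4)))  # (N, C)
        mx = self.mlp(x.amax(dim=(2, 3, 4)))   # (N, C)
        x = x * torch.sigmoid(avg + mx).view(n, c, 1, 1, 1)
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)  # (N, 2, T, H, W)
        return x * torch.sigmoid(self.conv(s))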

Research Authors
Mohamed Mahmoud, Bilel Yagoub, Mostafa Farouk Senussi, Mahmoud Abdalla, Mahmoud SalahEldin Kasem, and Hyun-Soo Kang
Research Journal
Mathematics
Research Pages
20
Research Publisher
MDPI
Research Vol
13
Research Website
https://doi.org/10.3390/math13081226
Research Year
2025

AE-LSTM: Autoencoder with LSTM-Based Intrusion Detection in IoT

Research Abstract

The rapid growth of the Internet of Things (IoT) has increased security problems, creating a need for effective ways to protect IoT systems from intrusions. Machine learning has recently played an active role in network security and attack detection. In this research, we propose a machine learning method (AE-LSTM) for intrusion detection that combines an autoencoder with an LSTM. Our method uses a six-layer autoencoder (AE) model with an LSTM that is effective in anomaly detection. To avoid bias in our model arising from the imbalanced data in the NSL-KDD dataset, we apply a standard scaler in our AE-LSTM model to remove outliers from the input. AE-LSTM relies on the reconstruction error, which is critical in determining whether network traffic is normal or abnormal. We use the NSL-KDD test dataset to evaluate our proposed model. Our model achieved the highest accuracy among the compared methods, with micro and weighted F1-scores of 98.69% and 98.70% for five classes (DoS, Probe, R2L, U2R, Normal). We also evaluated it on two classes (Malicious, Normal), with micro and weighted F1-scores of 98.78% and 98.78%.
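
A minimal PyTorch sketch of the autoencoder-with-LSTM idea follows: reconstruct a scaled feature vector and flag records whose reconstruction error is high. Layer sizes, the length-1 sequence treatment, and the threshold are assumptions made for illustration, not the paper's exact configuration.

import torch
import torch.nn as nn

class AELSTM(nn.Module):
    def __init__(self, n_features: int = 41, hidden: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(),
            nn.Linear(64, hidden), nn.ReLU(),
        )
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.decoder = nn.Sequential(
            nn.Linear(hidden, 64), nn.ReLU(),
            nn.Linear(64, n_features),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.encoder(x)                # (N, hidden)
        z, _ = self.lstm(z.unsqueeze(1))   # each record as a length-1 sequence
        return self.decoder(z.squeeze(1))  # reconstruction of the input

model = AELSTM()
x = torch.randn(8, 41)                     # 41 scaled NSL-KDD features
err = ((model(x) - x) ** 2).mean(dim=1)    # per-record reconstruction error
is_abnormal = err > err.median()           # placeholder threshold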

Research Authors
Mohamed Mahmoud, Mahmoud Kasem, Abdelrahman Abdallah, and Hyun Soo Kang
Research Journal
2022 International Telecommunications Conference (ITC-Egypt)
Research Pages
6
Research Publisher
IEEE
Research Website
https://doi.org/10.1109/ITC-Egypt55520.2022.9855688
Research Year
2022

Neural Network Estimation Model to Optimize Timing and Schedule of Software Projects

Research Abstract

Software projects suffer from high failure rates, which hover around 60% for large IT projects. Estimating time and project schedules is a crucial task that strongly influences project outcomes. Artificial intelligence can now provide solutions to many of the problems facing software projects. This article develops a neural network estimation model to address the problem of timing in software projects. The model predicts the estimated project time, optimizing the scheduling process, and achieved high accuracy when evaluated on the test datasets.
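
As a sketch of the idea under stated assumptions, a small MLP regressor mapping project features to an estimated duration; the features and the synthetic data below are hypothetical placeholders, not the article's dataset.

import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

# Hypothetical features: team size, code size, complexity, experience.
rng = np.random.default_rng(0)
X = rng.random((100, 4))
y = X @ np.array([2.0, 5.0, 3.0, -1.0]) + 1.0  # synthetic duration (months)

scaler = StandardScaler().fit(X)
model = MLPRegressor(hidden_layer_sizes=(16, 8), max_iter=2000, random_state=0)
model.fit(scaler.transform(X), y)
print(model.predict(scaler.transform(X[:3])))  # estimated durations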

Research Authors
Mohamed A. Hamada, Abdelrahman Abdallah, Mahmoud Kasem, Mohamed Abokhalil
Research Journal
2021 IEEE Smart Information Systems and Technologies (SIST), DOI: 10.1109/SIST50301.2021.9465887
Research Pages
7
Research Publisher
IEEE
Research Website
https://doi.org/10.1109/SIST50301.2021.9465887
Research Year
2021

GANMasker: A Two-Stage Generative Adversarial Network for High-Quality Face Mask Removal

Research Abstract

Deep-learning-based image inpainting methods have made remarkable advancements, particularly in object removal tasks. The removal of face masks has gained significant attention, especially in the wake of the COVID-19 pandemic, and while numerous methods have successfully addressed the removal of small objects, removing large and complex masks from faces remains demanding. This paper presents a novel two-stage network for unmasking faces, considering the intricate facial features typically concealed by masks, such as noses, mouths, and chins. The scarcity of paired datasets comprising masked and unmasked face images poses an additional challenge. In the first stage of our proposed model, we employ an autoencoder-based network for binary segmentation of the face mask. Subsequently, in the second stage, we introduce a generative adversarial network (GAN)-based network enhanced with attention and Masked–Unmasked Region Fusion (MURF) mechanisms to focus on the masked region. Our network generates realistic and accurate unmasked faces that resemble the original faces. We train our model on paired unmasked and masked face images sourced from CelebA, a large public dataset, and evaluate its performance on multi-scale masked faces. The experimental results illustrate that the proposed method surpasses the current state-of-the-art techniques in both qualitative and quantitative metrics. It achieves a Peak Signal-to-Noise Ratio (PSNR) improvement of 4.18 dB over the second-best method, with the PSNR reaching 30.96. Additionally, it exhibits a 1% increase in the Structural Similarity Index Measure (SSIM), achieving a value of 0.95.
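
As an aside, the fusion step named in the abstract can be pictured as the standard inpainting composition shown below; this sketch reflects the generic idea only, not the authors' exact MURF module.

import torch

def fuse_masked_unmasked(original: torch.Tensor, generated: torch.Tensor,
                         mask: torch.Tensor) -> torch.Tensor:
    # mask is 1 inside the face-mask region, 0 elsewhere; shape (N, 1, H, W).
    # Keep original pixels outside the mask, generated pixels inside it.
    return mask * generated + (1.0 - mask) * original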

Research Authors
Mohamed Mahmoud and Hyun-Soo Kang
Research Image
Two-stage approach for face unmasking
Research Journal
Sensors
Research Pages
22
Research Publisher
MDPI
Research Vol
23
Research Website
https://doi.org/10.3390/s23167094
Research Year
2023

Conv3D-Based Video Violence Detection Network Using Optical Flow and RGB Data

Research Abstract

Detecting violent behavior in videos to ensure public safety and security poses a significant challenge. Precisely identifying and categorizing instances of violence in real-life closed-circuit television, which vary across specifications and locations, requires comprehensive understanding and processing of the sequential information embedded in these videos. This study aims to introduce a model that adeptly grasps the spatiotemporal context of videos within diverse settings and specifications of violent scenarios. We propose a method to accurately capture spatiotemporal features linked to violent behaviors using optical flow and RGB data. The approach leverages a Conv3D-based ResNet-3D model as the foundational network, capable of handling high-dimensional video data. The efficiency and accuracy of violence detection are enhanced by integrating an attention mechanism, which assigns greater weight to the most crucial frames within the RGB and optical-flow sequences during instances of violence. Our model was evaluated on the UBI-Fight, Hockey, Crowd, and Movie-Fights datasets; the proposed method outperformed existing state-of-the-art techniques, achieving area under the curve scores of 95.4, 98.1, 94.5, and 100.0 on the respective datasets. Moreover, this research not only has the potential to be applied in real-time surveillance systems but also promises to contribute to a broader spectrum of research in video analysis and understanding.
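
For a concrete picture of the two-stream Conv3D setup, a minimal PyTorch sketch using torchvision's r3d_18 backbone follows; the late fusion by averaging logits and the flow input replicated to three channels are assumptions made for illustration, not the paper's exact architecture.

import torch
import torch.nn as nn
from torchvision.models.video import r3d_18

class TwoStreamR3D(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.rgb_branch = r3d_18(num_classes=num_classes)
        self.flow_branch = r3d_18(num_classes=num_classes)

    def forward(self, rgb_clip: torch.Tensor, flow_clip: torch.Tensor):
        # Both inputs: (N, 3, T, H, W); 2-channel flow can be padded to 3.
        return (self.rgb_branch(rgb_clip) + self.flow_branch(flow_clip)) / 2

model = TwoStreamR3D()
logits = model(torch.randn(2, 3, 16, 112, 112),
               torch.randn(2, 3, 16, 112, 112))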

Research Authors
Jae-Hyuk Park, Mohamed Mahmoud, and Hyun-Soo Kang
Research Image
Video Violence Detection Network Using Optical Flow and RGB Data
Research Journal
Sensors
Research Pages
15
Research Publisher
MDPI
Research Vol
24
Research Website
https://doi.org/10.3390/s24020317
Research Year
2024

A Comprehensive Survey of Masked Faces: Recognition, Detection, and Unmasking

Research Abstract

Masked face recognition (MFR) has emerged as a critical domain in biometric identification, especially with the global COVID-19 pandemic, which introduced widespread face masks. This survey paper presents a comprehensive analysis of the challenges and advancements in recognizing and detecting individuals with masked faces, which has seen innovative shifts due to the necessity of adapting to new societal norms. Driven by advances in deep learning, MFR, face mask recognition (FMR), and face unmasking (FU) represent significant areas of focus. These methods address unique challenges posed by obscured facial features, from fully to partially covered faces. Our comprehensive review explores the various deep-learning-based methodologies developed for MFR, FMR, and FU, highlighting their distinctive challenges and the solutions proposed to overcome them. Additionally, we explore benchmark datasets and evaluation metrics specifically tailored for assessing performance in MFR research. The survey also discusses the substantial obstacles still facing researchers in this field and proposes future directions for the ongoing development of more robust and effective masked face recognition systems. This paper serves as an invaluable resource for researchers and practitioners, offering insights into the evolving landscape of face recognition technologies in the face of global health crises and beyond.

Research Authors
Mohamed Mahmoud, Mahmoud SalahEldin Kasem, Hyun-Soo Kang
Research Image
Masked Faces Recognition, Detection, and Unmasking
Research Journal
Applied Sciences
Research Pages
37
Research Publisher
MDPI
Research Vol
14
Research Website
https://doi.org/10.3390/app14198781
Research Year
2024

Statistical-based detection of pilot contamination attack for NOMA in 5G networks

Research Authors
Dalia Nashat, Sahar Khairy
Research Journal
Scientific Reports
Research Pages
3726
Research Publisher
Nature Publishing Group UK
Research Year
2025