Developing an Efficient Spectral Clustering Algorithm on Large Scale Graphs in Spark

Research Abstract
Much of today's data can be represented as graphs, for example social media, protein-protein interaction networks, transportation systems, and systems biology. Considerable research has addressed clustering very large graphs, but more efficient algorithms are still required because the process takes a long time and a large amount of memory. In this paper, we propose an Efficient Spectral Clustering Algorithm on Large Scale Graphs in Spark (ESCALG), which uses MapReduce functions and shuffling phases in Dijkstra's algorithm. In addition, ESCALG relies mainly on a sparse matrix as its data structure, which reduces execution time. GraphX is used for graph data processing, and Pregel within GraphX is used to compute shortest paths. To test its performance, ESCALG is compared with the Large-Scale Spectral Clustering on Graphs and Standard Spectral Clustering algorithms on seven datasets, where ESCALG showed high efficiency in terms of memory and time.
Research Authors
Ahmed I. Taloba
Marwan R. Riad
Taysir Hassan A. Soliman
Research Department
Research Journal
2017 Eighth International Conference on Intelligent Computing and Information Systems (ICICIS)
Research Pages
292-298
Research Publisher
IEEE
Research Rank
4
Research Vol
NULL
Research Website
Cairo, Egypt
Research Year
2017
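
The abstract above mentions Dijkstra's algorithm and a sparse data structure for shortest-path computation. As a rough illustration only, the following single-machine Python sketch runs Dijkstra's algorithm over a sparse adjacency map. It is not the authors' ESCALG implementation and does not use Spark, GraphX, or Pregel; the function name `shortest_paths` and the toy graph are assumptions made for this example.

```python
import heapq


def shortest_paths(adjacency, source):
    """Dijkstra's algorithm over a sparse adjacency map.

    adjacency: {node: {neighbor: non-negative edge weight}} (hypothetical layout);
    only existing edges are stored, as in a sparse matrix.
    Returns {node: distance from source} for every reachable node.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for neighbor, weight in adjacency.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


# Toy undirected graph, stored sparsely (illustrative data only).
graph = {
    "a": {"b": 1.0, "c": 4.0},
    "b": {"a": 1.0, "c": 2.0, "d": 5.0},
    "c": {"a": 4.0, "b": 2.0, "d": 1.0},
    "d": {"b": 5.0, "c": 1.0},
}
distances = shortest_paths(graph, "a")
print(distances)
```

Storing only the edges that exist keeps memory proportional to the number of edges rather than to the square of the number of nodes, which is the usual motivation for a sparse representation on large graphs.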

Towards Transforming Natural Language Queries into SPARQL Queries

Research Abstract
NULL
Research Authors
Majid Askar
Research Department
Research Journal
Baltic DB&IS 2020: 14th International Baltic Conference on Databases and Information Systems
Research Pages
NULL
Research Publisher
NULL
Research Rank
3
Research Vol
NULL
Research Website
https://dbis.ttu.ee/
Research Year
2020

Query Processing in Ontology Based Data Access

Research Abstract
NULL
Research Authors
Majid Askar, Alsayed Algergawy, Taysir Soliman, Birgitta König-Ries, Adel Sewisy
Research Department
Research Journal
10th International Conference on Ecological Informatics-Translating Ecological Data into Knowledge and Decisions in a Rapidly Changing World
Research Pages
NULL
Research Publisher
NULL
Research Rank
3
Research Vol
NULL
Research Website
https://icei2018.uni-jena.de/
Research Year
2018

A Semantic Big Biodiversity Data Integration Tool

Research Abstract
NULL
Research Authors
Taysir Soliman, Alsayed Algergawy, Birgitta König-Ries, Majid Askar, Marwa Abdelreheim
Research Department
Research Journal
ICEI 2018: 10th International Conference on Ecological Informatics-Translating Ecological Data into Knowledge and Decisions in a Rapidly Changing World
Research Pages
NULL
Research Publisher
NULL
Research Rank
3
Research Vol
NULL
Research Website
NULL
Research Year
2018

Supervised detection of newly appearing T2-w multiple sclerosis lesions with subtraction and deformation fields features

Research Abstract
Background: MRI has become one of the most important clinical tools for longitudinal analysis of multiple sclerosis (MS). Newly appearing lesions are indicative of disease progression. Several automatic approaches have been proposed for the detection of newly appearing lesions, which can be classified as either supervised approaches that use intensity-derived features from the subtraction images or unsupervised approaches that also use deformation field information. Aim: We present here a supervised approach for detecting newly appearing MS lesions that combines both subtraction and deformation field features. Specifically, we use a logistic regression classifier trained with features from the baseline and follow-up intensities, subtraction values, and deformation field operators to provide a final segmentation. Materials and methods: Multi-channel brain MRI scans taken one year apart were acquired for 60 patients with a 3T magnet, including transverse T2-FLAIR, PD-w, T2-w and T1-w images. 36 of these patients presented new T2-w lesions that were semi-automatically annotated by expert neuroradiologists. The rest had no new lesions in the follow-up scans. All images were pre-processed and co-registered by multi-resolution, multi-stage affine registration, and a deformation field was also obtained using the Demons non-rigid registration algorithm. Results: We performed a leave-one-out cross-validation strategy using the 36 patients with new T2-w lesions. In terms of detection, we obtained a 74.30% true positive fraction and 11.86% false positive fraction with a mean Dice similarity coefficient of 0.77. In terms of segmentation, we obtained a mean Dice coefficient of 0.56. We compared these results with those obtained with state-of-the-art methods such as Sweeney et al. (2013), Ganiler et al. (2014), and Cabezas et al. (2016), and our model had significantly better results (p < 0.05). When testing the model with the 24 patients with no new T2-w lesions, only 5 false positives were found in 4 cases. Conclusions: The proposed model decreases the number of false positives while increasing the number of true positives. The study also demonstrates the benefits of using deformation field operators as features to train a supervised learning model. Our approach is simple and fully automated and reduces user interaction and inter- and intra-observer variability. Disclosure: M. Salem: nothing to disclose. M. Cabezas: nothing to disclose. S. Valverde: nothing to disclose. D. Pareto: has received speaking honoraria from Novartis and Biogen. A. Oliver: nothing to disclose. J. Salvi: nothing to disclose. A. Rovira serves on scientific advisory boards for Biogen Idec, Novartis, Sanofi-Genzyme, and OLEA Medical, has received speaker honoraria from Bayer, Sanofi-Genzyme, Bracco, Merck-Serono, Teva Pharmaceutical Industries Ltd, Novartis and Biogen Idec, and has research agreements with Siemens. X. Lladó: nothing to disclose.
Research Authors
<b>Mostafa Salem</b>, Mariano Cabezas, Sergi Valverde, Deborah Pareto, Joaquim Salvi, Arnau Oliver, Àlex Rovira, Xavier Lladó
Research Department
Research Journal
Multiple Sclerosis Journal - ECTRIMS (JCR CN IF:5.649 Q1(23/199)), Paris, France
Research Pages
pp. 794-794
Research Publisher
SAGE PUBLICATIONS LTD
Research Rank
3
Research Vol
Vol. 23
Research Website
NULL
Research Year
2017
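
The abstract above reports results as Dice similarity coefficients. As a minimal sketch of how that overlap measure is commonly computed (twice the size of the overlap divided by the sum of the two mask sizes), the Python example below compares two sets of voxel coordinates. The function name `dice_coefficient` and the voxel sets are hypothetical and are not taken from the study's data or code.

```python
def dice_coefficient(predicted, reference):
    """Dice similarity coefficient between two collections of voxel coordinates.

    Returns 2 * |P ∩ R| / (|P| + |R|), a value in [0, 1].
    By convention in this sketch, two empty masks count as a perfect match.
    """
    predicted, reference = set(predicted), set(reference)
    if not predicted and not reference:
        return 1.0
    overlap = len(predicted & reference)
    return 2.0 * overlap / (len(predicted) + len(reference))


# Illustrative masks: 4 voxels each, 3 of them shared.
predicted_voxels = {(10, 12, 5), (10, 13, 5), (11, 12, 5), (11, 13, 5)}
reference_voxels = {(10, 13, 5), (11, 12, 5), (11, 13, 5), (12, 13, 5)}
score = dice_coefficient(predicted_voxels, reference_voxels)
print(score)  # 2 * 3 / (4 + 4) = 0.75
```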