Kazem Shekofteh, Dr.
Kazem Shekofteh is a postdoctoral research fellow at the Institute of Computer Engineering at Heidelberg University. His research focuses on GPU computing, performance analysis of parallel programs, and high-performance computing in bioinformatics. Previously, he was an assistant professor at Shandiz Institute of Higher Education, Mashhad, Iran. He received his MSc and PhD degrees from Ferdowsi University of Mashhad, Iran, the PhD in 2019. In late 2016, he was awarded a visiting scholarship at Heidelberg University. He has published in leading journals such as IEEE Transactions on Parallel and Distributed Systems. Since 2022, he has taught GPU Computing and seminar courses at Heidelberg University.
Research interests
- Performance Analysis on GPU
- Developing Algorithms on Intelligence Processing Units (IPUs)
- Dealing with Sparsity (on IPUs)
- Analysis of Bioinformatics Algorithms on Accelerators (GPU and IPU)
Recent Service (4-year horizon)
Co-Chair
- 2024: Workshop on IoT, Edge, and Mobile for Embedded Machine Learning (ITEM), Vilnius, Lithuania, September 9, 2024
- 2023: Workshop on IoT, Edge, and Mobile for Embedded Machine Learning (ITEM), Torino, Italy, September 18, 2023
- 2022: Workshop on IoT, Edge, and Mobile for Embedded Machine Learning (ITEM), Grenoble, France, September 19, 2022
Program Committee Member
- 2024: 15th IEEE International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems held in conjunction with SC24: The International Conference for High Performance Computing, Networking, Storage and Analysis
- 2024: 9th International Workshop on GPU Computing and AI held in conjunction with CANDAR 2024, Okinawa, Japan
- 2023: IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing (CCGRID), India
- 2023: International Conference on Parallel Processing (ICPP), Salt Lake City, Utah, USA
- 2023: 8th International Workshop on GPU Computing and AI held in conjunction with CANDAR 2023, Matsue, Japan
Poster Committee Member
- 2023: The International Conference for High Performance Computing, Networking, Storage and Analysis (SC)
Student Volunteer Co-Chair
- 2022: IEEE International Conference on Cluster Computing (CLUSTER)
Reviewer
- IEEE Transactions on Parallel and Distributed Systems (TPDS)
- Journal of Parallel and Distributed Computing (JPDC)
- Future Generation Computer Systems (FGCS)
- International Conference on Supercomputing (ICS)
- IEEE International Conference on Cluster Computing (CLUSTER)
Publications
- Reducing Memory Requirements for the IPU using Butterfly Factorizations. SC ’23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, SC-W 2023, Denver, CO, USA, November 12–17, 2023, 1255–1263, ACM, 2023
@inproceedings{DBLP:conf/sc/ShekoftehAF23, author = {Shekofteh, S. Kazem and Alles, Christian and Fr{\"{o}}ning, Holger}, title = {Reducing Memory Requirements for the {IPU} using Butterfly Factorizations}, booktitle = {{SC} '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, {SC-W} 2023, Denver, CO, USA, November 12-17, 2023}, pages = {1255--1263}, publisher = {{ACM}}, year = {2023}, url = {https://doi.org/10.1145/3624062.3624196}, doi = {10.1145/3624062.3624196}, timestamp = {Tue, 28 Nov 2023 00:00:00 +0100}, }
- On Performance Analysis of Graphcore IPUs: Analyzing Squared and Skewed Matrix Multiplication. CoRR, abs/2310.00256, 2023
@article{DBLP:journals/corr/abs-2310-00256, author = {Shekofteh, S. Kazem and Alles, Christian and Kochend{\"{o}}rfer, Nils and Fr{\"{o}}ning, Holger}, title = {On Performance Analysis of Graphcore IPUs: Analyzing Squared and Skewed Matrix Multiplication}, journal = {CoRR}, volume = {abs/2310.00256}, year = {2023}, url = {https://arxiv.org/abs/2310.00256}, doi = {10.48550/ARXIV.2310.00256}, eprinttype = {arXiv}, eprint = {2310.00256}, timestamp = {Wed, 18 Oct 2023 01:00:00 +0200}, }
- cCUDA: Effective Co-Scheduling of Concurrent Kernels on GPUs. IEEE Trans. Parallel Distributed Syst., 31(4), 766–778, 2020
@article{DBLP:journals/tpds/ShekoftehNNFY20, author = {Shekofteh, S. Kazem and Noori, Hamid and Naghibzadeh, Mahmoud and Fr{\"{o}}ning, Holger and Yazdi, Hadi Sadoghi}, title = {cCUDA: Effective Co-Scheduling of Concurrent Kernels on GPUs}, journal = {{IEEE} Trans. Parallel Distributed Syst.}, volume = {31}, number = {4}, pages = {766--778}, year = {2020}, url = {https://doi.org/10.1109/TPDS.2019.2944602}, doi = {10.1109/TPDS.2019.2944602}, timestamp = {Fri, 02 Oct 2020 01:00:00 +0200}, }
- Metric Selection for GPU Kernel Classification. ACM Trans. Archit. Code Optim., 15(4), 68:1–68:27, 2019
@article{DBLP:journals/taco/ShekoftehNNYF19, author = {Shekofteh, S. Kazem and Noori, Hamid and Naghibzadeh, Mahmoud and Yazdi, Hadi Sadoghi and Fr{\"{o}}ning, Holger}, title = {Metric Selection for {GPU} Kernel Classification}, journal = {{ACM} Trans. Archit. Code Optim.}, volume = {15}, number = {4}, pages = {68:1--68:27}, year = {2019}, url = {https://doi.org/10.1145/3295690}, doi = {10.1145/3295690}, timestamp = {Sat, 08 Jan 2022 00:00:00 +0100}, }
- Efficient scheduling of streams on GPGPUs. The Journal of Supercomputing, 76(11), 9270–9302, 2020
@article{roui2020efficient, author = {Beheshti Roui, Mohamad and Shekofteh, S. Kazem and Noori, Hamid and Harati, Ahad}, date = {2020-11-01}, date-added = {2024-04-04 16:55:55 +0200}, date-modified = {2024-04-04 16:56:58 +0200}, doi = {10.1007/s11227-020-03209-x}, id = {Beheshti Roui2020}, issn = {1573-0484}, journal = {The Journal of Supercomputing}, number = {11}, pages = {9270--9302}, title = {Efficient scheduling of streams on {GPGPUs}}, url = {https://doi.org/10.1007/s11227-020-03209-x}, volume = {76}, year = {2020}, bdsk-url-1 = {https://doi.org/10.1007/s11227-020-03209-x} }
- Predicting Execution Time of CUDA Kernels with Unified Memory Capability. 2019 9th International Conference on Computer and Knowledge Engineering (ICCKE), 437–443, 2019
@inproceedings{khorshahiyan2019predicting, author = {Khorshahiyan, F. and Shekofteh, S.-K. and Noori, H.}, booktitle = {2019 9th International Conference on Computer and Knowledge Engineering (ICCKE)}, date-added = {2024-04-04 16:57:38 +0200}, date-modified = {2024-04-04 16:58:19 +0200}, doi = {10.1109/ICCKE48569.2019.8964952}, issn = {2643-279X}, pages = {437--443}, title = {Predicting Execution Time of {CUDA} Kernels with Unified Memory Capability}, year = {2019}, year1 = {24-25 Oct. 2019}, bdsk-url-1 = {https://doi.org/10.1109/ICCKE48569.2019.8964952} }
- A load scheduler for SIP proxy servers: design, implementation and evaluation of a history weighted window approach. International Journal of Communication Systems, 30(3), e2980, John Wiley & Sons, Ltd, 2017
@article{montazerolghaem2017load, author = {Montazerolghaem, Ahmadreza and Shekofteh, S. Kazem and Yaghmaee, M. H. and Naghibzadeh, Mahmoud}, date = {2017-02-01}, date-added = {2024-04-04 17:00:11 +0200}, date-modified = {2024-04-04 17:00:43 +0200}, doi = {10.1002/dac.2980}, issn = {1074-5351}, journal = {International Journal of Communication Systems}, shortjournal = {Int J Commun Syst}, keywords = {load balancer; scheduler; session initiation protocol; asterisk; overload}, abstract = {The widespread use of Session Initiation Protocol as a signalling protocol has created various challenges. An important one is that its throughput can be severely degraded when an overload happens in the proxy server because of several retransmissions from the user agent. One common approach to overcome this problem is ``load balancing''. A balancer needs to know the status of proxy servers, which are continuously gathered implicitly or explicitly. Implicit methods have averagely less overhead than explicit ones. This paper attempts to prevent throughput reduction by balancing the loads among available proxy servers properly using an implicit mechanism called History Weighted Average Response time. The proposed algorithm is robust because it incurs no extra processing to proxy servers. The novelty of the mechanism is making use of ``response time history'' to estimate the load being currently processed on servers. By implementing in a real testbed, throughput and scalability are improved compared with an important state-of-the-art similar algorithm. This improvement stems from no need for modification in SIP protocol, easy implementation and application, simple computations for making decision and no need for extra feedback between servers and load balancer. Copyright \copyright{} 2015 John Wiley \& Sons, Ltd.}, number = {3}, pages = {e2980}, publisher = {John Wiley \& Sons, Ltd}, title = {A load scheduler for {SIP} proxy servers: design, implementation and evaluation of a history weighted window approach}, url = {https://doi.org/10.1002/dac.2980}, volume = {30}, year = {2017}, bdsk-url-1 = {https://doi.org/10.1002/dac.2980} }
- A novel load scheduling for session initiation protocol networks. 2014 4th International Conference on Computer and Knowledge Engineering (ICCKE), 509–514, 2014
@inproceedings{montazerolghaem2014novel, author = {Montazerolghaem, A. and Shekofteh, S.-K. and Khojaste, G. and Naghibzadeh, M. and Yaghmaee-M, M.-H.}, booktitle = {2014 4th International Conference on Computer and Knowledge Engineering (ICCKE)}, date-added = {2024-04-04 17:01:42 +0200}, date-modified = {2024-04-04 17:01:53 +0200}, doi = {10.1109/ICCKE.2014.6993376}, pages = {509--514}, title = {A novel load scheduling for session initiation protocol networks}, year = {2014}, year1 = {29-30 Oct. 2014}, bdsk-url-1 = {https://doi.org/10.1109/ICCKE.2014.6993376} }
- CoreIIScheduler: Scheduling Tasks in a Multi-core-Based Grid Using NSGA-II Technique. Intelligent Informatics, 507–518, Springer Berlin Heidelberg, 2013
@inproceedings{najmabad2013corell, address = {Berlin, Heidelberg}, author = {Najm Abad, Javad Mohebbi and Shekofteh, S. Kazem and Tabatabaee, Hamid and Mehrnejad, Maryam}, booktitle = {Intelligent Informatics}, date-added = {2024-04-04 17:02:50 +0200}, date-modified = {2024-04-04 17:03:11 +0200}, doi = {10.1007/978-3-642-32063-7_54}, editor = {Abraham, Ajith and Thampi, Sabu M.}, isbn = {978-3-642-32063-7}, pages = {507--518}, publisher = {Springer Berlin Heidelberg}, title = {{CoreIIScheduler}: Scheduling Tasks in a Multi-core-Based Grid Using {NSGA-II} Technique}, year = {2013} }
- Reducing cache contention in a multi-core processor via a scheduler. 2010 3rd International Conference on Advanced Computer Theory and Engineering (ICACTE), 6, V6-555–V6-558, 2010
@inproceedings{shekofteh2010reducing, author = {Shekofteh, S. Kazem and Deldari, Hossein and Khalkhali, Maryam Baradaran}, booktitle = {2010 3rd International Conference on Advanced Computer Theory and Engineering (ICACTE)}, date-added = {2024-04-04 17:06:21 +0200}, date-modified = {2024-04-04 17:06:30 +0200}, doi = {10.1109/ICACTE.2010.5579213}, keywords = {multi-core architecture;resource contention;shared cache;thread scheduling}, pages = {V6-555--V6-558}, title = {Reducing cache contention in a multi-core processor via a scheduler}, volume = {6}, year = {2010}, bdsk-url-1 = {https://doi.org/10.1109/ICACTE.2010.5579213} }
- Improving cluster computing performance based on job futurity prediction. 2010 3rd International Conference on Advanced Computer Theory and Engineering (ICACTE), 6, V6-303–V6-307, 2010
@inproceedings{salami2010improving, author = {Salami, H. and Saadatfar, H. and Fard, Farhad Rahmani and Shekofteh, S. Kazem and Deldari, H.}, booktitle = {2010 3rd International Conference on Advanced Computer Theory and Engineering (ICACTE)}, date-added = {2024-04-04 17:05:56 +0200}, date-modified = {2024-04-04 17:06:09 +0200}, doi = {10.1109/ICACTE.2010.5579820}, issn = {2154-7505}, pages = {V6-303--V6-307}, title = {Improving cluster computing performance based on job futurity prediction}, volume = {6}, year = {2010}, year1 = {20-22 Aug. 2010}, bdsk-url-1 = {https://doi.org/10.1109/ICACTE.2010.5579820} }
- Exploiting fuzzy approximator to head pose estimation. Signal Processing Algorithms, Architectures, Arrangements, and Applications SPA 2010, 68–72, 2010
@inproceedings{khalkhali2010exploiting, author = {Baradaran-Khalkhali, M. and Shekofteh, S. Kazem and Toosizadeh, S. and Akbarzadeh-T, M.-R.}, booktitle = {Signal Processing Algorithms, Architectures, Arrangements, and Applications SPA 2010}, date-added = {2024-04-04 17:04:59 +0200}, date-modified = {2024-04-04 17:05:17 +0200}, issn = {2326-0319}, pages = {68--72}, title = {Exploiting fuzzy approximator to head pose estimation}, year = {2010}, year1 = {23-25 Sept. 2010} }
- Head pose estimation using fuzzy approximator augmented by redundant membership functions. 2010 2nd International Conference on Software Technology and Engineering, 2, V2-306–V2-310, 2010
@inproceedings{shekofteh2010head, author = {Shekofteh, S. K. and Baradaran-K, M. and Toosizadeh, S. and Akbarzadeh-T, M.-R. and Hashemi, M.}, booktitle = {2010 2nd International Conference on Software Technology and Engineering}, date-added = {2024-04-04 17:03:53 +0200}, date-modified = {2024-04-04 17:04:03 +0200}, doi = {10.1109/ICSTE.2010.5608799}, pages = {V2-306--V2-310}, title = {Head pose estimation using fuzzy approximator augmented by redundant membership functions}, volume = {2}, year = {2010}, year1 = {3-5 Oct. 2010}, bdsk-url-1 = {https://doi.org/10.1109/ICSTE.2010.5608799} }
- A fuzzy approximator with Gaussian membership functions to estimate a human’s head pose. 2010 10th International Conference on Intelligent Systems Design and Applications, 1154–1158, 2010
@inproceedings{baradaran2010fuzzy, author = {Baradaran-K, M. and Shekofteh, S. K. and Toosizadeh, S. and Akbarzadeh-T, M.-R.}, booktitle = {2010 10th International Conference on Intelligent Systems Design and Applications}, date-added = {2024-04-04 17:03:24 +0200}, date-modified = {2024-04-04 17:03:34 +0200}, doi = {10.1109/ISDA.2010.5687029}, issn = {2164-7151}, pages = {1154--1158}, title = {A fuzzy approximator with Gaussian membership functions to estimate a human's head pose}, year = {2010}, year1 = {29 Nov.-1 Dec. 2010}, bdsk-url-1 = {https://doi.org/10.1109/ISDA.2010.5687029} }