A Survey on Parallel Computing and Its Applications in Data-Parallel Problems Using GPU Architectures
Year: 2014
Communications in Computational Physics, Vol. 15 (2014), Iss. 2 : pp. 285–329
Abstract
Parallel computing has become an important subject in the field of computer science and has proven to be critical when researching high-performance solutions. The evolution of computer architectures (multi-core and many-core) towards a higher number of cores confirms that parallelism is the method of choice for speeding up an algorithm. In the last decade, the graphics processing unit, or GPU, has gained an important place in the field of high-performance computing (HPC) because of its low cost and massively parallel processing power. Supercomputing has become, for the first time, available to anyone at the price of a desktop computer. In this paper, we survey the concept of parallel computing and especially GPU computing. Achieving efficient parallel algorithms for the GPU is not a trivial task; several technical restrictions must be satisfied in order to achieve the expected performance. Some of these limitations are consequences of the underlying architecture of the GPU and the theoretical models behind it. Our goal is to present a set of theoretical and technical concepts that are often required to understand the GPU and its massive parallelism model. In particular, we show how this new technology can help the field of computational physics, especially when the problem is data-parallel. We present four examples of computational physics problems: n-body, collision detection, Potts model and cellular automata simulations. These examples are representative of the kind of problems that are suitable for GPU computing. By understanding the GPU architecture and its massive parallelism programming model, one can overcome many of the technical limitations found along the way, design better GPU-based algorithms for computational physics problems and achieve speedups of up to two orders of magnitude compared to sequential implementations.
Journal Article Details
Publisher Name: Global Science Press
Language: English
DOI: https://doi.org/10.4208/cicp.110113.010813a
Published online: 2014-01
Copyright: © Global Science Press
Pages: 45
https://doi.org/10.1088/1361-6560/ad694f [Citations: 0] -
Scalable CAIM discretization on multiple GPUs using concurrent kernels
Cano, Alberto | Ventura, Sebastián | Cios, Krzysztof J.The Journal of Supercomputing, Vol. 69 (2014), Iss. 1 P.273
https://doi.org/10.1007/s11227-014-1151-8 [Citations: 7] -
Utilization of OpenCL for Large Graph Problems on Graphics Processing Unit
Mishra, Vinod Kumar | Sammal, Pankaj SinghElectronic Notes in Discrete Mathematics, Vol. 63 (2017), Iss. P.125
https://doi.org/10.1016/j.endm.2017.11.007 [Citations: 0] -
Replacement policies for a parallel system with shortage and excess costs
Zhao, Xufeng | Chen, Mingchih | Nakagawa, ToshioReliability Engineering & System Safety, Vol. 150 (2016), Iss. P.89
https://doi.org/10.1016/j.ress.2016.01.008 [Citations: 16] -
mcRPL: a general purpose parallel raster processing library on distributed heterogeneous architectures
Gao, Huan | Peng, Xuantong | Guan, Qingfeng | Wang, Jingyi | Liu, Ziqi | Yang, Xue | Zeng, WenInternational Journal of Geographical Information Science, Vol. 37 (2023), Iss. 9 P.2043
https://doi.org/10.1080/13658816.2023.2244550 [Citations: 2] -
Coding Dimensions and the Power of Finite Element, Volume, and Difference Methods
Utilizing Graphics Processing Units (GPUs) for Numerical Computations
Tayyeh, Alnoman Mundher | Shather, Akram H. | Hussein, Husam Abdulhameed | Abdalbaqi, Luma Saad2024
https://doi.org/10.4018/979-8-3693-3964-0.ch011 [Citations: 0] -
Deep learning-based community detection in complex networks with network partitioning and reduction of trainable parameters
Al-Andoli, Mohammed | Cheah, Wooi Ping | Tan, Shing ChiangJournal of Ambient Intelligence and Humanized Computing, Vol. 12 (2021), Iss. 2 P.2527
https://doi.org/10.1007/s12652-020-02389-x [Citations: 23] -
Using Dynamic Parallelism to Speed Up Clustering-Based Community Detection in Social Networks
Alandoli, Mohammed | Al-Ayyoub, Mahmoud | Al-Smadi, Mohammad | Jararweh, Yaser | Benkhelifa, Elhadj2016 IEEE 4th International Conference on Future Internet of Things and Cloud Workshops (FiCloudW), (2016), P.240
https://doi.org/10.1109/W-FiCloud.2016.57 [Citations: 12] -
Cognitive information processing based on a parallel processor
Verenik, Nikolai L. | Girel, Alexey I. | Seitkulov, Yerzhan N. | Tatur, Mikhail M. | Razhkova, Hanna P.The 10th International Conference on Digital Technologies 2014, (2014), P.356
https://doi.org/10.1109/DT.2014.6868739 [Citations: 1] -
A Novel Method for Aircraft Structural Dynamic Strain Trend Signal Processing via Optimized Parallel Computing
Tian, Yongwei | Zhang, Fang | Jiang, Jinhui | Fan, ZheApplied Sciences, Vol. 14 (2024), Iss. 19 P.8892
https://doi.org/10.3390/app14198892 [Citations: 0] -
SODECL
Avramidis, Eleftherios | Lalik, Marta | Akman, Ozgur E.ACM Transactions on Mathematical Software, Vol. 46 (2020), Iss. 3 P.1
https://doi.org/10.1145/3385076 [Citations: 1] -
Utilization of parallel computing in chemical engineering
Danko, Matej | Labovský, Juraj | Janošovský, Ján | Labovská, Zuzana | Jelemenský, ĽudovítActa Chimica Slovaca, Vol. 8 (2015), Iss. 2 P.146
https://doi.org/10.1515/acs-2015-0025 [Citations: 4] -
Parallel Algorithms of Well-Balanced and Weighted Average Flux for Shallow Water Model Using CUDA
Sataporn, Nugool | Suwannik, Worasait | Maleewong, Montri | Murillo, JavierModelling and Simulation in Engineering, Vol. 2021 (2021), Iss. P.1
https://doi.org/10.1155/2021/9534495 [Citations: 0] -
Time-dependent QED approach to x-ray nonlinear Compton scattering
Krebs, Dietrich | Reis, David A. | Santra, RobinPhysical Review A, Vol. 99 (2019), Iss. 2
https://doi.org/10.1103/PhysRevA.99.022120 [Citations: 17] -
Competitiveness of a Non-Linear Block-Space GPU Thread Map for Simplex Domains
Navarro, Cristobal A. | Vernier, Matthieu | Bustos, Benjamin | Hitschfeld, NancyIEEE Transactions on Parallel and Distributed Systems, Vol. 29 (2018), Iss. 12 P.2728
https://doi.org/10.1109/TPDS.2018.2849705 [Citations: 6] -
DiffMat: Data-driven inverse design of energy-absorbing metamaterials using diffusion model
Wang, Haoyu | Du, Zongliang | Feng, Fuyong | Kang, Zhong | Tang, Shan | Guo, XuComputer Methods in Applied Mechanics and Engineering, Vol. 432 (2024), Iss. P.117440
https://doi.org/10.1016/j.cma.2024.117440 [Citations: 0] -
Spectrum Occupancy Measurements: A Survey and Use of Interference Maps
Hoyhtya, Marko | Mammela, Aarne | Eskola, Marina | Matinmikko, Marja | Kalliovaara, Juha | Ojaniemi, Jaakko | Suutala, Jaakko | Ekman, Reijo | Bacchus, Roger | Roberson, DennisIEEE Communications Surveys & Tutorials, Vol. 18 (2016), Iss. 4 P.2386
https://doi.org/10.1109/COMST.2016.2559525 [Citations: 157] -
Modeling GPU Dynamic Parallelism for self similar density workloads
Quezada, Felipe A. | Navarro, Cristóbal A. | Romero, Miguel | Aguilera, CristhianFuture Generation Computer Systems, Vol. 145 (2023), Iss. P.239
https://doi.org/10.1016/j.future.2023.03.046 [Citations: 1] -
Age replacement models: A summary with new perspectives and methods
Zhao, Xufeng | Al-Khalifa, Khalifa N. | Magid Hamouda, Abdel | Nakagawa, ToshioReliability Engineering & System Safety, Vol. 161 (2017), Iss. P.95
https://doi.org/10.1016/j.ress.2017.01.011 [Citations: 99] -
Rational Jacobi Kernel Functions: A novel massively parallelizable orthogonal kernel for support vector machines
Moghaddam, Mahdi Movahedian | Aghaei, Alireza Afzal | Parand, Kourosh2024 Third International Conference on Distributed Computing and High Performance Computing (DCHPC), (2024), P.1
https://doi.org/10.1109/DCHPC60845.2024.10454075 [Citations: 1] -
Empirical investigation: performance and power‐consumption based dual‐level model for exascale computing systems
Ashraf, Muhammad Usman | Eassa, Fathy Alboraei | Ahmad, Aiiad | Algarni, AbdullahIET Software, Vol. 14 (2020), Iss. 4 P.319
https://doi.org/10.1049/iet-sen.2018.5062 [Citations: 5] -
On GPU Connected Components and Properties: A Systematic Evaluation of Connected Component Labeling Algorithms and Their Extension for Property Extraction
Asad, Pedro | Marroquim, Ricardo | Souza, Andrea L. e L.IEEE Transactions on Image Processing, Vol. 28 (2019), Iss. 1 P.17
https://doi.org/10.1109/TIP.2018.2851445 [Citations: 6]