Abstract
The Lattice Boltzmann Method (LBM) is a computational fluid dynamics method whose locality and simplicity make it well suited to parallel computing and complex flow simulations. This study develops a specialized Double Distribution Function (DDF) LBM software framework optimized for the MT-3000, a novel heterogeneous processor, to support thermal incompressible flow simulations. To improve LBM performance on the complex multi-zone architecture of the MT-3000, this paper introduces three optimization strategies. First, a temporal fusion strategy postpones the temperature field calculations during time steps, reducing time overhead. Second, we present “Pencil-H”, a novel pipelined algorithm designed around the capabilities of the MT-3000 to improve both computational and communication efficiency. Third, we propose an architecture-aware multi-level parallelization algorithm tailored to make full use of the MT-3000's compute resources. These strategies were validated through extensive benchmarking tests, which showed a speedup of 32.02× compared to using 16 CPU cores. The optimized code also demonstrated high-fidelity simulation of thermal incompressible flows while achieving 61.61% of the theoretical maximum performance defined by the roofline model.
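For readers unfamiliar with the roofline model mentioned above, the following minimal sketch shows how a "fraction of roofline" figure is typically computed. The function names and all numeric inputs are hypothetical placeholders chosen for illustration; they are not MT-3000 specifications and are not taken from the paper.

```python
def roofline_bound(peak_flops, mem_bandwidth, arithmetic_intensity):
    """Attainable performance (FLOP/s) under the roofline model.

    peak_flops: peak compute rate in FLOP/s
    mem_bandwidth: peak memory bandwidth in bytes/s
    arithmetic_intensity: FLOPs performed per byte moved
    """
    # A kernel is limited either by compute or by memory traffic,
    # whichever ceiling is lower.
    return min(peak_flops, mem_bandwidth * arithmetic_intensity)


def roofline_fraction(measured_flops, peak_flops, mem_bandwidth,
                      arithmetic_intensity):
    """Measured performance as a fraction of the roofline bound."""
    bound = roofline_bound(peak_flops, mem_bandwidth, arithmetic_intensity)
    return measured_flops / bound


# Hypothetical placeholder values (NOT MT-3000 figures).
bound = roofline_bound(peak_flops=1.0e12, mem_bandwidth=1.0e11,
                       arithmetic_intensity=2.0)
fraction = roofline_fraction(1.0e11, 1.0e12, 1.0e11, 2.0)
print(f"bound = {bound:.3e} FLOP/s, fraction = {fraction:.2%}")
```

With these placeholder inputs the kernel is memory-bound, since the bandwidth ceiling lies below the compute peak. LBM kernels are commonly in this regime, which is why a percentage of the roofline bound is a more informative metric than a percentage of peak FLOP/s.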
Author Biographies
- Qingyang Zhang
  Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, Hunan 410073, China
  Laboratory of Digitizing Software for Frontier Equipment, National University of Defense Technology, Changsha, Hunan 410073, China
- Lei Xu
  Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, China
  Faculty of Computer Science and Control Engineering, Shenzhen University of Advanced Technology, Shenzhen, Guangdong 518106, China
- Rongliang Chen
  Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, China
  Faculty of Computer Science and Control Engineering, Shenzhen University of Advanced Technology, Shenzhen, Guangdong 518106, China
- Hang Zou
  Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, Hunan 410073, China
  Laboratory of Digitizing Software for Frontier Equipment, National University of Defense Technology, Changsha, Hunan 410073, China
- Bo Yang
  Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, Hunan 410073, China
  Laboratory of Digitizing Software for Frontier Equipment, National University of Defense Technology, Changsha, Hunan 410073, China
- Jingzhi Li
  Department of Mathematics & National Center for Applied Mathematics Shenzhen & SUSTech International Center for Mathematics, Southern University of Science and Technology, Shenzhen, Guangdong 518055, China
- Jie Liu
  Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, Hunan 410073, China
  Laboratory of Digitizing Software for Frontier Equipment, National University of Defense Technology, Changsha, Hunan 410073, China