Optimizing Thermal Lattice Boltzmann Method on an MT-3000 Processor with Neo-Heterogeneous Strategies

Author(s)

Qingyang Zhang, Lei Xu, Rongliang Chen, Hang Zou, Bo Yang, Jingzhi Li & Jie Liu

Abstract

The Lattice Boltzmann Method (LBM) is a computational fluid dynamics method for simulating fluid flows. Its locality and algorithmic simplicity make it well suited to parallel computing and to complex flow simulations. This study develops a specialized Double Distribution Function (DDF) LBM software framework for thermal incompressible flow simulations, optimized for the MT-3000, a novel heterogeneous processor. To improve LBM performance on the complex multi-zone architecture of the MT-3000, the paper introduces several strategies. First, a temporal fusion optimization postpones the temperature field calculations during time steps, reducing the time overhead. Second, “Pencil-H”, a pipelined algorithm designed around the specific capabilities of the MT-3000, improves both computational efficiency and communication efficiency. Third, an architecture-aware multi-level parallelization algorithm is proposed to make full use of the computational capabilities of the MT-3000. Extensive benchmarking tests validate these optimization strategies, showing substantial performance gains, including a speedup of 32.02× over 16 CPU cores. The optimized code also delivers high-fidelity simulations of thermal incompressible flows while achieving 61.61% of the theoretical maximum performance given by the roofline model.
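For readers unfamiliar with the DDF formulation, here is a minimal, unoptimized C sketch of a generic two-dimensional DDF thermal LBM step. It assumes a D2Q9 lattice for the flow distribution f, a D2Q5 lattice for the temperature distribution g, BGK collisions, periodic boundaries, and a simple Shan-Chen-style velocity-shift buoyancy force. All names (`Lattice`, `ddf_init`, `ddf_step`, `gbeta`, `roofline_bound`, `DDF_BYTES_PER_CELL`) are hypothetical. None of the MT-3000 optimizations (temporal fusion, Pencil-H, multi-level parallelization) are reproduced here, and the forcing scheme is not necessarily the authors' choice. The sketch only shows the two coupled distributions those strategies act on, plus the roofline bound min(peak, arithmetic intensity × bandwidth) used to judge memory-bound kernels.

```c
#include <assert.h>
#include <math.h>

#define NX 32
#define NY 32
/* Minimum memory traffic per cell per step: read + write 9 f and 5 g doubles. */
#define DDF_BYTES_PER_CELL ((9 + 5) * 2 * (int)sizeof(double))

/* D2Q9 velocities and weights for the flow distribution f. */
static const int cx9[9] = {0, 1, 0, -1, 0, 1, -1, -1, 1};
static const int cy9[9] = {0, 0, 1, 0, -1, 1, 1, -1, -1};
static const double w9[9] = {4.0 / 9, 1.0 / 9, 1.0 / 9, 1.0 / 9, 1.0 / 9,
                             1.0 / 36, 1.0 / 36, 1.0 / 36, 1.0 / 36};
/* D2Q5 velocities and weights for the temperature distribution g (cs^2 = 1/3). */
static const int cx5[5] = {0, 1, 0, -1, 0};
static const int cy5[5] = {0, 0, 1, 0, -1};
static const double w5[5] = {1.0 / 3, 1.0 / 6, 1.0 / 6, 1.0 / 6, 1.0 / 6};

/* Hypothetical container: two ping-pong copies of each distribution. */
typedef struct {
    double f[2][NY][NX][9];
    double g[2][NY][NX][5];
    int cur;
} Lattice;

/* Fluid at rest (rho = 1) with a warm patch of amplitude dT above T0. */
void ddf_init(Lattice *L, double T0, double dT) {
    L->cur = 0;
    for (int y = 0; y < NY; ++y)
        for (int x = 0; x < NX; ++x) {
            int warm = x >= NX / 4 && x < NX / 2 && y < NY / 4;
            double T = T0 + (warm ? dT : 0.0);
            for (int i = 0; i < 9; ++i) L->f[0][y][x][i] = w9[i];
            for (int i = 0; i < 5; ++i) L->g[0][y][x][i] = w5[i] * T;
        }
}

/* One collide-and-push-stream step for both distributions (periodic domain).
   gbeta = gravity * thermal expansion coefficient in lattice units (assumed). */
void ddf_step(Lattice *L, double tau_f, double tau_g, double gbeta, double T0) {
    const int c = L->cur, n = 1 - c;
    for (int y = 0; y < NY; ++y)
        for (int x = 0; x < NX; ++x) {
            const double *fc = L->f[c][y][x], *gc = L->g[c][y][x];
            double rho = 0, ux = 0, uy = 0, T = 0;
            for (int i = 0; i < 9; ++i) {
                rho += fc[i];
                ux += fc[i] * cx9[i];
                uy += fc[i] * cy9[i];
            }
            ux /= rho;
            uy /= rho;
            for (int i = 0; i < 5; ++i) T += gc[i];
            /* Buoyancy force density F = rho*gbeta*(T - T0) applied as a velocity shift. */
            double uyf = uy + tau_f * gbeta * (T - T0);
            double usq = ux * ux + uyf * uyf;
            for (int i = 0; i < 9; ++i) {
                double cu = cx9[i] * ux + cy9[i] * uyf;
                double feq = w9[i] * rho * (1.0 + 3.0 * cu + 4.5 * cu * cu - 1.5 * usq);
                int xn = (x + cx9[i] + NX) % NX, yn = (y + cy9[i] + NY) % NY;
                L->f[n][yn][xn][i] = fc[i] - (fc[i] - feq) / tau_f;
            }
            /* Temperature is advected by the fluid velocity (linear equilibrium). */
            for (int i = 0; i < 5; ++i) {
                double geq = w5[i] * T * (1.0 + 3.0 * (cx5[i] * ux + cy5[i] * uy));
                int xn = (x + cx5[i] + NX) % NX, yn = (y + cy5[i] + NY) % NY;
                L->g[n][yn][xn][i] = gc[i] - (gc[i] - geq) / tau_g;
            }
        }
    L->cur = n;
}

/* Sums of the zeroth moments; both are conserved by BGK + periodic streaming. */
double total_mass(const Lattice *L) {
    double s = 0;
    for (int y = 0; y < NY; ++y)
        for (int x = 0; x < NX; ++x)
            for (int i = 0; i < 9; ++i) s += L->f[L->cur][y][x][i];
    return s;
}

double total_heat(const Lattice *L) {
    double s = 0;
    for (int y = 0; y < NY; ++y)
        for (int x = 0; x < NX; ++x)
            for (int i = 0; i < 5; ++i) s += L->g[L->cur][y][x][i];
    return s;
}

/* Roofline bound: attainable GFLOP/s = min(peak, flops_per_byte * bandwidth). */
double roofline_bound(double peak_gflops, double bw_gbs, double flops_per_byte) {
    assert(peak_gflops > 0 && bw_gbs > 0 && flops_per_byte > 0);
    double mem_bound = flops_per_byte * bw_gbs;
    return mem_bound < peak_gflops ? mem_bound : peak_gflops;
}
```

For example, after `ddf_init(&lat, 1.0, 0.1)`, repeated calls to `ddf_step(&lat, 0.8, 0.8, 1e-4, 1.0)` should leave `total_mass` and `total_heat` unchanged up to rounding. This holds because BGK collisions preserve the zeroth moments of both distributions and periodic streaming only moves populations between cells. The 224-byte per-cell figure counts one read and one write of 14 double-precision populations and ignores write-allocate and halo traffic. Real traffic is therefore at least this large, which is one reason LBM kernels typically sit on the bandwidth-limited side of the roofline.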

Author Biographies

  • Qingyang Zhang

    Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, Hunan 410073, China

    Laboratory of Digitizing Software for Frontier Equipment, National University of Defense Technology, Changsha, Hunan 410073, China

  • Lei Xu

    Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, China

    Faculty of Computer Science and Control Engineering, Shenzhen University of Advanced Technology, Shenzhen, Guangdong 518106, China

  • Rongliang Chen

    Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, China

    Faculty of Computer Science and Control Engineering, Shenzhen University of Advanced Technology, Shenzhen, Guangdong 518106, China

  • Hang Zou

    Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, Hunan 410073, China

    Laboratory of Digitizing Software for Frontier Equipment, National University of Defense Technology, Changsha, Hunan 410073, China

  • Bo Yang

    Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, Hunan 410073, China

    Laboratory of Digitizing Software for Frontier Equipment, National University of Defense Technology, Changsha, Hunan 410073, China

  • Jingzhi Li

    Department of Mathematics & National Center for Applied Mathematics Shenzhen & SUSTech International Center for Mathematics, Southern University of Science and Technology, Shenzhen, Guangdong 518055, China

  • Jie Liu

    Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, Hunan 410073, China

    Laboratory of Digitizing Software for Frontier Equipment, National University of Defense Technology, Changsha, Hunan 410073, China

About this article

DOI

10.4208/aamm.OA-2025-0046