Optimizing Thermal Lattice Boltzmann Method on an MT-3000 Processor with Neo-Heterogeneous Strategies

Authors

DOI:

https://doi.org/10.4208/aamm.OA-2025-0046

Keywords:

Lattice Boltzmann method, thermal incompressible flow, heterogeneous processor, parallel algorithm

Abstract

The Lattice Boltzmann Method (LBM) is a computational fluid dynamics method for simulating fluid flows with the benefits of locality and simplicity, making it ideal for parallel computing and complex flow simulations. This study focuses on developing a specialized Double Distribution Function (DDF) LBM software framework optimized for the MT-3000, a novel heterogeneous processor, to facilitate thermal incompressible flow simulations. To improve LBM’s performance on the complex multi-zone architecture of MT-3000, this paper introduces several innovative strategies. Firstly, a temporal fusion optimization strategy is implemented. This strategy involves postponing the temperature field calculations during time steps, efficiently decreasing the time overhead. Furthermore, we present “Pencil-H”, a novel pipelined algorithm meticulously designed to harness the unique capabilities of the MT-3000, thereby enhancing computational efficiency and communication effectiveness. Additionally, an architecture-aware multi-level parallelization algorithm is proposed, tailored to maximize the computational capabilities of the MT-3000. The effectiveness of these optimization strategies has been thoroughly validated through extensive bench-marking tests. These validations have shown remarkable performance enhancements, including a significant acceleration factor of 32.02X when compared to using 16 CPU cores. Notably, the optimized code demonstrated high-fidelity simulation capabilities for thermal incompressible flows, achieving 61.61% of the theoretical maximum performance defined by the roofline model.

Author Biographies

  • Qingyang Zhang

    Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, Hunan 410073, China

    Laboratory of Digitizing Software for Frontier Equipment, National University of Defense Technology, Changsha, Hunan 410073, China

  • Lei Xu

    Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, China

    Faculty of Computer Science and Control Engineering, Shenzhen University of Advanced Technology, Shenzhen, Guangdong 518106, China

  • Rongliang Chen

    Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, China

    Faculty of Computer Science and Control Engineering, Shenzhen University of Advanced Technology, Shenzhen, Guangdong 518106, China

  • Hang Zou

    Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, Hunan 410073, China

    Laboratory of Digitizing Software for Frontier Equipment, National University of Defense Technology, Changsha, Hunan 410073, China

  • Bo Yang

    Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, Hunan 410073, China

    Laboratory of Digitizing Software for Frontier Equipment, National University of Defense Technology, Changsha, Hunan 410073, China

  • Jingzhi Li

    Department of Mathematics & National Center for Applied Mathematics Shenzhen & SUSTech International Center for Mathematics, Southern University of Science and Technology, Shenzhen, Guangdong 518055, China

  • Jie Liu

    Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, Hunan 410073, China

    Laboratory of Digitizing Software for Frontier Equipment, National University of Defense Technology, Changsha, Hunan 410073, China

Published

2025-11-22

Abstract View

  • 160

Pdf View

  • 2

Issue

Section

Articles