Research Interest

Our lab currently focuses mainly on (1) on-device deep learning using mobile GPUs and NPUs, (2) heterogeneous distributed deep learning on a cluster, (3) GPU frameworks for deep learning and big data processing, (4) design methodology for heterogeneous parallel platforms, and (5) algorithm/architecture co-design of ML/DL applications.

On-device LLM (mobile GPU)

Distributed Deep Learning (GPUs)

  • Automated search of optimal 3D parallelization for LLM training on a heterogeneous GPU cluster; experiments across 64 GPUs
  • Efficient inference for LLM
  • Deep learning with heterogeneous XPUs: NVIDIA GPUs, AMD GPUs, and FPGAs
  • Funded by KEITI (Co-PI, 2021-2026), NRF (Co-PI, 2022-2023), and ETRI (PI, 2023)
    • "FASOP: Fast yet Accurate Automated Search for Optimal Parallelization of Transformers on Heterogeneous GPU Clusters," to appear in the Proceedings of the International Symposium on High-Performance Parallel and Distributed Computing (HPDC), Jun. 2024
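
The search problem above can be illustrated with a minimal sketch: enumerating every candidate (data, tensor, pipeline) parallel degree assignment for a 64-GPU cluster and ranking the candidates with a toy cost model. The cost weights below are hypothetical placeholders for illustration only; they are not FASOP's actual performance model, which accounts for heterogeneous device speeds and interconnects.

```python
from itertools import product

def candidate_parallelizations(num_gpus: int):
    """Enumerate (dp, tp, pp) degree triples whose product uses all GPUs."""
    return [
        (dp, tp, pp)
        for dp, tp, pp in product(range(1, num_gpus + 1), repeat=3)
        if dp * tp * pp == num_gpus
    ]

def toy_cost(dp: int, tp: int, pp: int) -> float:
    # Hypothetical cost model: tensor parallelism is penalized most
    # (frequent all-reduces), pipeline parallelism next (bubbles),
    # data parallelism least (one gradient all-reduce per step).
    return 1.0 * dp + 4.0 * tp + 2.0 * pp

configs = candidate_parallelizations(64)
best = min(configs, key=lambda c: toy_cost(*c))
print(best)  # the cheapest (dp, tp, pp) triple under this toy model
```

A real search additionally evaluates each triple against per-device throughput and link bandwidths, which is what makes the heterogeneous-cluster case hard; this sketch only shows the shape of the configuration space (all ordered factorizations of the GPU count).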

NAS for CNN Accelerator (NPU) 

On-device/On-sensor CNN (mobile GPU, MCU)

GPU Frameworks

ML/DL Applications 

HW/SW Co-design Methodology, Performance Estimation