Research Interests

Our lab currently focuses on the following research topics: (1) heterogeneous distributed deep learning on clusters, (2) on-device deep learning using mobile GPUs and NPUs, (3) GPU frameworks for deep learning and big data processing, (4) design methodology for heterogeneous parallel platforms, and (5) algorithm/architecture co-design of ML/DL applications.


Distributed Deep Learning 

  • Automated search for optimal 3D parallelization (data/tensor/pipeline) of LLM training on a heterogeneous GPU cluster; experiments across 64 GPUs
  • Efficient inference for LLMs
  • Deep learning with heterogeneous XPUs: NVIDIA GPUs, AMD GPUs, and FPGAs
  • Funded by ETRI, PI, 2023-2026, and by KEITI, Co-PI, 2021-2026
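
The search over 3D parallelization can be illustrated with a minimal sketch: enumerate the (data, tensor, pipeline) parallel degrees that factorize the GPU count and rank them with a cost model. The cost model below is a toy placeholder, not the lab's actual framework; all coefficients are illustrative assumptions.

```python
# Hypothetical sketch: enumerate valid 3D (data/tensor/pipeline)
# parallelism degrees for a 64-GPU cluster and pick the cheapest
# under a toy cost model (illustrative only).
from itertools import product

NUM_GPUS = 64

def toy_cost(dp: int, tp: int, pp: int) -> float:
    # Illustrative penalties: tensor parallelism is communication-heavy,
    # pipeline parallelism adds bubble overhead, data parallelism adds
    # gradient all-reduce cost growing with its degree.
    return 4.0 * tp + 2.0 * pp + 1.0 * dp

def search_3d(num_gpus: int):
    best = None
    for dp, tp, pp in product(range(1, num_gpus + 1), repeat=3):
        if dp * tp * pp != num_gpus:
            continue  # keep only exact factorizations of the cluster size
        cost = toy_cost(dp, tp, pp)
        if best is None or cost < best[0]:
            best = (cost, dp, tp, pp)
    return best

cost, dp, tp, pp = search_3d(NUM_GPUS)
print(f"dp={dp}, tp={tp}, pp={pp}, cost={cost}")
```

A real framework would replace `toy_cost` with profiled per-configuration estimates (communication volume, pipeline bubbles, per-GPU compute on heterogeneous devices), but the exhaustive enumeration over factorizations is the same shape of search.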

NAS for CNN Accelerator (NPU) 

  • Hardware Architecture Search (HAS) framework for FPGA-based CNN accelerators. Using OpenCL-based High-Level Synthesis (HLS) and our sparsity-aware design space exploration framework, we aim to find (near-)optimal NPU designs across various dataflows and mappings; funded by the Korea Research Foundation (KRF), PI, 2022-2025
      • Alveo U200, Xilinx Vivado HLS v2020.1, INT8-quantized VGG16 and ResNet50
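
Sparsity-aware design space exploration can be sketched as a loop over (dataflow, PE-array shape) candidates under an FPGA resource budget, scored by an estimated throughput. Everything below (the candidate grid, the budget, and the cost model, including the assumption that an output-stationary dataflow exploits sparsity slightly better) is an illustrative assumption, not the lab's actual DSE framework.

```python
# Hypothetical sketch of sparsity-aware design space exploration:
# enumerate (dataflow, PE-array shape) candidates under a toy FPGA
# resource budget and keep the highest estimated throughput.
# All constants and the model are illustrative assumptions.

DSP_BUDGET = 6840  # roughly Alveo U200-class DSP count (illustrative)

CANDIDATES = [
    (df, r, c)
    for df in ("weight_stationary", "output_stationary")
    for r in (8, 16, 32, 64)
    for c in (8, 16, 32, 64)
]

def estimate_throughput(dataflow: str, rows: int, cols: int,
                        sparsity: float = 0.5) -> float:
    pes = rows * cols
    # Toy model: skipping zero-valued MACs scales effective throughput
    # by 1/(1 - sparsity); output-stationary is assumed to sustain
    # slightly higher PE utilization here.
    util = 0.9 if dataflow == "output_stationary" else 0.8
    return pes * util * (1.0 / (1.0 - sparsity))

def explore(budget: int):
    best = None
    for df, r, c in CANDIDATES:
        if r * c > budget:
            continue  # PE array would exceed the DSP budget
        tput = estimate_throughput(df, r, c)
        if best is None or tput > best[0]:
            best = (tput, df, r, c)
    return best

tput, df, r, c = explore(DSP_BUDGET)
print(df, r, c, tput)
```

In an actual HAS flow, each surviving candidate would be emitted as OpenCL HLS code and synthesized (or estimated analytically) to obtain real resource and latency numbers rather than the toy figures used here.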

      [Figure: Our_SA NPU]

On-device/On-sensor Deep Learning (GPU)

GPU Frameworks

ML/DL Applications 

HW/SW Co-design Methodology, Performance Estimation