Research

Research Interest

Our lab currently focuses on (1) on-device deep learning using mobile GPUs and NPUs, (2) heterogeneous distributed deep learning on a cluster, (3) GPU frameworks for deep learning and big data processing, (4) design methodology for heterogeneous parallel platforms, and (5) algorithm/architecture co-design of ML/DL applications.

On-device LLM Inference

  • LLM inference on a resource-constrained mobile GPU (such as a Jetson GPU)
  • LLM inference acceleration by exploiting activation sparsity
    • “Grasp: Group-based Prediction of Activation Sparsity for Fast LLM Inference,” Proceedings of the Design Automation Conference (DAC), Jun. 2025
    • “SparseInfer: Training-free Prediction of Activation Sparsity for Fast LLM Inference,” Proceedings of the Design, Automation and Test in Europe Conference (DATE), Mar. 2025
 Model: Llama 13B, SW: llama.cpp, HW: Jetson AGX Orin 64GB; (Left) original llama.cpp vs. (Right) SparseInfer
  • LLM fine-tuning
 Model: SOLAR 10.7B, SW: Transformers (on PyTorch), HW: Jetson AGX Orin 64GB; no RAG is used, and the model is fine-tuned on-device

Distributed Heterogeneous Deep Learning 

  • Automated search of optimal 3D parallelization for LLM training on a heterogeneous GPU cluster; experiments across 64 GPUs
  • Efficient LLM inference
  • Deep learning with heterogeneous xPUs: NVIDIA GPUs, AMD GPUs, FPGAs, and PIMs
    • Funded by IITP (PI, 2025-2028, K-Cloud: AI heterogeneous integrated resource management technology), ETRI (PI, 2023-2025, distributed deep learning optimization for heterogeneous accelerators), KEITI (Co-PI, 2021-2025, toxicity prediction platform), and NRF (Co-PI, 2022-2023, supercomputer leading development project)
    • “FASOP: Fast yet Accurate Automated Search for Optimal Parallelization of Transformers on Heterogeneous GPU Clusters,” Proceedings of the International Symposium on High-Performance Parallel and Distributed Computing (HPDC), Jun. 2024

On-device ViT (mobile GPU, NPU)

  • Neural architecture search (NAS) for robot vision tasks on mobile GPUs (such as Jetson GPUs) and NPUs
    • Funded by ETRI (PI, 2025-2026, heterogeneous-hardware-aware automated neural network search technology for robot vision)

On-device/On-sensor CNN (mobile GPU, MCU)

NAS for CNN Accelerator (NPU) 

GPU Frameworks

ML/DL Applications 

HW/SW Co-design Methodology, Performance Estimation