A Computer Systems and Architecture Reading Group @ KAIST

- Contact us: root [_at_]

All slides linked on this homepage are accessible only from within the KAIST network.

| Date | Presenter | Title | Venue | Slide |
|------|-----------|-------|-------|-------|
| 04-28 | Seonjin Na | Accelerating Graph Sampling for Graph Machine Learning using GPUs | EuroSys'21 | PDF |
| 03-31 | Igjae Kim | HERTI: A Reinforcement Learning-Augmented System for Efficient Real-Time Inference on Heterogeneous Embedded Systems | PACT'21 | PDF |
| 03-17 | Jungwoo Kim | ZeRO-Offload: Democratizing Billion-Scale Model Training | ATC'21 | PDF |
| 03-10 | Sanghyeon Lee | VELTAIR: Towards High-Performance Multi-tenant Deep Learning Services via Adaptive Compilation and Scheduling | ASPLOS'22 | PDF |
| 02-23 | Soojin Hwang | AutoTM: Automatic Tensor Movement in Heterogeneous Memory Systems using Integer Linear Programming | ASPLOS'20 | PDF |
| 02-09 | Sunho Lee | Refurbish Your Training Data: Reusing Partially Augmented Samples for Faster Deep Neural Network Training | ATC'21 | PDF |
| 01-26 | Seonjin Na | INFaaS: Automated Model-less Inference Serving | ATC'21 | PDF |
| 01-19 | Jungwoo Kim | Dorylus: Affordable, Scalable, and Accurate GNN Training with Distributed CPU Servers and Serverless Threads | OSDI'21 | PDF |
| 01-13 | Sanghyeon Lee | Ten Lessons From Three Generations Shaped Google's TPUv4i | ISCA'21 | PDF |
| Date | Presenter | Title | Venue | Slide |
|------|-----------|-------|-------|-------|
| 12-29 | Soojin Hwang | AutoFL: Enabling Heterogeneity-Aware Energy Efficient Federated Learning | MICRO'21 | PDF |
| 12-15 | Sunho Lee | Reverse Engineering Convolutional Neural Networks Through Side-channel Information Leaks | DAC'18 | PDF |
| 12-01 | Seonjin Na | REDUCT: Keep it Close, Keep it Cool! | ISCA'21 | PDF |
| 11-09 | Jungwoo Kim | Equinox: Training (for Free) on a Custom Inference Accelerator | MICRO'21 | PDF |
| 10-26 | Yeonjae Kim | Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning | OSDI'21 | PDF |
| 09-15 | Sanghyeon Lee | An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale | ICLR'21 | PDF |
| 09-08 | Soojin Hwang | PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections | OSDI'21 | PDF |
| 08-17 | Sunho Lee | RaPiD: AI Accelerator for Ultra-low Precision Training and Inference | ISCA'21 | PDF |
| 07-27 | Jungwoo Kim | Shredder: Learning Noise Distributions to Protect Inference Privacy | ASPLOS'20 | PDF |
| 07-13 | Yeonjae Kim | Tensor Casting: Co-Designing Algorithm-Architecture for Personalized Recommendation Training | HPCA'21 | PDF |
| 06-22 | Sanghyeon Lee | Heterogeneous Dataflow Accelerator for Multi-DNN Workloads | HPCA'21 | PDF |
| 06-08 | Soojin Hwang | Procrustes: A Dataflow and Accelerator for Sparse Deep Neural Network Training | MICRO'20 | PDF |
| 06-01 | Sunho Lee | An 11.5TOPS/W 1024-MAC Butterfly Structure Dual-Core Sparsity-Aware Neural Processing Unit in 8nm Flagship Mobile SoC | ISSCC'19 | PDF |
| 05-25 | Seonjin Na | Duplo: Lifting Redundant Memory Accesses of Deep Neural Networks for GPU Tensor Cores | MICRO'20 | PDF |
| 05-18 | Seunghyo Kang | Fast-BCNN: Massive Neuron Skipping in Bayesian Convolutional Neural Networks | MICRO'20 | PDF |
| 04-20 | Yeonjae Kim | ALERT: Accurate Learning for Energy and Timeliness | ATC'20 | PDF |
| 04-06 | Sanghyeon Lee | Layerweaver: Maximizing Resource Utilization of Neural Processing Units via Layer-Wise Scheduling | HPCA'21 | PDF |
| 03-30 | Soojin Hwang | SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning | HPCA'21 | PDF |
| 03-16 | Sunho Lee | LazyBatching: An SLA-aware Batching System for Cloud Machine Learning Inference | HPCA'21 | PDF |
| 03-02 | Seonjin Na | Daydream: Accurately Estimating the Efficacy of Optimizations for DNN Training | ATC'20 | PDF |
| 02-23 | Seunghyo Kang | SuperNPU: An Extremely Fast Neural Processing Unit Using Superconducting Logic Devices | MICRO'20 | PDF |
| 02-09 | Yeonjae Kim | Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads | OSDI'20 | PDF |
| 02-02 | Soojin Hwang | SparseTrain: Leveraging Dynamic Sparsity in Software for Training DNNs on General-Purpose SIMD Processors | PACT'20 | PDF |
| 01-26 | Sanghyeon Lee | KungFu: Making Training in Distributed Machine Learning Adaptive | OSDI'20 | PDF |
| 01-19 | Sunho Lee | Cheetah: Optimizing and Accelerating Homomorphic Encryption for Private Inference | HPCA'21 | PDF |
| 01-07 | Seonjin Na | A Unified Architecture for Accelerating Distributed DNN Training in Heterogeneous GPU/CPU Clusters | OSDI'20 | PDF |
| Date | Presenter | Title | Venue | Slide |
|------|-----------|-------|-------|-------|
| 12-15 | Seunghyo Kang | Ansor: Generating High-Performance Tensor Programs for Deep Learning | OSDI'20 | PDF |
| 12-01 | Yeonjae Kim | Serving DNNs like Clockwork: Performance Predictability from the Bottom Up | OSDI'20 | PDF |
| 11-03 | Soojin Hwang | MatRaptor: A Sparse-Sparse Matrix Multiplication Accelerator Based on Row-Wise Product | MICRO'20 | PDF |
| 10-27 | Sanghyeon Lee | DUET: Boosting Deep Neural Network Efficiency on Dual-Module Architecture | MICRO'20 | PDF |
| 10-20 | Sunho Lee | TensorDash: Exploiting Sparsity to Accelerate Deep Neural Network Training and Inference | MICRO'20 | PDF |
| 09-28 | Seonjin Na | ConfuciuX: Autonomous Hardware Resource Assignment for DNN Accelerators using Reinforcement Learning | MICRO'20 | PDF |
| 09-22 | Seunghyo Kang | PERCIVAL: Making In-Browser Perceptual Ad Blocking Practical with Deep Learning | ATC'20 | PDF |
| 09-02 | Yeonjae Kim | DeepSniffer: A DNN Model Extraction Framework Based on Learning Architectural Hints | ASPLOS'20 | PDF |
| 08-11 | Soojin Hwang | Gorgon: Accelerating Machine Learning from Relational Data | ISCA'20 | PDF |
| 07-28 | Sanghyeon Lee | A Simple Framework for Contrastive Learning of Visual Representations | ICML'20 | PDF |
| 07-09 | Sunho Lee | A Deep Reinforcement Learning Framework for Architectural Exploration: A Routerless NoC Case Study | HPCA'20 | PDF |
| 06-23 | Seunghyo Kang | DRQ: Dynamic Region-based Quantization for Deep Neural Network Acceleration | ISCA'20 | PDF |
| 06-09 | Sanghyeon Lee | A Multi-Neural Network Acceleration Architecture | ISCA'20 | PDF |
| 06-02 | Yeonjae Kim | PipeDream: Generalized Pipeline Parallelism for DNN Training | SOSP'19 | PDF |
| 05-26 | Soojin Hwang | A3: Accelerating Attention Mechanisms in Neural Networks with Approximation | HPCA'20 | PDF |
| 05-12 | Sunho Lee | Nexus: A GPU Cluster Engine for Accelerating DNN-Based Video Analysis | SOSP'19 | PDF |
| 04-28 | Seonjin Na | EDEN: Evolutionary Deep Networks for Efficient Machine Learning | MICRO'19 | PDF |
| 03-31 | Seunghyo Kang | Accelerating Distributed Reinforcement Learning with In-Switch Computing | ISCA'19 | PDF |
| 03-12 | Soojin Hwang | SIGMA: A Sparse and Irregular GEMM Accelerator with Flexible Interconnects for DNN Training | MICRO'19 | PDF |
| 03-03 | Sunho Lee | SIGMA: A Sparse and Irregular GEMM Accelerator with Flexible Interconnects for DNN Training | HPCA'20 | PDF |
| 02-18 | Seonjin Na | Shortcut Mining: Exploiting Cross-Layer Shortcut Reuse in CNN Accelerators | HPCA'19 | PDF |
| 02-13 | Seunghyo Kang | Bridging the Gap Between Neural Networks and Neuromorphic Hardware with A Neural Network Compiler | ASPLOS'18 | PDF |
| 02-06 | Soojin Hwang | Eager Pruning: Algorithm and Architecture Support for Fast Training of Deep Neural Networks | ISCA'19 | PDF |
| 01-21 | Sunho Lee | GRNN: Low-Latency and Scalable RNN Inference on GPUs | EuroSys'19 | PPTX |
| 01-09 | Sukchul Cho | Bit Fusion: Bit-Level Dynamically Composable Architecture for Accelerating Deep Neural Networks | ISCA'18 | PDF |
| Date | Presenter | Title | Venue | Slide |
|------|-----------|-------|-------|-------|
| 12-23 | Seonjin Na | PREMA: A Predictive Multi-task Scheduling Algorithm For Preemptible Neural Processing Units | HPCA'20 | PDF |
| 12-16 | Seunghyo Kang | The Dark Side of DNN Pruning | ISCA'18 | PDF |
| 11-25 | Soojin Hwang | NeuMMU: Architectural Support for Efficient Address Translations in Neural Processing Units | ASPLOS'20 | PDF |
| 11-18 | Sunho Lee | Prediction Based Execution on Deep Neural Networks | ISCA'18 | PDF |
| 11-11 | Sukchul Cho | SCALE-Sim: Systolic CNN Accelerator Simulator | arXiv'18 | PDF |