Date | Presenter | Title | Venue | Slide |
04-28 | Seonjin Na | Accelerating Graph Sampling for Graph Machine Learning using GPUs | EuroSys'21 | PDF |
03-31 | Igjae Kim | HERTI: a Reinforcement Learning-Augmented System for Efficient Real-Time Inference on Heterogeneous Embedded Systems | PACT'21 | PDF |
03-17 | Jungwoo Kim | ZeRO-Offload: Democratizing Billion-Scale Model Training | ATC'21 | PDF |
03-10 | Sanghyeon Lee | VELTAIR: Towards High-Performance Multi-tenant Deep Learning Services via Adaptive Compilation and Scheduling | ASPLOS'22 | PDF |
02-23 | Soojin Hwang | AutoTM: Automatic Tensor Movement in Heterogeneous Memory Systems using Integer Linear Programming | ASPLOS'20 | PDF |
02-09 | Sunho Lee | Refurbish Your Training Data: Reusing Partially Augmented Samples for Faster Deep Neural Network Training | ATC'21 | PDF |
01-26 | Seonjin Na | INFaaS: Automated Model-less Inference Serving | ATC'21 | PDF |
01-19 | Jungwoo Kim | Dorylus: Affordable, Scalable, and Accurate GNN Training with Distributed CPU Servers and Serverless Threads | OSDI'21 | PDF |
01-13 | Sanghyeon Lee | Ten Lessons From Three Generations Shaped Google's TPUv4i | ISCA'21 | PDF |
Date | Presenter | Title | Venue | Slide |
12-29 | Soojin Hwang | AutoFL: Enabling Heterogeneity-Aware Energy Efficient Federated Learning | MICRO'21 | PDF |
12-15 | Sunho Lee | Reverse Engineering Convolutional Neural Networks Through Side-channel Information Leaks | DAC'18 | PDF |
12-01 | Seonjin Na | REDUCT: Keep it Close, Keep it Cool! | ISCA'21 | PDF |
11-09 | Jungwoo Kim | Equinox: Training (for Free) on a Custom Inference Accelerator | MICRO'21 | PDF |
10-26 | Yeonjae Kim | Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning | OSDI'21 | PDF |
09-15 | Sanghyeon Lee | An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale | ICLR'21 | PDF |
09-08 | Soojin Hwang | PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections | OSDI'21 | PDF |
08-17 | Sunho Lee | RaPiD: AI Accelerator for Ultra-low Precision Training and Inference | ISCA'21 | PDF |
07-27 | Jungwoo Kim | Shredder: Learning Noise Distributions to Protect Inference Privacy | ASPLOS'20 | PDF |
07-13 | Yeonjae Kim | Tensor Casting: Co-Designing Algorithm-Architecture for Personalized Recommendation Training | HPCA'21 | PDF |
06-22 | Sanghyeon Lee | Heterogeneous Dataflow Accelerator for Multi-DNN Workloads | HPCA'21 | PDF |
06-08 | Soojin Hwang | Procrustes: A dataflow and accelerator for sparse deep neural network training | MICRO'20 | PDF |
06-01 | Sunho Lee | An 11.5TOPS/W 1024-MAC Butterfly Structure Dual-Core Sparsity-Aware Neural Processing Unit in 8nm Flagship Mobile SoC | ISSCC'19 | PDF |
05-25 | Seonjin Na | Duplo: Lifting Redundant Memory Accesses of Deep Neural Networks for GPU Tensor Cores | MICRO'20 | PDF |
05-18 | Seunghyo Kang | Fast-BCNN: Massive Neuron Skipping in Bayesian Convolutional Neural Networks | MICRO'20 | PDF |
04-20 | Yeonjae Kim | ALERT: Accurate Learning for Energy and Timeliness | ATC'20 | PDF |
04-06 | Sanghyeon Lee | Layerweaver: Maximizing Resource Utilization of Neural Processing Units via Layer-Wise Scheduling | HPCA'21 | PDF |
03-30 | Soojin Hwang | SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning | HPCA'21 | PDF |
03-16 | Sunho Lee | LazyBatching: An SLA-aware Batching System for Cloud Machine Learning Inference | HPCA'21 | PDF |
03-02 | Seonjin Na | Daydream: Accurately Estimating the Efficacy of Optimizations for DNN Training | ATC'20 | PDF |
02-23 | Seunghyo Kang | SuperNPU: An Extremely Fast Neural Processing Unit Using Superconducting Logic Devices | MICRO'20 | PDF |
02-09 | Yeonjae Kim | Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads | OSDI'20 | PDF |
02-02 | Soojin Hwang | SparseTrain: Leveraging Dynamic Sparsity in Software for Training DNNs on General-Purpose SIMD Processors | PACT'20 | PDF |
01-26 | Sanghyeon Lee | KungFu: Making Training in Distributed Machine Learning Adaptive | OSDI'20 | PDF |
01-19 | Sunho Lee | Cheetah: optimizing and accelerating homomorphic encryption for private inference | HPCA'21 | PDF |
01-07 | Seonjin Na | A Unified Architecture for Accelerating Distributed DNN Training in Heterogeneous GPU/CPU Clusters | OSDI'20 | PDF |
Date | Presenter | Title | Venue | Slide |
12-15 | Seunghyo Kang | Ansor: Generating High-Performance Tensor Programs for Deep Learning | OSDI'20 | PDF |
12-01 | Yeonjae Kim | Serving DNNs like Clockwork: Performance Predictability from the Bottom Up | OSDI'20 | PDF |
11-03 | Soojin Hwang | MatRaptor: A Sparse-Sparse Matrix Multiplication Accelerator Based on Row-Wise Product | MICRO'20 | PDF |
10-27 | Sanghyeon Lee | DUET: Boosting Deep Neural Network Efficiency on Dual-Module Architecture | MICRO'20 | PDF |
10-20 | Sunho Lee | TensorDash: Exploiting Sparsity to Accelerate Deep Neural Network Training and Inference | MICRO'20 | PDF |
09-28 | Seonjin Na | ConfuciuX: Autonomous Hardware Resource Assignment for DNN Accelerators using Reinforcement Learning | MICRO'20 | PDF |
09-22 | Seunghyo Kang | PERCIVAL: Making In-Browser Perceptual Ad Blocking Practical with Deep Learning | ATC'20 | PDF |
09-02 | Yeonjae Kim | DeepSniffer: a DNN Model Extraction Framework based on Learning Architectural Hints | ASPLOS'20 | PDF |
08-11 | Soojin Hwang | Gorgon: Accelerating Machine Learning from Relational Data | ISCA'20 | PDF |
07-28 | Sanghyeon Lee | A Simple Framework for Contrastive Learning of Visual Representations | ICML'20 | PDF |
07-09 | Sunho Lee | A deep reinforcement learning framework for architectural exploration: A routerless NoC case study | HPCA'20 | PDF |
06-23 | Seunghyo Kang | DRQ: Dynamic Region-based Quantization for Deep Neural Network Acceleration | ISCA'20 | PDF |
06-09 | Sanghyeon Lee | A Multi-Neural Network Acceleration Architecture | ISCA'20 | PDF |
06-02 | Yeonjae Kim | PipeDream: generalized pipeline parallelism for DNN training | SOSP'19 | PDF |
05-26 | Soojin Hwang | A3: Accelerating Attention Mechanisms in Neural Networks with Approximation | HPCA'20 | PDF |
05-12 | Sunho Lee | Nexus: A GPU Cluster Engine for Accelerating DNN-Based Video Analysis | SOSP'19 | PDF |
04-28 | Seonjin Na | EDEN: Evolutionary deep networks for efficient machine learning | MICRO'19 | PDF |
03-31 | Seunghyo Kang | Accelerating Distributed Reinforcement Learning with In-Switch Computing | ISCA'19 | PDF |
03-12 | Soojin Hwang | SIGMA: A Sparse and Irregular GEMM Accelerator with Flexible Interconnects for DNN Training | MICRO'19 | PDF |
03-03 | Sunho Lee | SIGMA: A Sparse and Irregular GEMM Accelerator with Flexible Interconnects for DNN Training | HPCA'20 | PDF |
02-18 | Seonjin Na | Shortcut Mining: Exploiting Cross-Layer Shortcut Reuse in CNN Accelerators | HPCA'19 | PDF |
02-13 | Seunghyo Kang | Bridging the Gap Between Neural Networks and Neuromorphic Hardware with A Neural Network Compiler | ASPLOS'18 | PDF |
02-06 | Soojin Hwang | Eager Pruning: Algorithm and Architecture Support for Fast Training of Deep Neural Networks | ISCA'19 | PDF |
01-21 | Sunho Lee | GRNN: Low-Latency and Scalable RNN Inference on GPUs | EuroSys'19 | PPTX |
01-09 | Sukchul Cho | Bit Fusion: Bit-Level Dynamically Composable Architecture for Accelerating Deep Neural Networks | ISCA'18 | PDF |
Date | Presenter | Title | Venue | Slide |
12-23 | Seonjin Na | PREMA: A Predictive Multi-task Scheduling Algorithm For Preemptible Neural Processing Units | HPCA'20 | PDF |
12-16 | Seunghyo Kang | The Dark Side of DNN Pruning | ISCA'18 | PDF |
11-25 | Soojin Hwang | NeuMMU: Architectural Support for Efficient Address Translations in Neural Processing Units | ASPLOS'20 | PDF |
11-18 | Sunho Lee | Prediction based Execution on Deep Neural Networks | ISCA'18 | PDF |
11-11 | Sukchul Cho | SCALE-Sim: Systolic CNN Accelerator Simulator | arXiv'18 | PDF |