SNU CSE Spring 2022

AI773: Special Topics in Artificial Intelligence - Deep Learning and Real-world Applications (4190.773.001)

Deep learning is now an integral part of the systems and tools people use every day, and is therefore no longer the concern of academic research alone. You will get front-row experience with the practical issues in the research and development of deep learning systems, from leading experts and researchers. Major course activities include:

  • Reading Response: You'll read and discuss important papers and articles in the field. Each week, there will be 1-2 reading assignments, for each of which you'll write a short response.
  • Topic Presentation: Once during the semester, you'll lead the class by summarizing the readings and spurring the in-class discussion.
  • In-class Activities: Each class will feature activities that will help you understand core concepts introduced in the course.

Course Staff

Instructor:
    Prof. Sangdoo Yun

TAs:
     유승룡
     강봉균

Staff Mailing List:
     dl_ai773@navercorp.com
     note: this is a group email address that includes the instructor and the TAs.

Time & Location

When: Fridays, 10:00am-12:45pm
Where: Zoom (through ETL)

Links

Course Website: https://ai773.github.io/spring-2022/
Submission & Grading: ETL
Q&A: ETL or email

Updates

  • 3/11: The first invited talk sessions will be held without student presentations.
  • 3/10: Uploaded well-written reading response examples. See Example 1 and Example 2.
  • 3/4: To choose the papers you want to present, please fill in this survey (due: 3/10). (Update 3/10: 49 of 56 participants have responded; 33 of 46 papers have been selected.)
  • 3/4: Extra-enrollment and auditing applications are closed.
  • 2/28: Welcome to the deep learning and real-world applications class! We're still finalizing the schedule and the reading list. Stay tuned!

Schedule

For each session, a reading response is required for one of the two listed readings. Response due dates, where announced, are noted with the week.

Week 1 (3/4): Introduction & Course Overview [slide]
  Speaker: 윤상두
  Reading: Please read the updated course syllabus, and ask any questions you might have.

Week 2 (3/11): Representation learning in computer vision (responses due 3/9)
  Session 1: Backbone architectures for computer vision [slide]
    Speaker: 허병호
    (1) Kornblith, Simon, et al. "Do Better ImageNet Models Transfer Better?", CVPR 2019
    (2) Dosovitskiy, Alexey, et al. "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale.", ICLR 2021
    Recommended reading
  Session 2: Training strong and robust vision models [slide]
    Speaker: 윤상두
    (1) Zhang, Hongyi, et al. "mixup: Beyond Empirical Risk Minimization.", ICLR 2018
    (2) Shankar, Vaishaal, et al. "Evaluating Machine Accuracy on ImageNet.", ICML 2020
    Recommended reading

Week 3 (3/18): Multimodal representation learning (responses due 3/16)
  Session 1: Multimodal deep learning
    Speaker: 김진화
    (1) Kim, Jin-Hwa, Jaehyun Jun, and Byoung-Tak Zhang. "Bilinear Attention Networks.", NeurIPS 2018
    (2) Anderson, Peter, et al. "Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering.", CVPR 2018
    Recommended reading
  Session 2: Vision-and-Language Pre-training
    Speaker: 김원재
    (1) Lu, Jiasen, et al. "ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks.", NeurIPS 2019
    (2) Kim, Wonjae, et al. "ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision.", ICML 2021
    Recommended reading

Week 4 (3/25): Generative models
  Session 1: Unsupervised representation learning for class clustering
    Speaker: 김윤지
    (1) Ji, Xu, et al. "Invariant Information Clustering for Unsupervised Image Classification and Segmentation.", ICCV 2019
    (2) Van Gansbeke, Wouter, et al. "SCAN: Learning to Classify Images without Labels.", ECCV 2020
    Recommended reading
  Session 2: How to improve the generators in GANs?
    Speaker: 김준호
    (1) Kang, Minguk, and Jaesik Park. "ContraGAN: Contrastive Learning for Conditional Image Generation.", NeurIPS 2020
    (2) Liu, Bingchen, et al. "Towards Faster and Stabilized GAN Training for High-fidelity Few-shot Image Synthesis.", ICLR 2021
    Recommended reading

Week 5 (4/1): Towards reliable machine learning
  Session 1: Threats of untrustworthy AI: understanding shortcut learning through a case study
    Speaker: 전상혁
    (1) Brendel, et al. "Approximating CNNs with Bag-of-local-Features Models Works Surprisingly Well on ImageNet.", ICLR 2019
    (2) Geirhos, et al. "ImageNet-trained CNNs Are Biased towards Texture; Increasing Shape Bias Improves Accuracy and Robustness.", ICLR 2019
    Recommended reading
  Session 2: Towards reliable machine learning: through the lens of cross-bias generalization and domain generalization
    Speaker: 전상혁
    (1) Madry, et al. "Towards Deep Learning Models Resistant to Adversarial Attacks.", ICLR 2018
    (2) Ganin, et al. "Domain-Adversarial Training of Neural Networks.", JMLR 2016
    Recommended reading

Week 6 (4/8): Practical scenarios and applications in computer vision
  Session 1: Face recognition: research to product
    Speaker: 유영준
    (1) An, Xiang, et al. "Partial FC: Training 10 Million Identities on a Single Machine.", ICCV 2021
    (2) Sculley, David, et al. "Hidden Technical Debt in Machine Learning Systems.", NeurIPS 2015
    Recommended reading
  Session 2: Video AI and applications
    Speaker: 위동윤
    (1) Feichtenhofer, Christoph, et al. "SlowFast Networks for Video Recognition.", ICCV 2019
    (2) Wang, Xiaolong, et al. "Non-local Neural Networks.", CVPR 2018
    Recommended reading

Week 7 (4/15): Practical scenarios and applications in computer vision
  Session 1: All about CLOVA OCR
    Speaker: 백영민
    (1) Kittenplon, Yair, et al. "Towards Weakly-Supervised Text Spotting Using a Multi-Task Transformer.", arXiv 2022
    (2) Baek, Youngmin, et al. "Character Region Awareness for Text Detection.", CVPR 2019
    Recommended reading
  Session 2: AI that generates handwriting
    Speaker: 이바도
    (1) Cha, Junbum, et al. "Few-shot Compositional Font Generation with Dual Memory.", ECCV 2020
    (2) Park, Song, et al. "Few-shot Font Generation with Localized Style Representations and Factorization.", AAAI 2021
    Recommended reading

Week 8 (4/22): No invited talk; student presentations (TBD)

Week 9 (4/29): Speech recognition and applications
  Session 1: Introduction to end-to-end speech recognition
    Speaker: 정남규
    (1) Gulati, Anmol, et al. "Conformer: Convolution-augmented Transformer for Speech Recognition.", Interspeech 2020
    (2) Han, Wei, et al. "ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context.", Interspeech 2020
    Recommended reading
  Session 2: Self-supervised end-to-end speech recognition
    Speaker: 김한규
    (1) Hsu, Wei-Ning, et al. "HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units.", IEEE/ACM Transactions on Audio, Speech, and Language Processing 2021
    (2) Chung, Yu-An, et al. "W2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training.", arXiv 2021
    Recommended reading

Week 10 (5/6): Voice synthesis and applications
  Session 1
    Speaker: 송은우
    (1) Shen, Jonathan, et al. "Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions.", ICASSP 2018
    (2) Ren, Yi, et al. "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech.", ICLR 2021
    Recommended reading
  Session 2
    Speaker: 황민제
    (1) Kumar, Kundan, et al. "MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis.", NeurIPS 2019
    (2) Yamamoto, Ryuichi, et al. "Parallel WaveGAN: A Fast Waveform Generation Model Based on Generative Adversarial Networks with Multi-Resolution Spectrogram.", ICASSP 2020
    Recommended reading

Week 11 (5/13): Large-scale user modeling and its applications
  Session 1
    Speaker: 곽하녹
    (1) Shin, et al. "Scaling Law for Recommendation Models: Towards General-Purpose User Representations.", arXiv 2021
    (2) Shin, et al. "One4all User Representation for Recommender Systems in E-commerce.", arXiv 2021
  Session 2
    Speaker: 정지수
    (1) Hsieh, et al. "Collaborative Metric Learning.", WWW 2017
    (2) Kim, Boseop, et al. "What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers.", EMNLP 2021
    Recommended reading

Week 12 (5/20): AutoML and practical MLOps
  Session 1
    Speaker: 김지훈
    (1) Real, Esteban, et al. "AutoML-Zero: Evolving Machine Learning Algorithms From Scratch.", ICML 2020
    (2) Falkner, Stefan, et al. "BOHB: Robust and Efficient Hyperparameter Optimization at Scale.", ICML 2018
    Recommended reading
  Session 2
    Speaker: 서동필 (no reading for this session)

Week 13 (5/27): NLP, dialogues, and QA
  Session 1
    Speaker: 이상우
    (1) Devlin, et al. "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.", NAACL 2019
    (2) Raffel, et al. "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer.", JMLR 2020
    Recommended reading
  Session 2
    Speaker: 김성동
    (1) Roller, Stephen, et al. "Recipes for Building an Open-Domain Chatbot.", EACL 2021
    (2) Lewis, Patrick, et al. "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.", NeurIPS 2020
    Recommended reading

Week 14 (6/3): Hyperscale LM & NLP applications
  Session 1
    Speaker: 이기창
    (1) Brown, et al. "Language Models are Few-Shot Learners.", NeurIPS 2020
    (2) Rae, et al. "Scaling Language Models: Methods, Analysis & Insights from Training Gopher.", arXiv 2021
    Recommended reading
  Session 2
    Speaker: 유강민
    (1) Lester, Brian, et al. "The Power of Scale for Parameter-Efficient Prompt Tuning.", EMNLP 2021
    (2) Li, Xiang Lisa, and Percy Liang. "Prefix-Tuning: Optimizing Continuous Prompts for Generation.", arXiv 2021
    Recommended reading

Week 15 (6/10): Human-centric NLP
  Session 1
    Speaker: 이화란
    (1) Dinan, Emily, et al. "Queens are Powerful too: Mitigating Gender Bias in Dialogue Generation.", EMNLP 2020
    (2) Perez, Ethan, et al. "Red Teaming Language Models with Language Models.", arXiv 2022
    Recommended reading
  Session 2
    Speakers: 정준영, 이민아
    (1) Chung, JJY, et al. "TaleBrush: Sketching Stories with Generative Pretrained Language Models.", CHI 2022
    (2) Lee, Mina, Percy Liang, and Qian Yang. "CoAuthor: Designing a Human-AI Collaborative Writing Dataset for Exploring Language Model Capabilities.", CHI 2022
    Recommended reading

Week 16 (6/17): No invited talk; student presentations (TBD)

Topics (tentative)

Major topics include:
  • Representation Learning
  • Reliable ML
  • Voice and Speech
  • NLP
  • MLOps
  • Recommendation systems

Grading

  • Attendance: 20%
  • Reading responses: 40%
  • Topic presentation: 20%
  • Class participation: 10%
  • Quizzes: 10%

Late policy: The three lowest reading response grades will be dropped. No late submissions are accepted for reading responses.

Prerequisites

There are no official course prerequisites. However, the assignments involve a lot of reading; research experience in machine learning is useful but not required.