SNU CSE Spring 2022

AI773: Special Topics in Artificial Intelligence - Deep Learning and Real-world Applications (4190.773.001)

Deep learning is now an integral part of the systems and tools people use every day, and is therefore no longer the concern of academic research alone. You will get front-row experience with the practical issues in the research and development of deep learning systems, from leading experts and researchers. Major course activities include:

  • Reading Response: You'll read and discuss important papers and articles in the field. Each week, there will be 1-2 reading assignments, for each of which you'll write a short response.
  • Topic Presentation: Once during the semester, you'll lead the class by summarizing the readings and spurring the in-class discussion.
  • In-class Activities: Each class will feature activities that will help you understand core concepts introduced in the course.

Course Staff

Instructor:
    Prof. Sangdoo Yun

TAs:
     유승룡
     강봉균

Staff Mailing List:
     dl_ai773@navercorp.com
     note: this is a group email address that includes the instructor and the TAs.

Time & Location

When: Fridays, 10:00am-12:45pm
Where: Zoom (through ETL)

Links

Course Website: https://ai773.github.io/spring-2022/
Submission & Grading: ETL
Q&A: ETL or email

Updates

  • 3/11: The first invited talk sessions will be held without student presentations.
  • 3/10: Uploaded well-written reading response examples. See Example 1 and Example 2.
  • 3/4: To choose the papers you want to present, please fill in this survey (due: 3/10). (Update 3/10: 49 of 56 participants have responded; 33 of 46 papers have been selected.)
  • 3/4: Extra-enrollment and auditing applications are closed.
  • 2/28: Welcome to the deep learning and real-world applications class! We're still finalizing the schedule and the reading list. Stay tuned!

Schedule

For each session, a reading response is required for one of the two listed readings. Response due dates, where announced, are noted with the week.

Week 1 (3/4): Introduction & Course Overview [slide]
  Speaker: 윤상두
  Reading: Please read the updated course syllabus, and ask any questions you might have.

Week 2 (3/11): Representation learning in computer vision (responses due 3/9)
  Session 1: Backbone architectures for computer vision [slide]
    Speaker: 허병호
    (1) Kornblith, Simon, et al. "Do Better ImageNet Models Transfer Better?", CVPR 2019
    (2) Dosovitskiy, Alexey, et al. "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale.", ICLR 2021
    Recommended reading
  Session 2: Training strong and robust vision models [slide]
    Speaker: 윤상두
    (1) Zhang, Hongyi, et al. "mixup: Beyond Empirical Risk Minimization.", ICLR 2018
    (2) Shankar, Vaishaal, et al. "Evaluating Machine Accuracy on ImageNet.", ICML 2020
    Recommended reading

Week 3 (3/18): Multimodal representation learning (responses due 3/16)
  Session 1: Multimodal deep learning
    Speaker: 김진화
    (1) Kim, Jin-Hwa, Jaehyun Jun, and Byoung-Tak Zhang. "Bilinear Attention Networks.", NeurIPS 2018
    (2) Anderson, Peter, et al. "Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering.", CVPR 2018
    Recommended reading
  Session 2: Vision-and-Language Pre-training
    Speaker: 김원재
    (1) Lu, Jiasen, et al. "ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks.", NeurIPS 2019
    (2) Kim, Wonjae, et al. "ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision.", ICML 2021
    Recommended reading

Week 4 (3/25): Generative models
  Session 1: Unsupervised representation learning for class clustering
    Speaker: 김윤지
    (1) Ji, Xu, et al. "Invariant Information Clustering for Unsupervised Image Classification and Segmentation.", ICCV 2019
    (2) Van Gansbeke, Wouter, et al. "SCAN: Learning to Classify Images without Labels.", ECCV 2020
    Recommended reading
  Session 2: How to improve the generators in GANs?
    Speaker: 김준호
    (1) Kang, Minguk, and Jaesik Park. "ContraGAN: Contrastive Learning for Conditional Image Generation.", NeurIPS 2020
    (2) Liu, Bingchen, et al. "Towards Faster and Stabilized GAN Training for High-fidelity Few-shot Image Synthesis.", ICLR 2021
    Recommended reading

Week 5 (4/1): Towards reliable machine learning
  Session 1: Threats of untrustworthy AI: understanding shortcut learning through a case study
    Speaker: 전상혁
    (1) Brendel, et al. "Approximating CNNs with Bag-of-local-Features Models Works Surprisingly Well on ImageNet.", ICLR 2019
    (2) Geirhos, et al. "ImageNet-trained CNNs Are Biased towards Texture; Increasing Shape Bias Improves Accuracy and Robustness.", ICLR 2019
    Recommended reading
  Session 2: Towards reliable machine learning: through the lens of cross-bias generalization and domain generalization
    Speaker: 전상혁
    (1) Madry, et al. "Towards Deep Learning Models Resistant to Adversarial Attacks.", ICLR 2018
    (2) Ganin, et al. "Domain-Adversarial Training of Neural Networks.", JMLR 2016
    Recommended reading

Week 6 (4/8): Practical scenarios and applications in computer vision
  Session 1: Face recognition: research to product
    Speaker: 유영준
    (1) An, Xiang, et al. "Partial FC: Training 10 Million Identities on a Single Machine.", ICCV 2021
    (2) Sculley, David, et al. "Hidden Technical Debt in Machine Learning Systems.", NeurIPS 2015
    Recommended reading
  Session 2: Video AI and applications
    Speaker: 위동윤
    (1) Feichtenhofer, Christoph, et al. "SlowFast Networks for Video Recognition.", ICCV 2019
    (2) Wang, Xiaolong, et al. "Non-local Neural Networks.", CVPR 2018
    Recommended reading

Week 7 (4/15): Practical scenarios and applications in computer vision
  Session 1: All about CLOVA OCR
    Speaker: 백영민
    (1) Kittenplon, Yair, et al. "Towards Weakly-Supervised Text Spotting Using a Multi-Task Transformer.", arXiv 2022
    (2) Baek, Youngmin, et al. "Character Region Awareness for Text Detection.", CVPR 2019
    Recommended reading
  Session 2: AI that generates handwriting
    Speaker: 이바도
    (1) Cha, Junbum, et al. "Few-shot Compositional Font Generation with Dual Memory.", ECCV 2020
    (2) Park, Song, et al. "Few-shot Font Generation with Localized Style Representations and Factorization.", AAAI 2021
    Recommended reading

Week 8 (4/22): No invited talk; student presentations (TBD)

Week 9 (4/29): Speech recognition and applications
  Session 1: Introduction to end-to-end speech recognition
    Speaker: 정남규
    (1) Gulati, Anmol, et al. "Conformer: Convolution-augmented Transformer for Speech Recognition.", Interspeech 2020
    (2) Han, Wei, et al. "ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context.", Interspeech 2020
    Recommended reading
  Session 2: Self-supervised end-to-end speech recognition
    Speaker: 김한규
    (1) Hsu, Wei-Ning, et al. "HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units.", IEEE/ACM Transactions on Audio, Speech, and Language Processing 2021
    (2) Chung, Yu-An, et al. "W2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training.", arXiv 2021
    Recommended reading

Week 10 (5/6): Voice synthesis and applications
  Session 1
    Speaker: 송은우
    (1) Shen, Jonathan, et al. "Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions.", ICASSP 2018
    (2) Ren, Yi, et al. "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech.", ICLR 2021
    Recommended reading
  Session 2
    Speaker: 황민제
    (1) Kumar, Kundan, et al. "MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis.", NeurIPS 2019
    (2) Yamamoto, Ryuichi, et al. "Parallel WaveGAN: A Fast Waveform Generation Model Based on Generative Adversarial Networks with Multi-Resolution Spectrogram.", ICASSP 2020
    Recommended reading

Week 11 (5/13): Large-scale user modeling and its applications
  Session 1
    Speaker: 곽하녹
    (1) Shin, et al. "Scaling Law for Recommendation Models: Towards General-Purpose User Representations.", arXiv 2021
    (2) Shin, et al. "One4all User Representation for Recommender Systems in E-commerce.", arXiv 2021
  Session 2
    Speaker: 정지수
    (1) Hsieh, et al. "Collaborative Metric Learning.", WWW 2017
    (2) Kim, Boseop, et al. "What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers.", EMNLP 2021
    Recommended reading

Week 12 (5/20): AutoML and practical MLOps
  Session 1
    Speaker: 김지훈
    (1) Real, Esteban, et al. "AutoML-Zero: Evolving Machine Learning Algorithms From Scratch.", ICML 2020
    (2) Falkner, Stefan, et al. "BOHB: Robust and Efficient Hyperparameter Optimization at Scale.", ICML 2018
    Recommended reading
  Session 2
    Speaker: 서동필 (no reading for this session)

Week 13 (5/27): NLP, dialogues, and QA
  Session 1
    Speaker: 이상우
    (1) Devlin, et al. "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.", NAACL 2019
    (2) Raffel, et al. "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer.", JMLR 2020
    Recommended reading
  Session 2
    Speaker: 김성동
    (1) Roller, Stephen, et al. "Recipes for Building an Open-Domain Chatbot.", EACL 2021
    (2) Lewis, Patrick, et al. "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.", NeurIPS 2020
    Recommended reading

Week 14 (6/3): Hyperscale LM & NLP applications
  Session 1
    Speaker: 이기창
    (1) Brown, et al. "Language Models are Few-Shot Learners.", NeurIPS 2020
    (2) Rae, et al. "Scaling Language Models: Methods, Analysis & Insights from Training Gopher.", arXiv 2021
    Recommended reading
  Session 2
    Speaker: 유강민
    (1) Lester, Brian, et al. "The Power of Scale for Parameter-Efficient Prompt Tuning.", EMNLP 2021
    (2) Li, Xiang Lisa, and Percy Liang. "Prefix-Tuning: Optimizing Continuous Prompts for Generation.", arXiv 2021
    Recommended reading

Week 15 (6/10): Human-centric NLP
  Session 1
    Speaker: 이화란
    (1) Dinan, Emily, et al. "Queens are Powerful too: Mitigating Gender Bias in Dialogue Generation.", EMNLP 2020
    (2) Perez, Ethan, et al. "Red Teaming Language Models with Language Models.", arXiv 2022
    Recommended reading
  Session 2
    Speakers: 정준영, 이민아
    (1) Chung, JJY, et al. "TaleBrush: Sketching Stories with Generative Pretrained Language Models.", CHI 2022
    (2) Lee, Mina, Percy Liang, and Qian Yang. "CoAuthor: Designing a Human-AI Collaborative Writing Dataset for Exploring Language Model Capabilities.", CHI 2022
    Recommended reading

Week 16 (6/17): No invited talk; student presentations (TBD)

Topics (tentative)

Major topics include:
  • Representation Learning
  • Reliable ML
  • Voice and Speech
  • NLP
  • MLOps
  • Recommendation systems

Grading

  • Attendance: 20%
  • Reading responses: 40%
  • Topic presentation: 20%
  • Class participation: 10%
  • Quizzes: 10%

Late policy: The three lowest reading response grades will be dropped. No late submissions are accepted for reading responses.

Prerequisites

There are no official course prerequisites. However, the assignments involve a lot of reading; research experience in machine learning is useful but not required.