Machine Learning for Programming (Seminar)

Quick Facts

Organizer: Michael Pradel
Teaching assistants: Islem Bouzenia, Aryaz Eghbali, Luca Di Grazia, Matteo Paltenghi, Huimin Hu
Course type: Advanced seminar
Language: English
Ilias: Ilias course (for discussions, etc.)
Place: Universitätstr. 38, room 0.453

Content

This seminar covers recent research on improving software and increasing developer productivity by using machine learning, including deep learning. We will discuss research papers that present novel techniques for improving software reliability and security, such as program analyses based on machine learning models of code that detect bugs, complete partial code, or de-obfuscate code.
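To give a flavor of the techniques discussed, the following is a minimal sketch of one of them: completing partial code with a pretrained language model of code. The sketch is illustrative only and not part of the course material; the Hugging Face transformers library and the model "Salesforce/codegen-350M-mono" are assumptions, not tools prescribed by the seminar.

    # Minimal sketch: code completion with a pretrained model of code.
    # Assumes the Hugging Face "transformers" library and the publicly
    # available CodeGen model; any similar code model would work.
    from transformers import pipeline

    # Load a pretrained language model of code for left-to-right generation.
    complete = pipeline("text-generation", model="Salesforce/codegen-350M-mono")

    # Ask the model to complete a partial function definition.
    prompt = "def fibonacci(n):\n"
    result = complete(prompt, max_new_tokens=40)
    print(result[0]["generated_text"])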

After the initial kick-off meeting, each student is assigned a research paper. Each student presents their paper in a talk during the weekly meetings and prepares a term paper that summarizes the original research.

Organization

The course will be classroom-first, i.e., to the extent possible, all activities will take place in a physical classroom or through in-person meetings.

Schedule

This schedule is preliminary and subject to change.

Date Event
Oct 20, 2022, 2:00pm Kick-off meeting (slides)
Oct 24, 2022, 11:59pm Deadline for choosing topics
Nov 25, 2022, 11:59pm Deadline for first draft of term papers (only "paper-focused" students)
Dec 8, 2022, 2:00pm First round of talks:
  Maïssane Merrheim (topic 10)
  Jayesh Manani (topic 19)
Jan 12, 2023, 2:00pm Talks:
  Jan Hofmann (topic 6)
  Aditi Godbole (topic 9)
  Fabian Leeske (topic 2)
Jan 13, 2023, 11:59pm Deadline for second draft of term papers (all students)
Jan 19, 2023, 2:00pm Talks:
  Lars Gröninger (topic 3)
  Mohammed Faiz Sayyed (topic 8)
  Monika Singh (topic 7)
Jan 26, 2023, 2:00pm Talks:
  Maïssane Merrheim (topic 10)
  Jayesh Manani (topic 19)
  Vinoth Kannan (topic 4)
Feb 10, 2023, 11:59pm Deadline for final term papers (all students)

Topics

The following research papers are available for discussion. Use Google Scholar to find a copy of each paper. After the kick-off meeting, each student is assigned one paper for presentation.

[1] Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harrison Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian, Clemens Winter, Philippe Tillet, Felipe Petroski Such, Dave Cummings, Matthias Plappert, Fotios Chantzis, Elizabeth Barnes, Ariel Herbert-Voss, William Hebgen Guss, Alex Nichol, Alex Paino, Nikolas Tezak, Jie Tang, Igor Babuschkin, Suchir Balaji, Shantanu Jain, William Saunders, Christopher Hesse, Andrew N. Carr, Jan Leike, Joshua Achiam, Vedant Misra, Evan Morikawa, Alec Radford, Matthew Knight, Miles Brundage, Mira Murati, Katie Mayer, Peter Welinder, Bob McGrew, Dario Amodei, Sam McCandlish, Ilya Sutskever, and Wojciech Zaremba. Evaluating large language models trained on code. CoRR, 2021.
[2] Md. Rafiqul Islam Rabin, Vincent J. Hellendoorn, and Mohammad Amin Alipour. Understanding neural code intelligence through program simplification. In ESEC/FSE, 2021.
[3] Berkay Berabi, Jingxuan He, Veselin Raychev, and Martin Vechev. TFix: Learning to fix coding errors with a text-to-text transformer. In ICML, 2021.
[4] Michihiro Yasunaga and Percy Liang. Break-it-fix-it: Unsupervised learning for program repair. In ICML, 2021.
[5] Jürgen Cito, Isil Dillig, Vijayaraghavan Murali, and Satish Chandra. Counterfactual explanations for models of code. In ICSE-SEIP, 2022.
[6] Deze Wang, Zhouyang Jia, Shanshan Li, Yue Yu, Yun Xiong, Wei Dong, and Xiangke Liao. Bridging pretrained models and downstream tasks for source code understanding. In ICSE, 2022.
[7] Naman Jain, Skanda Vaidyanath, Arun Iyer, Nagarajan Natarajan, Suresh Parthasarathy, Sriram Rajamani, and Rahul Sharma. Jigsaw: Large language models meet program synthesis. In ICSE, 2022.
[8] Agnieszka Ciborowska and Kostadin Damevski. Fast changeset-based bug localization with BERT. In ICSE, 2022.
[9] Hammond Pearce, Baleegh Ahmad, Benjamin Tan, Brendan Dolan-Gavitt, and Ramesh Karri. Asleep at the keyboard? Assessing the security of GitHub Copilot's code contributions. In IEEE Symposium on Security and Privacy, 2022.
[10] Anant Kharkar, Roshanak Zilouchian Moghaddam, Matthew Jin, Xiaoyu Liu, Xin Shi, Colin Clement, and Neel Sundaresan. Learning to reduce false positives in analytic bug detectors. In ICSE, 2022.
[11] Qibin Chen, Jeremy Lacomis, Edward J. Schwartz, Graham Neubig, Bogdan Vasilescu, and Claire Le Goues. VarCLR: Variable semantic representation pre-training via contrastive learning. In ICSE, 2022.
[12] Akshay Utture, Shuyang Liu, Christian Gram Kalhauge, and Jens Palsberg. Striking a balance: Pruning false-positives from static call graphs. In ICSE, 2022.
[13] Jingxuan He, Luca Beurer-Kellner, and Martin T. Vechev. On distribution shift in learning-based bug detectors. In ICML, 2022.
[14] He Ye, Matias Martinez, and Martin Monperrus. Neural program repair with execution-based backpropagation. In ICSE, 2022.
[15] Daya Guo, Alexey Svyatkovskiy, Jian Yin, Nan Duan, Marc Brockschmidt, and Miltiadis Allamanis. Learning to complete code with sketches. In ICLR, 2022.
[16] Thanh Le-Cong, Hong Jin Kang, Truong Giang Nguyen, Stefanus Agus Haryono, David Lo, Xuan-Bach D. Le, and Quyet Thang Huynh. AutoPruner: Transformer-based call graph pruning. In ESEC/FSE, 2022.
[17] Disha Shrivastava, Hugo Larochelle, and Daniel Tarlow. Repository-level prompt generation for large language models of code. CoRR, 2022.
[18] Pardis Pashakhanloo, Aaditya Naik, Yuepeng Wang, Hanjun Dai, Petros Maniatis, and Mayur Naik. CodeTrek: Flexible modeling of code using an extensible relational representation. In ICLR, 2022.
[19] Gabriel Poesia, Alex Polozov, Vu Le, Ashish Tiwari, Gustavo Soares, Christopher Meek, and Sumit Gulwani. Synchromesh: Reliable code generation from pre-trained language models. In ICLR, 2022.

Template for Term Paper

Please use this LaTeX template for writing your term paper. The page limit is six pages (strict).

Grading

Grading is based on the term paper, the talk, and active participation during the meetings.