Machine Learning for Programming (Seminar)

Quick Facts

Organizer Michael Pradel
Teaching assistants Daniel Lehmann, Islem Bouzenia, Luca Di Grazia, Matteo Paltenghi
Course type Advanced seminar
Language English
Ilias Ilias course (for discussions, etc.)
Place Universitätstr. 38, room 0.463

Content

This seminar is about recent research on improving software and increasing developer productivity by using machine learning, including deep learning. We will discuss research papers that present novel techniques for improving software reliability and security, such as program analyses to detect bugs, to complete partial code, or to de-obfuscate code, based on machine learning models of code.

After the initial kick-off meeting, each student is assigned a research paper and presents it in a talk during the weekly meetings. Each talk is given twice: the first talk serves to gather constructive feedback for improving the second talk and the student's general presentation skills. In addition, each student writes a term paper that summarizes the assigned research paper.

Organization

The course will be classroom-first, i.e., to the extent possible, all activities will take place in a physical classroom or in other in-person meetings.

Schedule

This schedule is preliminary and subject to change.

Date Event
Oct 21, 2021, 2:00pm Kick-off meeting
Slides: Motivation and Organization, DeepBugs
Oct 27, 2021, 11:59pm Deadline for choosing topics
Nov 18, 2021, 2:00pm First round of talks:
Swarnendu Sengupta (topic 8)
Deepanshu Sonparote (topic 7)
Nov 25, 2021, 2:00pm First round of talks:
Sinan Kurtyigit (topic 1)
Simon Weiler (topic 14)
Dec 2, 2021, 2:00pm First round of talks:
Koushik Ragavendran (topic 2)
Rohit G Hegde (topic 5)
Dec 9, 2021, 2:00pm First round of talks:
Keyuriben Patel (topic 6)
Yiu Wai Chow (topic 9)
Jan 13, 2022, 2:00pm Second round of talks:
Simon Weiler (topic 14)
Deepanshu Sonparote (topic 7)
Jan 14, 2022, 11:59pm Deadline for first version of term papers
Jan 27, 2022, 2:00pm Second round of talks:
Sinan Kurtyigit (topic 1)
Koushik Ragavendran (topic 2)
Rohit G Hegde (topic 5)
Jan 28, 2022, 11:59pm Deadline for peer feedback on term papers
Feb 3, 2022, 2:00pm Second round of talks:
Swarnendu Sengupta (topic 8)
Keyuriben Patel (topic 6)
Yiu Wai Chow (topic 9)
Feb 11, 2022, 11:59pm Deadline for term papers

Topics

The following research papers are available for discussion. Use Google Scholar to find a copy of a paper. After the kick-off meeting, each student gets assigned one paper for presentation.

[1] Elizabeth Dinella, Hanjun Dai, Ziyang Li, Mayur Naik, Le Song, and Ke Wang. Hoppity: Learning graph transformations to detect and fix bugs in programs. In ICLR, 2020.
[2] Miltiadis Allamanis, Earl T. Barr, Soline Ducousso, and Zheng Gao. Typilus: Neural type hints. In PLDI, 2020.
[3] Yaniv David, Uri Alon, and Eran Yahav. Neural reverse engineering of stripped binaries using augmented control flow graphs. In OOPSLA, 2020.
[4] Yu Wang, Fengjuan Gao, Linzhang Wang, and Ke Wang. Learning semantic program embeddings with graph interval neural network. In OOPSLA, 2020.
[5] Baptiste Rozière, Marie-Anne Lachaux, Lowik Chanussot, and Guillaume Lample. Unsupervised translation of programming languages. In NeurIPS, 2020.
[6] Jian Zhang, Xu Wang, Hongyu Zhang, Hailong Sun, Yanjun Pu, and Xudong Liu. Learning to handle exceptions. In ASE, 2020.
[7] Fang Liu, Ge Li, Yunfei Zhao, and Zhi Jin. Multi-task learning based pre-trained language model for code completion. In ASE, 2020.
[8] Nghi D. Q. Bui, Yiyun Yu, and Lingxiao Jiang. InferCode: Self-supervised learning of code representations by predicting subtrees. In ICSE, 2021.
[9] Roei Schuster, Congzheng Song, Eran Tromer, and Vitaly Shmatikov. You autocomplete me: Poisoning vulnerabilities in neural code completion. In USENIX Security, 2021.
[10] Seohyun Kim, Jinman Zhao, Yuchi Tian, and Satish Chandra. Code prediction by feeding trees to transformers. In ICSE, 2021.
[11] Md. Rafiqul Islam Rabin, Vincent J. Hellendoorn, and Mohammad Amin Alipour. Understanding neural code intelligence through program simplification. In ESEC/FSE, 2021.
[12] Berkay Berabi, Jingxuan He, Veselin Raychev, and Martin Vechev. TFix: Learning to fix coding errors with a text-to-text transformer. In ICML, 2021.
[13] Qihao Zhu, Zeyu Sun, Yuan-an Xiao, Wenjie Zhang, Kang Yuan, Yingfei Xiong, and Lu Zhang. A syntax-guided edit decoder for neural program repair. In ESEC/FSE, 2021.
[14] Bo Li, Qiang He, Feifei Chen, Xin Xia, Li Li, John C. Grundy, and Yun Yang. Embedding app-library graph for neural third party library recommendation. In ESEC/FSE, 2021.
[15] Yi Li, Shaohua Wang, and Tien N. Nguyen. Fault localization with code coverage representation learning. In ICSE, 2021.

Template for Term Paper

Please use this LaTeX template for writing your term paper. The page limit is six pages (strict).

Grading

Grading is based on the term paper, the talk, and active participation during the meetings.