Machine Learning for Programming (Seminar)

Quick Facts

Organizer: Michael Pradel
Teaching assistants: Aryaz Eghbali, Luca Di Grazia, Daniel Lehmann, Jibesh Patra, Moiz Rauf
Course type: Advanced seminar
Language: English
Ilias: Ilias course (for discussions, etc.)
Place: Virtual

Content

This seminar is about recent research on improving software and increasing developer productivity by using machine learning, including deep learning. We will discuss research papers that present novel techniques for improving software reliability and security, such as program analyses to detect bugs, to complete partial code, or to de-obfuscate code, based on machine learning models of code.

After the initial kick-off meeting, each student is assigned a research paper, which they present in a talk during the weekly meetings. Each talk is given twice: the purpose of the first talk is to gather constructive feedback for improving the second talk and the student's presentation skills in general. In addition, each student prepares a term paper that summarizes the original research paper.

Schedule

All meetings take place via Webex. They are not recorded, so attendance is strongly recommended.

Date Event
Nov 5, 2020, 2pm-3:30pm Kick-off meeting
Nov 10, 2020 (end of day) Deadline for selecting papers
Dec 10, 2020, 2pm-3:30pm First round of talks:
Xinyi Pan (Topic 19)
Alona Liuzniak (Topic 9)
Dec 17, 2020, 2pm-3:30pm First round of talks:
Daniel Baumgartner (Topic 2)
Marvin Dostal (Topic 21)
Dec 22, 2020 (end of day) Deadline for term papers
Jan 7, 2021, 2pm-3:30pm First round of talks:
Ya-Jen Hsu (Topic 12)
Gerrit Klein (Topic 18)
Jan 14, 2021, 2pm-3:30pm First round of talks:
Viktor Krimstein (Topic 10)
Jan 22, 2021 (end of day) Deadline for reviews of term papers
Jan 28, 2021, 2pm-3:30pm Second round of talks:
Xinyi Pan (Topic 19)
Alona Liuzniak (Topic 9)
Feb 4, 2021, 2pm-3:30pm Second round of talks:
Daniel Baumgartner (Topic 2)
Marvin Dostal (Topic 21)
Feb 11, 2021, 2pm-3:30pm Second round of talks:
Ya-Jen Hsu (Topic 12)
Gerrit Klein (Topic 18)
Viktor Krimstein (Topic 10)
Feb 19, 2021 (end of day) Deadline for final version of term papers

Topics

The following research papers are available for discussion. Use Google Scholar to find a copy of a paper. After the kick-off meeting, each student is assigned one paper for presentation.

[1] Daniel Tarlow, Subhodeep Moitra, Andrew Rice, Zimin Chen, Pierre-Antoine Manzagol, Charles Sutton, and Edward Aftandilian. Learning to fix build errors with graph2diff neural networks. 2019.
[2] Jeremy Lacomis, Pengcheng Yin, Edward J. Schwartz, Miltiadis Allamanis, Claire Le Goues, Graham Neubig, and Bogdan Vasilescu. DIRE: A neural approach to decompiled identifier naming. In ASE, 2019.
[3] Marko Vasic, Aditya Kanade, Petros Maniatis, David Bieber, and Rishabh Singh. Neural program repair by jointly learning to localize and repair. In ICLR, 2019.
[4] Jian Zhang, Xu Wang, Hongyu Zhang, Hailong Sun, Kaixuan Wang, and Xudong Liu. A novel neural source code representation based on abstract syntax tree. In ICSE, 2019.
[5] Zeyu Sun, Qihao Zhu, Lili Mou, Yingfei Xiong, Ge Li, and Lu Zhang. A grammar-based structural CNN decoder for code generation. In AAAI, pages 7055--7062, 2019.
[6] Thong Hoang, Hoa Khanh Dam, Yasutaka Kamei, David Lo, and Naoyasu Ubayashi. DeepJIT: An end-to-end deep learning framework for just-in-time defect prediction. In MSR, pages 34--45, 2019.
[7] Zhengkai Wu, Evan Johnson, Wei Yang, Osbert Bastani, Dawn Song, Jian Peng, and Tao Xie. REINAM: reinforcement learning for input-grammar inference. In ESEC/FSE, pages 488--498, 2019.
[8] Michele Tufano, Jevgenija Pantiuchina, Cody Watson, Gabriele Bavota, and Denys Poshyvanyk. On learning meaningful code changes via neural machine translation. In ICSE, pages 25--36, 2019.
[9] Kui Liu, Dongsun Kim, Tegawendé F. Bissyandé, Tae-young Kim, Kisub Kim, Anil Koyuncu, Suntae Kim, and Yves Le Traon. Learning to spot and refactor inconsistent method names. In ICSE, pages 1--12, 2019.
[10] Noam Yefet, Uri Alon, and Eran Yahav. Adversarial examples for models of code. In OOPSLA, 2020.
[11] Ke Wang and Zhendong Su. Blended, precise semantic program embeddings. In PLDI, pages 121--134, 2020.
[12] Marie-Anne Lachaux, Baptiste Rozière, Lowik Chanussot, and Guillaume Lample. Unsupervised translation of programming languages. CoRR, abs/2006.03511, 2020.
[13] Fang Liu, Ge Li, Yunfei Zhao, and Zhi Jin. Multi-task learning based pre-trained language model for code completion. In ASE, 2020.
[14] Jian Zhang, Xu Wang, Hongyu Zhang, Hailong Sun, Yanjun Pu, and Xudong Liu. Learning to handle exceptions. In ASE, 2020.
[15] Vincent J. Hellendoorn, Charles Sutton, Rishabh Singh, Petros Maniatis, and David Bieber. Global relational models of source code. In ICLR, 2020.
[16] Jiayi Wei, Maruth Goyal, Greg Durrett, and Isil Dillig. LambdaNet: Probabilistic type inference using graph neural networks. In ICLR, 2020.
[17] Miltiadis Allamanis, Earl T. Barr, Soline Ducousso, and Zheng Gao. Typilus: Neural type hints. In PLDI, 2020.
[18] Thong Hoang, Hong Jin Kang, David Lo, and Julia Lawall. CC2Vec: Distributed representations of code changes. In ICSE, 2020.
[19] Elizabeth Dinella, Hanjun Dai, Ziyang Li, Mayur Naik, Le Song, and Ke Wang. Hoppity: Learning graph transformations to detect and fix bugs in programs. In ICLR, 2020.
[20] Rafael-Michael Karampatsis, Hlib Babii, Romain Robbes, Charles Sutton, and Andrea Janes. Big code != big vocabulary: Open-vocabulary models for source code. In ICSE, 2020.
[21] Cody Watson, Michele Tufano, Kevin Moran, Gabriele Bavota, and Denys Poshyvanyk. On learning meaningful assert statements for unit test cases. In ICSE, 2020.

Template for Term Paper

Please use this LaTeX template for writing your term paper. The page limit is six pages (strict).

If you're not yet familiar with LaTeX, you may want to try the Overleaf online editor (click on "ACM Conference Proceedings Template" to start with the required template).
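For orientation, a term paper based on the ACM Conference Proceedings Template typically starts from a skeleton like the following. This is only an illustrative sketch assuming the `acmart` document class (which the Overleaf template uses); the file names, title, and section layout here are placeholders, and the template linked above remains the authoritative starting point.

```latex
% Sketch of a term-paper skeleton using the ACM acmart class (assumed).
\documentclass[sigconf]{acmart}

\title{Summary of the Assigned Research Paper}  % placeholder title

\author{Your Name}
\affiliation{\institution{University of Stuttgart}}

\begin{document}
\maketitle

\section{Introduction}
% Motivate the problem the original paper addresses.

\section{Approach}
% Summarize the technique, e.g., the model of code and its training.

\section{Evaluation}
% Report the key results and their limitations.

% Assumes a references.bib file with your BibTeX entries.
\bibliographystyle{ACM-Reference-Format}
\bibliography{references}
\end{document}
```

Keep in mind the strict six-page limit when filling in the sections.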

Grading

Grading is based on the term paper, the talk, and active participation during each presentation.