Machine Learning for Programming (Seminar)

Quick Facts

Organizer: Michael Pradel
Course type: Advanced seminar
Language: English
Place: Universitätstr. 38, 0.453

Content

This seminar is about recent research on improving software and increasing developer productivity with machine learning, including deep learning. We will discuss research papers that present novel techniques for improving software reliability and security, such as program analyses that detect bugs, complete partial code, or de-obfuscate code using machine learning models of code.

After the initial kick-off meeting, each student is assigned a research paper. Each student presents her/his paper during one of the weekly meetings. Moreover, each student prepares a term paper that summarizes the original research paper.

Schedule

Date            Topic, deadline, or special event
Oct 17, 2019    Kick-off meeting
Oct 20, 2019    Deadline for selecting topics
Oct 31, 2019    Start of presentations (details TBD)

Topics

The following research papers are available for discussion. Use Google Scholar to find a copy of a paper. After the kick-off meeting, each student is assigned one paper to present.

[1] Chang Liu, Xin Wang, Richard Shin, Joseph E. Gonzalez, and Dawn Song. Neural code completion. Technical report, UC Berkeley, 2016.
[2] Rui Zhao, David Bieber, Kevin Swersky, and Daniel Tarlow. Neural networks for modeling source code edits. 2018.
[3] Milan Cvitkovic, Badal Singh, and Anima Anandkumar. Open vocabulary learning on source code with a graph-structured cache. CoRR, abs/1810.08305, 2018.
[4] Tal Ben-Nun, Alice Shoshana Jakobovits, and Torsten Hoefler. Neural code comprehension: A learnable representation of code semantics. CoRR, abs/1806.07336, 2018.
[5] Miltiadis Allamanis. The adverse effects of code duplication in machine learning models of code. arXiv preprint arXiv:1812.06469, 2018.
[6] Pengcheng Yin, Graham Neubig, Marc Brockschmidt, Miltiadis Allamanis, and Alexander L. Gaunt. Learning to represent edits. CoRR, abs/1810.13337, 2018.
[7] Ke Wang, Rishabh Singh, and Zhendong Su. Search, align, and repair: data-driven feedback generation for introductory programming exercises. In Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2018, Philadelphia, PA, USA, June 18-22, 2018, pages 481--495, 2018.
[8] Jordan Henkel, Shuvendu K. Lahiri, Ben Liblit, and Thomas W. Reps. Code vectors: understanding programs through embedded abstracted symbolic traces. In Proceedings of the 2018 ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/SIGSOFT FSE 2018, Lake Buena Vista, FL, USA, November 04-09, 2018, pages 163--174, 2018.
[9] Miltiadis Allamanis, Marc Brockschmidt, and Mahmoud Khademi. Learning to represent programs with graphs. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings, 2018.
[10] Xujie Si, Hanjun Dai, Mukund Raghothaman, Mayur Naik, and Le Song. Learning loop invariants for program verification. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, 3-8 December 2018, Montréal, Canada, pages 7762--7773, 2018.
[11] Jacob Harer, Onur Ozdemir, Tomo Lazovich, Christopher P. Reale, Rebecca L. Russell, Louis Y. Kim, and Sang Peter Chin. Learning to repair software vulnerabilities with generative adversarial networks. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, 3-8 December 2018, Montréal, Canada, pages 7944--7954, 2018.
[12] Jian Zhang, Xu Wang, Hongyu Zhang, Hailong Sun, Kaixuan Wang, and Xudong Liu. A novel neural source code representation based on abstract syntax tree. In ICSE, 2019.
[13] Uri Alon, Meital Zilberstein, Omer Levy, and Eran Yahav. code2vec: learning distributed representations of code. PACMPL, 3(POPL):40:1--40:29, 2019.
[14] Zhengkai Wu, Evan Johnson, Wei Yang, Osbert Bastani, Dawn Song, Jian Peng, and Tao Xie. REINAM: reinforcement learning for input-grammar inference. In Proceedings of the ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/SIGSOFT FSE 2019, Tallinn, Estonia, August 26-30, 2019, pages 488--498, 2019.
[15] Jan Eberhardt, Samuel Steffen, Veselin Raychev, and Martin T. Vechev. Unsupervised learning of API aliasing specifications. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2019, Phoenix, AZ, USA, June 22-26, 2019, pages 745--759, 2019.
[16] Michele Tufano, Jevgenija Pantiuchina, Cody Watson, Gabriele Bavota, and Denys Poshyvanyk. On learning meaningful code changes via neural machine translation. In Proceedings of the 41st International Conference on Software Engineering, ICSE 2019, Montreal, QC, Canada, May 25-31, 2019, pages 25--36, 2019.

Template for Term Paper

Please use the provided LaTeX template for writing your term paper. The page limit of five pages is strictly enforced.

If you're not yet familiar with LaTeX, you may want to try the Overleaf online editor (click on "SIG Proceedings Paper" to start with the required template).

Grading

Grading is based on the term paper, the talk, and active participation during each presentation.