Machine Learning for Programming (Seminar at University of Stuttgart)

Quick Facts

Organizer: Jibesh Patra
Examiner: Erhard Ploedereder
Course type: Seminar
Language: English
Kick-off meeting: April 8
Course number: 020595000
Piazza: Course page
Access code: ml4p

Content

This seminar covers recent research on improving software and increasing developer productivity with machine learning, including deep learning. We will discuss research papers that present novel techniques for improving software reliability and security, such as analyses built on machine learning models of code that detect bugs, complete partial code, or de-obfuscate code.

After the initial kick-off meeting, each student is assigned a research paper. Each student presents her/his paper during one of the weekly meetings. Moreover, each student prepares a term paper that summarizes the original research paper.

Schedule

When                     What                                                   Where
April 8                  Kick-off meeting (mandatory for all participants)      Universität 38 - 0.457
May 6 to July 15, 2019   Weekly meeting with presentations by the participants  Universität 38 - 0.457
June 3, 2019             Term papers due for peer-review                        ---
June 21, 2019            Reviews due                                            ---
July 15, 2019            Final term papers due                                  ---

Topics

The following research papers are available for discussion. After the kick-off meeting, each student gets assigned one paper.

[1] Wojciech Zaremba and Ilya Sutskever. Learning to execute. CoRR, abs/1410.4615, 2014.
[2] Veselin Raychev, Martin T. Vechev, and Andreas Krause. Predicting program properties from "big code". In POPL, pages 111–124, 2015.
[3] Martin White, Michele Tufano, Christopher Vendome, and Denys Poshyvanyk. Deep learning code fragments for code clone detection. In ASE, pages 87–98, 2016.
[4] Song Wang, Taiyue Liu, and Lin Tan. Automatically learning semantic features for defect prediction. In ICSE, pages 297–308, 2016.
[5] Pavol Bielik, Veselin Raychev, and Martin T. Vechev. PHOG: probabilistic model for code. In ICML, pages 2933–2942, 2016.
[6] Sahil Bhatia and Rishabh Singh. Automated correction for syntax errors in programming assignments using recurrent neural networks. CoRR, abs/1603.06129, 2016.
[7] Miltiadis Allamanis, Hao Peng, and Charles A. Sutton. A convolutional attention network for extreme summarization of source code. In ICML, pages 2091–2100, 2016.
[8] Xiaodong Gu, Hongyu Zhang, Dongmei Zhang, and Sunghun Kim. Deep API learning. In FSE, pages 631–642, 2016.
[9] Xiaojun Xu, Chang Liu, Qian Feng, Heng Yin, Le Song, and Dawn Song. Neural network-based graph embedding for cross-platform binary code similarity detection. In CCS, pages 363–376, 2017.
[10] Ke Wang, Rishabh Singh, and Zhendong Su. Dynamic neural program embedding for program repair. CoRR, abs/1711.07163, 2017.
[11] Rahul Gupta, Soham Pal, Aditya Kanade, and Shirish Shevade. DeepFix: Fixing common C language errors by deep learning. In AAAI, 2017.
[12] Chris Cummins, Pavlos Petoumenos, Zheng Wang, and Hugh Leather. Synthesizing benchmarks for predictive modeling. In CGO, pages 86–99, 2017.
[13] Miltiadis Allamanis, Marc Brockschmidt, and Mahmoud Khademi. Learning to represent programs with graphs. CoRR, abs/1711.00740, 2017.
[14] Yujia Li, Oriol Vinyals, Chris Dyer, Razvan Pascanu, and Peter Battaglia. Learning deep generative models of graphs. CoRR, abs/1803.03324, 2018.
[15] Xiaodong Gu, Hongyu Zhang, and Sunghun Kim. Deep code search. In ICSE, 2018.
[16] Daniel DeFreez, Aditya V. Thakur, and Cindy Rubio-González. Path-based function embedding and its application to specification mining. CoRR, abs/1802.07779, 2018.
[17] Marc Brockschmidt, Miltiadis Allamanis, Alexander L. Gaunt, and Oleksandr Polozov. Generative code modeling with graphs. arXiv e-prints, 2018.
[18] Uri Alon, Meital Zilberstein, Omer Levy, and Eran Yahav. code2vec: Learning distributed representations of code. CoRR, abs/1803.09473, 2018.

Template for Term Paper

Please use this LaTeX template for writing your term paper. The page limit is five pages (strict).

If you're not yet familiar with LaTeX, you may want to try the Overleaf online editor (click on "SIG Proceedings Paper" to start with the required template).

Grading

Grading is based on the term paper, the talk, and active participation during each presentation.