Due to the continuous digitalization of our society, distributed and web-based applications have become omnipresent, and making them more secure is of paramount relevance. Deep learning (DL) and its representation learning approach are increasingly being proposed for program code analysis, potentially providing a powerful means of making software systems less vulnerable. This systematic literature review (SLR) aims for a thorough analysis and comparison of 32 primary studies on DL-based vulnerability analysis of program code. We found a rich variety of proposed analysis approaches, code embeddings, and network topologies. We discuss these techniques and alternatives in detail. By compiling commonalities and differences in the approaches, we identify the current state of research in this area and discuss future directions. We also provide an overview of publicly available datasets in order to foster stronger benchmarking of approaches. This SLR provides an overview and starting point for researchers interested in deep vulnerability analysis on program code.
With the continuous digitalization of our society, an increasing number of software and IT systems are used every day. Known vulnerabilities, defined as a “weakness in an information system, system security procedures, internal controls, or implementation that could be exploited by a threat source” (National Institute of Standards and Technology 2020), are constantly increasing. To illustrate the problem, the number of recorded vulnerabilities in the NIST National Vulnerability Database grew from 14,500 records in 2017 to 17,300 records in 2019 (National Vulnerability Database 2020). As distributed and web-based applications are omnipresent in many areas today, making them more secure is of paramount relevance. Furthermore, addressing security by preventing and fixing vulnerabilities early in the development process avoids high costs, analogous to failure prevention and fixing in general, whose associated costs rise substantially in later development stages (Kumar and Yadav 2017).
Machine learning, and especially deep learning (DL), methods have gained importance in many research and application domains. In software engineering, for example, deep learning is used for analysis and prediction based on software development artifacts such as commits, issues, documentation, descriptions, and, of course, program code. Here, program code can be either compiled object code or source code as written in a programming language. Models can be trained for different purposes, such as approximate type inference, code completion, and bug localization (Allamanis et al.).
One particular application of deep learning on program code is vulnerability analysis. The analysis is typically realized as binary classification, distinguishing vulnerable from non-vulnerable program code, or as multi-class classification, additionally distinguishing the type of vulnerability. These types typically follow the CWE (Common Weakness Enumeration) categorization system by The MITRE Corporation (2020). This systematic literature review explicitly focuses on vulnerability analysis using deep learning approaches that detect patterns in source code or object code. We identify a set of 32 relevant primary studies proposing deep vulnerability analysis of source and object code. The aim of the proposed methods is to detect known types of vulnerabilities in unseen program code rather than to discover new types of vulnerabilities. We briefly discuss the evolution from shallow networks to deep learning for this application. Further, we compare and contrast the proposed methods along a typical deep learning pipeline, ranging from data gathering and pre-processing to learning and evaluation. Finally, we provide rich discussions on code embeddings, network topologies, available datasets, and future trends in this area.
Our results are relevant for researchers in security and software engineering, supporting them in finding new research directions and in conducting their ongoing research. The systematic and concise overview of deep learning approaches to vulnerability analysis on program code will also be helpful for beginners in this research area, as they can use our analysis as a guide in this complex and diverse field and in the rapidly growing body of machine learning literature.
The remaining sections of the paper are organized as follows: Section 3 provides a general overview of deep learning on code. Section 4 introduces our research questions and the methodology of this systematic review. In Section 5 we present and discuss findings per research question.
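The two classification framings mentioned in the text, binary (vulnerable versus non-vulnerable) and multi-class (additionally distinguishing the CWE type), differ only in how training labels are derived from the same annotated samples. The following is a minimal sketch of that difference, not taken from any of the reviewed studies. The `Sample` type, the helper names, and the example snippets are hypothetical and chosen purely for illustration; the CWE identifiers are used only as example class names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Sample:
    """Hypothetical training sample: a code fragment plus an optional CWE label."""
    code: str
    cwe_id: Optional[str]  # None means no known vulnerability is annotated


def binary_label(sample: Sample) -> int:
    """Binary framing: 1 = vulnerable, 0 = non-vulnerable."""
    return 0 if sample.cwe_id is None else 1


def build_class_index(samples: List[Sample]) -> Dict[Optional[str], int]:
    """Multi-class framing: class 0 is 'non-vulnerable', one class per CWE type."""
    index: Dict[Optional[str], int] = {None: 0}
    cwe_ids = sorted({s.cwe_id for s in samples if s.cwe_id is not None})
    for class_id, cwe_id in enumerate(cwe_ids, start=1):
        index[cwe_id] = class_id
    return index


def multiclass_label(sample: Sample, class_index: Dict[Optional[str], int]) -> int:
    """Map a sample to the integer class of its CWE type (0 if non-vulnerable)."""
    return class_index[sample.cwe_id]


if __name__ == "__main__":
    # Illustrative, hand-written fragments; not drawn from any real dataset.
    data = [
        Sample("int n = strlen(s);", None),
        Sample("char buf[8]; strcpy(buf, input);", "CWE-121"),
        Sample("p = NULL; *p = 1;", "CWE-476"),
    ]
    class_index = build_class_index(data)
    for sample in data:
        print(binary_label(sample), multiclass_label(sample, class_index))
```

In this sketch the binary labels can always be recovered from the multi-class labels (any class other than 0 is vulnerable), which is why a dataset annotated with CWE types can serve both framings.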